On Tue, 8 Feb 2005, Didier Lebrun wrote:

> The high latency does create some specific problems with TCP, since the
> [bandwidth * latency] product is too big for the standard TCP window size
> and packet loss can become dramatic in this kind of context. But the IETF
> has developed a set of TCP extensions (RFC 1323) in order to overcome
> these limits and have TCP perform better on LFNs (Long Fat Networks).
> Recent versions of FreeBSD support the RFC 1323 extensions and use them
> automatically when necessary, but the main problem is between the clients
> and the remote servers anyway, since the TCP transactions occur between
> ends, the gateway just letting the packets through, with the exception of
> DNS and NTP queries.

Correct.  Directly supporting RFC1323 on a router isn't terribly important,
although it *is* important that the router not screw it up, e.g. by failing
to handle window scaling correctly in stateful filtering.

> When RFC 1323 extensions are supported by the system, TCP adjusts itself by
> calculating the TCP window size as soon as the first ACK comes back... but

That's not exactly correct.  The default socket buffer size is set
independently from any knowledge of the peer's capability, but in the
absence of window scaling the usable window is clamped at 65535.  Even
then, the stack may still allocate the buffer at the larger, now useless,
size.

> before that, it uses the system default values, which are often badly
> optimized for sat links. So, it's better to tweak them on each client in

They're often not even adequate for broadband to distant points.  The
theoretical minimum RTT for halfway around the planet is about 133ms.

> order to have it perform better. The main principles are:
>    - enable "Window Scaling" (usually enabled by default)
>    - enable "Time Stamping"

Only the above two relate to RFC1323.
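
To put rough numbers on the 65535-byte clamp and the 133ms figure, here is
a small Python sketch.  The 40,000km circumference is a round figure I'm
assuming, and the function names are only for illustration.  Real paths
are longer and slower than light in a vacuum, so treat this as a best case
rather than a prediction.

```python
# Rough, best-case arithmetic only; the constants are round figures.
EARTH_CIRCUMFERENCE_KM = 40_000   # assumed round value
LIGHT_SPEED_KM_PER_S = 299_792    # speed of light in vacuum
MAX_UNSCALED_WINDOW = 65_535      # largest window without window scaling


def min_rtt_seconds(distance_km: float) -> float:
    """Round-trip time at the speed of light over distance_km each way."""
    return 2 * distance_km / LIGHT_SPEED_KM_PER_S


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one connection: one full window per round trip."""
    return window_bytes * 8 / rtt_s


antipode_rtt = min_rtt_seconds(EARTH_CIRCUMFERENCE_KM / 2)
cap = max_throughput_bps(MAX_UNSCALED_WINDOW, antipode_rtt)
print(f"minimum RTT halfway around the planet: {antipode_rtt * 1000:.1f} ms")
print(f"single-connection cap with a 65535-byte window: {cap / 1e6:.2f} Mbit/s")
```

At that RTT, one unscaled window per round trip works out to a little under
4 Mbit/s for a single connection, however fast the link itself is.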

>    - enable "Selective ACKs" (SACK)

Although the original "long fat pipe" package included SACK, the original
SACK proposal was broken and hence explicitly *not* included in RFC1323.  A
reworked SACK mechanism was later published as RFC2018.  In general, SACK
support is less "mature", but fortunately it's the least important of the
three, especially if packet loss is low.

>    - enable "Path MTU Discovery"

This has absolutely nothing to do with RFC1323, although bulk-transfer
efficiency on *all* links is better when the MTU can be chosen correctly
rather than using an arbitrary "conservative" value.  But then you have to
worry about broken firewalls not forwarding the needed ICMP packets.

>    - set the TCP receive window size to a higher value [max bandwidth
>      in bytes * max latency in sec]

It's actually worse than that.  Although that's sufficient to avoid
window-limited rates in the absence of packet loss, whenever a packet is
dropped it takes *two* RTTs (plus the time to trigger fast retransmit) to
get it through, and hence any window size smaller than twice the
delay-bandwidth product (and then some) will diminish the effectiveness of
fast retransmit.

Note that this amount of buffer space is needed *per connection* even
though the only real requirement is for that amount of *total* outstanding
data.  This can eat up RAM pretty quickly.  Also note that similar issues
apply to the *send* buffer size, although upstream speeds are usually
slower.

>    - disable "Black Hole Detection" (MS Windows only)

This has nothing to do with RFC1323.  It's actually a workaround for PMTUd
failures due to blocked ICMP.  The only reason I could see for its having
anything to do with this is if the timeout for calling the path "broken" is
too short for a satlink.

>    - set "Max duplicate ACKs" = 2 (3 on Win98 only)

Again that has nothing to do with RFC1323, but instead represents the fast
retransmit threshold.  RFC2581 recommends 3.
Lower values reduce the amount of send buffer needed to avoid window stalls
after dropped packets, but increase the risk of unnecessary
retransmissions.  Also note that this only affects *send* performance.

> On FreeBSD, you can adjust a few things too:

But I wouldn't recommend doing this to m0n0wall, since it's rarely a TCP
endpoint and often can't afford the RAM.

> If you are using FreeBSD's traffic shaping capabilities, you must adjust the
> size of the queues too, in order to avoid packet drops when the queue is
> full. You can set each download queue to the TCP receive window size, and
> each upload queue to the TCP sendspace. The same for the main pipes
> (96Kbytes and 24Kbytes in our case).

But the queues aren't what's filling up.  The extra data that one has to
accommodate is literally "up in the air" (or at least the vacuum).  If the
traffic shaper is just dealing with packets, it shouldn't care.  If it's
trying to be clever enough to watch TCP SEQ and ACK numbers but not clever
enough to take large RTTs into account, then it's broken.  In no case
should it ever be necessary to buffer significant data *in the router*.  In
fact, excessive buffering in routers simply increases the overall
delay-bandwidth product (by increasing latency) and thus requires *more*
buffering at the endpoints.

Theoretically the same argument would apply to the socket buffers, but the
problem is that the receiver can't offer window unless it can commit to
receiving that amount of data regardless of application behavior.  The
send-side buffer is needed because the sender can't be certain that the
data has been delivered until it gets the end-to-end acknowledgment.

> Another kind of problem that can arise, is MTU/MSS miscalculations, since
> the TCP header is 4 bytes longer than usual when using the RFC 1323
> extensions. They fill the 6th optional TCP header line, thus producing
> headers of 44 bytes (20 IP + 24 TCP) instead of 40 usually (20 IP + 20
> TCP).
> It can create problems when using VPNs or any kind of encapsulation.

I don't know where you get that particular number.  There's an option for
window scaling, but it appears only in the initial SYN segments.  Ditto for
the option *enabling* SACK.  The timestamps option adds *12* bytes
(including padding) to every segment.  SACKs add a variable amount, but for
most application protocols (simplex or half-duplex) they tend to appear in
otherwise empty packets.

					Fred Wright
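
For anyone who wants to plug in their own numbers, the buffer sizing and
per-segment overhead discussed above can be sketched in Python as follows.
The link figures (1 Mbit/s down, 600ms RTT, 1500-byte MTU) are hypothetical
values picked for illustration, not measurements, and the function names
are made up.  The sketch assumes IPv4 with no IP options and ignores the
"and then some" margin on top of twice the delay-bandwidth product.

```python
MAX_UNSCALED_WINDOW = 65_535   # largest window without window scaling
MAX_WINDOW_SHIFT = 14          # largest shift the window scale option allows
IP_HEADER = 20                 # IPv4 header, no IP options (assumption)
TCP_HEADER = 20                # TCP header without options
TIMESTAMP_OPTION = 12          # timestamps option, including padding


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Delay-bandwidth product in bytes, rounded up."""
    return (bandwidth_bps * rtt_ms + 7999) // 8000


def buffer_for_fast_retransmit(bandwidth_bps: int, rtt_ms: int) -> int:
    """Twice the delay-bandwidth product, the floor argued for above."""
    return 2 * bdp_bytes(bandwidth_bps, rtt_ms)


def window_scale_shift(window_bytes: int) -> int:
    """Smallest shift that lets a 16-bit window field cover window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes:
        shift += 1
    if shift > MAX_WINDOW_SHIFT:
        raise ValueError("window too large for TCP window scaling")
    return shift


def payload_per_segment(mtu: int, timestamps: bool) -> int:
    """TCP payload bytes that fit in one MTU-sized IPv4 packet."""
    overhead = IP_HEADER + TCP_HEADER + (TIMESTAMP_OPTION if timestamps else 0)
    return mtu - overhead


# Hypothetical link: 1 Mbit/s downstream, 600 ms RTT, 1500-byte MTU.
window = buffer_for_fast_retransmit(1_000_000, 600)
print("receive buffer per connection:", window, "bytes")
print("window scale shift needed:", window_scale_shift(window))
print("RAM for 100 such connections:", 100 * window, "bytes")
print("payload per segment with timestamps:", payload_per_segment(1500, True))
```

The per-connection figure is the one to watch: multiply it by the number of
simultaneous connections before deciding a box can afford it.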