At 14:43 08/02/2005 -0800, Fred Wright wrote:
>On Tue, 8 Feb 2005, Didier Lebrun wrote:
> > The high latency does create some specific problems with TCP, since the
> > [bandwidth * latency] product is too big for the standard TCP window size
> > and packet loss can become dramatic in this kind of context. But the IETF
> > has developed a set of TCP extensions (RFC 1323) in order to overcome
> > these limits and have TCP perform better on LFNs (Long Fat Networks).
> > Recent versions of FreeBSD support the RFC 1323 extensions and use them
> > automatically when necessary, but the main problem is between the clients
> > and the remote servers anyway, since TCP transactions occur end to end,
> > with the gateway just letting the packets through, except for
> > DNS and NTP queries.
>Correct. Directly supporting RFC1323 on a router isn't terribly
>important, although it *is* important that the router not screw it up,
>e.g. by failing to handle window scaling correctly in stateful filtering.
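To put numbers on the "window too small for the [bandwidth * latency] product" point, here is a small sketch. It computes the bandwidth-delay product and the smallest RFC 1323 window-scale shift that would let TCP advertise a window that large. The 2 Mbit/s and 600 ms figures are hypothetical values chosen for illustration, not measurements from this link.

```python
# Sketch: how large the window must be on a long fat pipe, and which
# RFC 1323 window-scale shift would let TCP advertise it.
# The link figures in the example are hypothetical, not measured values.


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes (bits per second * seconds / 8)."""
    return round(bandwidth_bps * rtt_s / 8)


def window_scale_shift(window_bytes: int) -> int:
    """Smallest shift count such that 65535 << shift >= window_bytes.

    RFC 1323 limits the shift count to 14.
    """
    shift = 0
    while (65535 << shift) < window_bytes:
        shift += 1
        if shift > 14:
            raise ValueError("window exceeds what RFC 1323 scaling can express")
    return shift


if __name__ == "__main__":
    # Hypothetical satellite downlink: 2 Mbit/s with a 600 ms round trip.
    bdp = bdp_bytes(2_000_000, 0.6)
    print(bdp)                      # 150000 bytes, well above 65535
    print(window_scale_shift(bdp))  # 2, i.e. windows up to 262140 bytes
```

Without the window-scale option the 16-bit window field simply cannot express 150000 bytes, which is why a router that mangles the option in stateful filtering hurts so much.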
> > When RFC 1323 extensions are supported by the system, TCP adjusts itself by
> > calculating the TCP window size as soon as the first ACK comes back... but
>That's not exactly correct. The default socket buffer size is set
>independently from any knowledge of the peer's capability, but in the
>absence of window scaling the usable window is clamped at 65535. That may
>not even manage to avoid allocating the buffer based on the uselessly
I'm not sure I understand what you mean here. Doesn't TCP adjust the socket
buffer and the TCP RWIN once it has received some ACKs, which allow it to
calculate the proper TCP RWIN?
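A simplified model may help with this question. It assumes the advertised window is just the smaller of the receive buffer and the 16-bit window field (scaled if RFC 1323 scaling was negotiated), so ACKs never enlarge it beyond the buffer set up front. Real stacks are more complicated; this is only meant to show why the buffer and the 65535 clamp, not the ACK clock, bound throughput. The 256 KB buffer and the 600 ms geostationary RTT are hypothetical examples. The light-speed bound reproduces the roughly 133 ms minimum RTT for halfway around the planet.

```python
# Sketch: why the socket buffer, not the ACK clock, bounds throughput.
# Simplified model: the advertised window is the smaller of the receive
# buffer and the 16-bit window field (optionally scaled per RFC 1323).

SPEED_OF_LIGHT_M_S = 299_792_458  # in vacuum; signals in fibre are slower


def usable_window(recv_buffer: int, scaling: bool, shift: int = 0) -> int:
    """Largest window the receiver can actually advertise."""
    field_limit = 65535 << shift if scaling else 65535
    return min(recv_buffer, field_limit)


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """At most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


def min_rtt_s(one_way_distance_m: float) -> float:
    """Physical lower bound on RTT: out and back at the speed of light."""
    return 2 * one_way_distance_m / SPEED_OF_LIGHT_M_S


if __name__ == "__main__":
    # Roughly half the Earth's circumference (about 20,000 km) one way.
    rtt = min_rtt_s(20_000_000)
    print(f"{rtt * 1000:.0f} ms")  # about 133 ms
    # A 256 KB buffer without window scaling is still clamped to 65535 bytes.
    win = usable_window(256 * 1024, scaling=False)
    print(win)  # 65535
    for r in (rtt, 0.6):  # 0.6 s: a hypothetical geostationary round trip
        print(f"{max_throughput_bps(win, r) / 1e6:.2f} Mbit/s at {r * 1000:.0f} ms")
```

Even at the physical minimum RTT, an unscaled window caps a single connection at a few Mbit/s; at satellite latencies it drops below 1 Mbit/s whatever the link speed.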
> > before that, it uses the system default values, which are often badly
> > optimized for sat links.
>They're often not even adequate for broadband to distant points. The
>theoretical minimum RTT for halfway around the planet is about 133ms.
> > So, it's better to tweak them on each client in order to have it
> > perform better. The main principles are:
> > - enable "Window Scaling" (usually enabled by default)
> > - enable "Time Stamping"
>Only the above two relate to RFC1323.
That's true, but I didn't mean to say that all the options were part of
RFC 1323! I just meant that they can play some role on sat links.
> > - enable "Selective ACKs" (SACK)
>Although the original "long fat pipe" package included SACK, the original
>SACK proposal was broken and hence explicitly *not* included in RFC1323.
>A reworked SACK mechanism was later published as RFC2018. In general,
>SACK support is less "mature", but fortunately it's the least important of
>the three, especially if packet loss is low.
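Why SACK matters once losses do occur: a cumulative ACK only reveals the first hole, while SACK blocks let the sender see every missing range at once. The sketch below assumes RFC 2018 block semantics (left edge is the first byte of the block, right edge is the byte just after it) and ignores sequence-number wraparound for brevity.

```python
# Sketch: what SACK tells the sender. Given the cumulative ACK and the SACK
# blocks reported by the receiver, list the byte ranges still missing.
# Blocks are [left_edge, right_edge) as defined in RFC 2018; sequence-number
# wraparound is ignored to keep the example short.
from __future__ import annotations


def sack_holes(cum_ack: int, blocks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    holes = []
    next_expected = cum_ack
    for left, right in sorted(blocks):
        if left > next_expected:
            # Gap between what we know arrived and the next SACKed block.
            holes.append((next_expected, left))
        next_expected = max(next_expected, right)
    return holes


if __name__ == "__main__":
    # Receiver has everything below 1000, plus 2000-2999 and 4000-4999.
    print(sack_holes(1000, [(4000, 5000), (2000, 3000)]))
    # [(1000, 2000), (3000, 4000)]
```

With only the cumulative ACK, the sender would learn about the second hole one round trip later, which on a sat link is half a second or more.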
> > - enable "Path MTU Discovery"
>This has absolutely nothing to do with RFC1323, although bulk-transfer
>efficiency on *all* links is better when the MTU can be chosen correctly
>rather than using an arbitrary "conservative" value. But then you have to
>worry about broken firewalls not forwarding the needed ICMP packets.
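A rough illustration of the efficiency point: the sketch compares full-size segments for a 1500-byte path MTU with a typical conservative value of 576 bytes. It assumes 20-byte IPv4 and 20-byte TCP headers with no options and ignores link-layer framing, so the figures are indicative only.

```python
# Sketch: payload efficiency of full-size segments for a correctly
# discovered path MTU versus a "conservative" 576-byte one.
# Assumes 20-byte IPv4 and 20-byte TCP headers and ignores link-layer framing.
import math

IP_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int, tcp_option_bytes: int = 0) -> int:
    """Payload bytes per segment once IP, TCP and option headers are removed."""
    return mtu - IP_HEADER - TCP_HEADER - tcp_option_bytes


def segments_needed(transfer_bytes: int, mtu: int, tcp_option_bytes: int = 0) -> int:
    return math.ceil(transfer_bytes / mss_for_mtu(mtu, tcp_option_bytes))


if __name__ == "__main__":
    for mtu in (1500, 576):
        mss = mss_for_mtu(mtu)
        print(mtu, mss, f"{mss / mtu:.1%}", segments_needed(1_000_000, mtu))
    # 1500 1460 97.3% 685
    # 576 536 93.1% 1866
```

The header overhead difference is only a few percent, but the conservative MTU needs almost three times as many packets (and ACKs) for the same transfer.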
> > - set the TCP receive window size to a higher value [max bandwidth
> > in bytes/sec * max latency in sec]
>It's actually worse than that. Although that's sufficient to avoid
>window-limited rates in the absence of packet loss, whenever a packet is
>dropped it takes *two* RTTs (plus the time to trigger fast retransmit) to
>get it through, and hence any window size smaller than twice the
>delay-bandwidth product (and then some) will diminish the effectiveness of
You might be right. I noticed some problems with packet loss,
especially with WinXP clients, but couldn't figure them out. I'll have to
read the fast retransmit documentation to fully understand this point.
>Note that this amount of buffer space is needed *per connection* even
>though the only real requirement is for that amount of *total* outstanding
>data. This can eat up RAM pretty quickly.
>Also note that similar issues apply to the *send* buffer size, although
>upstream speeds are usually lower.
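The sizing rule and its memory cost can be sketched as follows. The factor of 2 is the lower bound from the loss-recovery argument above (in practice "and then some" suggests a bit more), and the link figures, connection count and send buffer size are hypothetical.

```python
# Sketch of the sizing rule discussed above: a window of about twice the
# bandwidth-delay product, allocated separately for every connection.
# The margin factor and the example figures are assumptions for illustration.
import math


def recommended_window(bandwidth_bps: float, rtt_s: float, factor: float = 2.0) -> int:
    """Window in bytes: factor * bandwidth-delay product."""
    return math.ceil(factor * bandwidth_bps * rtt_s / 8)


def buffer_ram(window_bytes: int, connections: int, send_buffer_bytes: int = 0) -> int:
    """Socket buffer memory if every connection gets its own full buffers."""
    return connections * (window_bytes + send_buffer_bytes)


if __name__ == "__main__":
    win = recommended_window(2_000_000, 0.6)  # hypothetical 2 Mbit/s, 600 ms
    print(win)                                # 300000
    print(buffer_ram(win, 20))                # 6000000 bytes for 20 connections
```

Twenty simultaneous connections already commit about 6 MB of receive buffers, even though the link itself can only carry one bandwidth-delay product's worth of data in flight.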
> > - disable "Black Hole Detection" (MS Windows only)
>This has nothing to do with RFC1323. It's actually a workaround for PMTUd
>failures due to blocked ICMP. The only reason I could see for its having
>anything to do with this is if the timeout for calling the path
>"broken" is too short for a satlink.
> > - set "Max duplicate ACKs" = 2 (3 on Win98 only)
>Again that has nothing to do with RFC1323, but instead represents the fast
>retransmit threshold. RFC2581 recommends 3. Lower values reduce the
>amount of send buffer needed to avoid window stalls after dropped packets,
>but increase the risk of unnecessary retransmissions. Also note that this
>only affects *send* performance.
You're right in principle, but I discussed this point with a sat
technician, who recommended reducing the retransmit threshold, since 2
RTTs is already quite long on a sat link, and the risk of packets
still arriving later than that is pretty low. I don't remember the
reason he gave me for the Win98 exception.
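The trade-off on the duplicate-ACK threshold can be shown with a toy counter. This sketch treats any repeated ACK number as a duplicate, which is a simplification of the stricter RFC 2581 definition, and the ACK sequences are made-up examples: one where a segment is really lost and one where it merely arrives late.

```python
# Sketch of the fast-retransmit trigger: retransmit once `dupthresh`
# duplicate ACKs have been seen. Simplified: RFC 2581 has a stricter
# definition of a duplicate ACK, which is ignored here.
from __future__ import annotations


def fast_retransmit_index(acks: list[int], dupthresh: int = 3) -> int | None:
    """Index of the ACK that triggers fast retransmit, or None."""
    dups = 0
    prev = None
    for i, ack in enumerate(acks):
        if ack == prev:
            dups += 1
            if dups == dupthresh:
                return i
        else:
            dups = 0
        prev = ack
    return None


if __name__ == "__main__":
    lost = [1000, 2000, 2000, 2000, 2000]       # segment at 2000 really lost
    reordered = [1000, 2000, 2000, 2000, 5000]  # it was only late
    for thresh in (3, 2):
        print(thresh, fast_retransmit_index(lost, thresh),
              fast_retransmit_index(reordered, thresh))
    # 3 4 None
    # 2 3 3
```

A threshold of 2 fires one ACK earlier on a real loss, but also retransmits needlessly when a segment is merely reordered.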
> > On FreeBSD, you can adjust a few things too:
>But I wouldn't recommend doing this to m0n0wall, since it's rarely a TCP
>endpoint and often can't afford the RAM.
> > If you are using FreeBSD's traffic shaping capabilities, you must adjust
> > the size of the queues too, in order to avoid packet drops when the queue
> > is full. You can set each download queue to the TCP receive window size,
> > and each upload queue to the TCP sendspace. The same goes for the main
> > pipes (96 Kbytes and 24 Kbytes in our case).
>But the queues aren't what's filling up. The extra data that one has to
>accommodate is literally "up in the air" (or at least the vacuum). If the
>traffic shaper is just dealing with packets, it shouldn't care. If it's
>trying to be clever enough to watch TCP SEQ and ACK numbers but not clever
>enough to take large RTTs into account, then it's broken. In no case
>should it ever be necessary to buffer significant data *in the router*.
>In fact, excessive buffering in routers simply increases the overall
>delay-bandwidth product (by increasing latency) and thus requires *more*
>buffering at the endpoints.
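The effect of router buffering on the delay-bandwidth product can be quantified with a short sketch. It uses the 96 KB download queue mentioned above; the 2 Mbit/s link rate and 600 ms base RTT are hypothetical.

```python
# Sketch: extra latency added by a full queue in the router, and how that
# inflates the bandwidth-delay product the endpoints must buffer for.
# The 96 KB queue comes from the thread; link rate and base RTT are hypothetical.


def queueing_delay_s(queued_bytes: int, link_bps: float) -> float:
    """Time to drain a full queue at the link rate."""
    return queued_bytes * 8 / link_bps


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    return link_bps * rtt_s / 8


if __name__ == "__main__":
    base_rtt = 0.6
    down_bps = 2_000_000
    extra = queueing_delay_s(96 * 1024, down_bps)
    print(f"{extra * 1000:.0f} ms extra")  # 393 ms
    print(round(bdp_bytes(down_bps, base_rtt)),
          round(bdp_bytes(down_bps, base_rtt + extra)))  # 150000 248304
```

A full queue drained at line rate adds exactly its own size to the bandwidth-delay product, so every byte buffered in the router must also be buffered again at the endpoints.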
>Theoretically the same argument would apply to the socket buffers, but the
>problem is that the receiver can't offer window unless it can commit to
>receiving that amount of data regardless of application behavior. The
>send-side buffer is needed because it can't be certain that the data has
>been delivered until it gets the end-to-end acknowledgment.
My argument might not be relevant to m0n0wall's traffic shaping; I haven't
studied it enough to tell. On our gateway, we use DUMMYNET + IPFW2 + NATD
(static firewall) with a principle of pipe sharing, whatever the bandwidth
is at a given moment, without setting any absolute value for each pipe or
queue, since I observed that setting an absolute value increased the
latency by approximately 200 to 250 ms. Each client obtains a fair share
of the whole pipes (upload and download), depending on how many clients are
using the link simultaneously, with a weight depending on port numbers. So
sometimes one client can use the full link at its best capacity,
and a whole TCP window can get stuck in any queue before TCP stops
transmitting more. That's why we need to size each queue to a whole TCP
window in order to avoid packet drops. I made some experimental
observations by setting various values and watching packet drops, and I
found that some drops did occur when the queue size was under 75% of the
TCP window, and disappeared above this value. I suppose the difference
between 75% and 100% is because [MAX ... * MAX ...] is overestimated.
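If the shaper's queue length is expressed in packets rather than bytes, the window-sized budget described above has to be converted. The sketch below assumes every queued packet is a full 1500-byte frame; pure ACKs and other small packets also take slots, so real needs may be higher. It is a back-of-the-envelope conversion, not dummynet's own accounting.

```python
# Sketch: turning a byte budget into a queue length for a shaper that counts
# packets ("slots") rather than bytes. Assumes full-size 1500-byte packets;
# small packets such as pure ACKs occupy slots too, so this is optimistic.
import math


def queue_slots(window_bytes: int, fraction: float = 1.0, packet_bytes: int = 1500) -> int:
    return math.ceil(window_bytes * fraction / packet_bytes)


if __name__ == "__main__":
    window = 96 * 1024  # the 96 KB receive window mentioned earlier
    print(queue_slots(window))        # 66 slots for the whole window
    print(queue_slots(window, 0.75))  # 50 slots at the observed 75% threshold
```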
> > Another kind of problem that can arise is MTU/MSS miscalculation, since
> > the TCP header is 4 bytes longer than usual when using the RFC 1323
> > extensions. They fill the 6th (optional) 32-bit word of the TCP header,
> > thus producing headers of 44 bytes (20 IP + 24 TCP) instead of the usual
> > 40 (20 IP + 20 TCP). This can create problems when using VPNs or any kind
> > of encapsulation.
>I don't know where you get that particular number. There's an option for
>window scaling, but it appears only in the initial SYN segments. Ditto
>for the option *enabling* SACK. The timestamps option adds *12* bytes
>(including padding) to every segment. SACKs add a variable amount, but
>for most application protocols (simplex or half-duplex) tend to appear in
>otherwise empty packets.
You're probably right. I did observe 44-byte headers, but I don't remember
checking whether all headers had this size.
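The option sizes can be tallied with a short sketch. It assumes IPv4, pads the combined options to a multiple of 4 bytes (a simplification: real stacks insert NOPs per option, though common layouts end up the same size), and uses the 40-byte limit on TCP option space. One possible explanation for the 44-byte headers, offered only as a guess, is a segment carrying exactly 4 bytes of options, such as a lone MSS option or a NOP-padded window-scale option, both of which appear only on SYN segments.

```python
# Sketch: IPv4 + TCP header size for a given set of TCP options, and how many
# SACK blocks fit. Options are padded to a multiple of 4 bytes and the TCP
# option space is limited to 40 bytes. Padding the total is a simplification.

IP_HEADER = 20
TCP_BASE = 20
MAX_OPTION_SPACE = 40

OPTION_BYTES = {
    "mss": 4,             # SYN only
    "window_scale": 3,    # SYN only
    "sack_permitted": 2,  # SYN only
    "timestamps": 10,     # may appear on every segment once negotiated
}


def padded(n: int) -> int:
    return (n + 3) // 4 * 4


def header_bytes(*options: str) -> int:
    return IP_HEADER + TCP_BASE + padded(sum(OPTION_BYTES[o] for o in options))


def max_sack_blocks(*other_options: str) -> int:
    """SACK option is 2 bytes plus 8 bytes per block (RFC 2018)."""
    free = MAX_OPTION_SPACE - padded(sum(OPTION_BYTES[o] for o in other_options))
    return max(0, (free - 2) // 8)


if __name__ == "__main__":
    print(header_bytes())              # 40: no options
    print(header_bytes("timestamps"))  # 52: 10 bytes padded to 12
    print(header_bytes("mss"))         # 44
    print(max_sack_blocks(), max_sack_blocks("timestamps"))  # 4 3
```

So with timestamps on, data segments carry 52 bytes of IP+TCP header rather than 44, and at most 3 SACK blocks fit alongside them.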
> Fred Wright
Thanks for the clarifications :-)
Le bourg - 81140 - Vaour (France)
mailto:dl at vaour dot net (MIME, ISO latin 1)