 From:  Fred Wright <fw at well dot com>
 To:  m0n0wall at lists dot m0n0 dot ch
 Subject:  Re: [m0n0wall] m0n0wall behind a satellite connection
 Date:  Tue, 8 Feb 2005 14:43:01 -0800 (PST)
On Tue, 8 Feb 2005, Didier Lebrun wrote:

> The high latency does create some specific problems with TCP, since the 
> [bandwidth * latency] product is too big for the standard TCP window size 
> and packet loss can become dramatic in this kind of context. But the IETF 
> has developed a set of TCP extensions (RFC 1323) in order to overcome 
> these limits and have TCP perform better on LFNs (Long Fat Networks). 
> Recent versions of FreeBSD support the RFC 1323 extensions and use them 
> automatically when necessary, but the main problem is between the clients 
> and the remote servers anyway, since the TCP transactions occur between 
> ends, the gateway just letting the packets through, with the exception of 
> DNS and NTP queries.

Correct.  Directly supporting RFC1323 on a router isn't terribly
important, although it *is* important that the router not screw it up,
e.g. by failing to handle window scaling correctly in stateful filtering.

> When RFC 1323 extensions are supported by the system, TCP adjusts itself by 
> calculating the TCP window size as soon as the first ACK comes back... but 

That's not exactly correct.  The default socket buffer size is set
independently from any knowledge of the peer's capability, but in the
absence of window scaling the usable window is clamped at 65535.  Even
then, the stack may still allocate the buffer based on the larger size,
which is useless at that point.
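
To put a number on that clamp, here is a minimal sketch of the
window-limited rate of a single connection.  The 600 ms RTT is an assumed
figure for a geostationary link, not a value from this thread, and the
function names are made up for illustration.

```python
# Sketch: throughput ceiling imposed by the TCP window on one connection.
# The RTT used in the example below is an assumption, not a measurement.

MAX_UNSCALED_WINDOW = 65535  # largest value the 16-bit window field can carry


def window_limited_rate(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bits per second: one window per RTT."""
    return window_bytes * 8 / rtt_seconds


def scaled_window(window_field: int, shift: int) -> int:
    """Effective window when window scaling is negotiated (shift 0..14)."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


if __name__ == "__main__":
    rtt = 0.6  # assumed satellite round-trip time in seconds
    rate = window_limited_rate(MAX_UNSCALED_WINDOW, rtt)
    print(f"unscaled ceiling at {rtt} s RTT: {rate / 1000:.0f} kbit/s")
```

Under that assumption the ceiling is a bit under 900 kbit/s per connection,
no matter how fast the link itself is.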

> before that, it uses the system default values, which are often badly 
> optimized for sat links. So, it's better to tweak them on each client in 

They're often not even adequate for broadband to distant points.  The
theoretical minimum RTT for halfway around the planet is about 133ms.
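
The 133ms figure can be reproduced with a short calculation.  This sketch
assumes a round 40,000 km for the Earth's circumference and propagation at
the vacuum speed of light; real paths are longer and slower.

```python
# Sketch: theoretical minimum RTT to the far side of the planet.
# Assumptions: 40,000 km circumference, signal travels at c in vacuum.

EARTH_CIRCUMFERENCE_KM = 40_000.0
SPEED_OF_LIGHT_KM_S = 299_792.458


def min_rtt_ms(one_way_km: float) -> float:
    """Round-trip time in milliseconds for a given one-way distance."""
    return 2 * one_way_km / SPEED_OF_LIGHT_KM_S * 1000


if __name__ == "__main__":
    halfway = EARTH_CIRCUMFERENCE_KM / 2
    print(f"minimum RTT halfway around: {min_rtt_ms(halfway):.1f} ms")
```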

> order to have it perform better. The main principles are:
>          - enable "Window Scaling" (usually enabled by default)
>          - enable "Time Stamping"

Only the above two relate to RFC1323.

>          - enable "Selective ACKs" (SACK)

Although the original "long fat pipe" package included SACK, the original
SACK proposal was broken and hence explicitly *not* included in RFC1323.
A reworked SACK mechanism was later published as RFC2018.  In general,
SACK support is less "mature", but fortunately it's the least important of
the three, especially if packet loss is low.

>          - enable "Path MTU Discovery"

This has absolutely nothing to do with RFC1323, although bulk-transfer
efficiency on *all* links is better when the MTU can be chosen correctly
rather than using an arbitrary "conservative" value.  But then you have to
worry about broken firewalls not forwarding the needed ICMP packets.

>          - set the TCP receive window size to a higher value [max bandwidth 
> in bytes * max latency in sec]

It's actually worse than that.  Although that's sufficient to avoid
window-limited rates in the absence of packet loss, whenever a packet is
dropped it takes *two* RTTs (plus the time to trigger fast retransmit) to
get it through, and hence any window size smaller than twice the
delay-bandwidth product (and then some) will diminish the effectiveness of
fast retransmit.

Note that this amount of buffer space is needed *per connection* even
though the only real requirement is for that amount of *total* outstanding
data.  This can eat up RAM pretty quickly.
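
A rough sketch of the sizing arithmetic follows.  The link rate, RTT and
connection count are illustrative assumptions, and the factor of two is
only the lower bound argued above (the "and then some" is not modeled).

```python
# Sketch: receive buffer sizing from the delay-bandwidth product.
# All example numbers are assumptions chosen for illustration.


def bdp_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Delay-bandwidth product in bytes."""
    return link_bps * rtt_seconds / 8


def buffer_for_fast_retransmit(link_bps: float, rtt_seconds: float) -> float:
    """Lower bound on the window needed to ride out one dropped packet."""
    return 2 * bdp_bytes(link_bps, rtt_seconds)


def total_buffer_bytes(per_connection: float, connections: int) -> float:
    """Buffer space committed when every connection gets the full size."""
    return per_connection * connections


if __name__ == "__main__":
    link_bps = 1_000_000  # assumed 1 Mbit/s downstream
    rtt = 0.6             # assumed satellite RTT in seconds
    per_conn = buffer_for_fast_retransmit(link_bps, rtt)
    print(f"per connection: {per_conn:.0f} bytes")
    print(f"20 connections: {total_buffer_bytes(per_conn, 20):.0f} bytes")
```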

Also note that similar issues apply to the *send* buffer size, although
upstream speeds are usually slower.

>          - disable "Black Hole Detection" (MS Windows only)

This has nothing to do with RFC1323.  It's actually a workaround for PMTUd
failures due to blocked ICMP.  The only reason I could see for its having
anything to do with this is if the timeout for calling the path
"broken" is too short for a satlink.

>          - set "Max duplicate ACKs" = 2 (3 on Win98 only)

Again that has nothing to do with RFC1323, but instead represents the fast
retransmit threshold.  RFC2581 recommends 3.  Lower values reduce the
amount of send buffer needed to avoid window stalls after dropped packets,
but increase the risk of unnecessary retransmissions.  Also note that this
only affects *send* performance.
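
To show what the threshold controls, here is a simplified sketch of the
sender-side duplicate ACK counting.  It ignores the other conditions a
real stack checks before treating an ACK as a duplicate, and the function
name and ACK numbers are made up for the example.

```python
# Sketch: where fast retransmit would fire for a given dup-ACK threshold.
# Simplified: any ACK repeating the previous ACK number counts as a duplicate.


def fast_retransmit_points(acks, threshold=3):
    """Return the indices in `acks` at which a retransmit is triggered."""
    points = []
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                points.append(i)
        else:
            # New cumulative ACK: remember it and reset the duplicate count.
            last_ack = ack
            dup_count = 0
    return points


if __name__ == "__main__":
    acks = [1000, 2000, 2000, 2000, 2000, 6000]
    print(fast_retransmit_points(acks, threshold=3))
    print(fast_retransmit_points(acks, threshold=2))
```

With a threshold of 2 the retransmit fires one ACK earlier, which is also
why reordered packets are more likely to cause a spurious one.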

> On FreeBSD, you can adjust a few things too:

But I wouldn't recommend doing this to m0n0wall, since it's rarely a TCP
endpoint and often can't afford the RAM.

> If you are using FreeBSD's traffic shaping capabilities, you must adjust 
> the size of the queues too, in order to avoid packet drops when the queue 
> is full. You can set each download queue to the TCP receive window size, 
> and each upload queue to the TCP sendspace. The same for the main pipes 
> (96Kbytes and 24Kbytes in our case).

But the queues aren't what's filling up.  The extra data that one has to
accommodate is literally "up in the air" (or at least the vacuum).  If the
traffic shaper is just dealing with packets, it shouldn't care.  If it's
trying to be clever enough to watch TCP SEQ and ACK numbers but not clever
enough to take large RTTs into account, then it's broken.  In no case
should it ever be necessary to buffer significant data *in the router*.
In fact, excessive buffering in routers simply increases the overall
delay-bandwidth product (by increasing latency) and thus requires *more*
buffering at the endpoints.
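
A small sketch of that effect: a full router queue adds its drain time to
the RTT, which raises the delay-bandwidth product the endpoints must
cover.  The 96 Kbyte queue is the figure quoted above; the 1 Mbit/s rate
and 600 ms base RTT are assumptions for illustration.

```python
# Sketch: how a full router queue inflates the delay-bandwidth product.
# Link rate and base RTT are assumed values, not taken from the thread.


def queue_delay_seconds(queue_bytes: int, link_bps: float) -> float:
    """Time needed to drain a full queue onto the link."""
    return queue_bytes * 8 / link_bps


def bdp_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Delay-bandwidth product in bytes."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    link_bps = 1_000_000     # assumed link rate
    base_rtt = 0.6           # assumed RTT with empty queues
    queue_bytes = 96 * 1024  # queue size from the quoted configuration

    extra = queue_delay_seconds(queue_bytes, link_bps)
    print(f"added delay with a full queue: {extra * 1000:.0f} ms")
    print(f"BDP, empty queue: {bdp_bytes(link_bps, base_rtt):.0f} bytes")
    print(f"BDP, full queue:  {bdp_bytes(link_bps, base_rtt + extra):.0f} bytes")
```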

Theoretically the same argument would apply to the socket buffers, but the
problem is that the receiver can't offer window unless it can commit to
receiving that amount of data regardless of application behavior.  The
send-side buffer is needed because it can't be certain that the data has
been delivered until it gets the end-to-end acknowledgment.

> Another kind of problem that can arise, is MTU/MSS miscalculations, since 
> the TCP header is 4 bytes longer than usual when using the RFC 1323 
> extensions. They fill the 6th optional TCP header line, thus producing 
> headers of 44 bytes (20 IP + 24 TCP) instead of 40 usually (20 IP + 20 
> TCP). It can create problems when using VPNs or any kind of encapsulation.

I don't know where you get that particular number.  There's an option for
window scaling, but it appears only in the initial SYN segments.  Ditto
for the option *enabling* SACK.  The timestamps option adds *12* bytes
(including padding) to every segment.  SACKs add a variable amount, but
for most application protocols (simplex or half-duplex) they tend to
appear in otherwise empty packets.
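
The option sizes can be tallied with a short sketch.  The lengths are the
standard on-the-wire sizes of each option; the helper name is made up, and
padding is modeled simply as rounding up to a multiple of 4 bytes, since
the TCP header length is counted in 32-bit words.

```python
# Sketch: bytes of TCP options per segment, including padding.
# Lengths are the standard encoded sizes of each option.

OPTION_LENGTHS = {
    "mss": 4,             # SYN segments only
    "window_scale": 3,    # SYN segments only
    "sack_permitted": 2,  # SYN segments only
    "timestamps": 10,     # every segment once negotiated
}

MAX_OPTION_BYTES = 40  # 60-byte maximum TCP header minus the fixed 20 bytes


def padded_options_length(option_names, sack_blocks=0):
    """Total option bytes, padded up to a multiple of 4."""
    raw = sum(OPTION_LENGTHS[name] for name in option_names)
    if sack_blocks:
        raw += 2 + 8 * sack_blocks  # kind + length, then 8 bytes per block
    padded = (raw + 3) // 4 * 4
    if padded > MAX_OPTION_BYTES:
        raise ValueError("options do not fit in the TCP header")
    return padded


if __name__ == "__main__":
    print(padded_options_length(["timestamps"]))
    print(padded_options_length(
        ["mss", "window_scale", "sack_permitted", "timestamps"]))
    print(padded_options_length(["timestamps"], sack_blocks=3))
```

The first case is the steady-state overhead on data segments, and the
second is a SYN carrying all of the options at once.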

					Fred Wright