 From:  Kerem Erciyes <k underscore erciyes at zegnaermenegildo dot it>
 To:  m0n0wall at lists dot m0n0 dot ch
 Subject:  how ipsec fooled us all
 Date:  Thu, 10 Feb 2005 14:41:01 +0200
Hi all,

I posted a few days ago complaining about stability issues
with m0n0wall and my troubles with IPsec.

m0n0wall was constantly giving packet lost errors and eventually
crashing, which everyone, including me, suspected was a hardware-layer problem.

Well, I disabled IPsec on the m0n0walls, and now the lockups have stopped
happening, as have the "/kernel: vr0 packet lost" errors. IPsec wasn't
finishing Phase 2 anyway. Now it is all stable, though without IPsec; I will
find a way around that somehow.


Kerem Erciyes (k underscore erciyes at zegnaermenegildo dot it)
IT Manager
ISMACO Amsterdam BV (+90 216 394 00 00)
Ermenegildo Zegna Butik (+90 212 291 10 24)

This message is OpenPGP signed; its content and the
identity of the sender can be verified with the
sender's public PGP key. The public PGP key
can be obtained upon request.

Thursday, February 10, 2005, 2:26:58 PM, you wrote:

DL> At 14:43 08/02/2005 -0800, Fred Wright wrote:

>>On Tue, 8 Feb 2005, Didier Lebrun wrote:
>> > The high latency does create some specific problems with TCP, since the
>> > [bandwidth * latency] product is too big for the standard TCP window size
>> > and packet loss can become dramatic in this kind of context. But the IETF
>> > has developed a set of TCP extensions (RFC 1323) in order to overcome
>> > these limits and have TCP perform better on LFNs (Long Fat Networks).
>> > Recent versions of FreeBSD support the RFC 1323 extensions and use them
>> > automatically when necessary, but the main problem is between the clients
>> > and the remote servers anyway, since the TCP transactions occur between
>> > the endpoints, the gateway just letting the packets through, with the exception of
>> > DNS and NTP queries.
>>Correct.  Directly supporting RFC1323 on a router isn't terribly
>>important, although it *is* important that the router not screw it up,
>>e.g. by failing to handle window scaling correctly in stateful filtering.
>> > When RFC 1323 extensions are supported by the system, TCP adjusts itself by
>> > calculating the TCP window size as soon as the first ACK comes back... but
>>That's not exactly correct.  The default socket buffer size is set
>>independently from any knowledge of the peer's capability, but in the
>>absence of window scaling the usable window is clamped at 65535.  That may
>>not even manage to avoid allocating the buffer based on the uselessly
>>larger size.

DL> I'm not sure I understand what you mean here. Doesn't TCP adjust the socket
DL> buffer and the TCP RWIN once it has received some ACKs, allowing it to
DL> calculate the proper TCP RWIN?
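
To put a number on the 65535-byte clamp mentioned above: without window scaling, at most one unscaled window can be in flight per round trip, so throughput is capped at window / RTT whatever the link speed. A minimal sketch in Python, assuming a 600 ms round-trip time for a satellite link (that figure is an illustrative assumption, not a measurement from this thread):

```python
MAX_UNSCALED_WINDOW = 65535  # largest value the 16-bit TCP window field can advertise


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput in bits/s when one window is in flight per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


# rtt_ms=600 is an assumed, illustrative satellite round-trip time.
print(max_throughput_bps(MAX_UNSCALED_WINDOW, 600))  # 873800 bits/s, i.e. under 1 Mbit/s
```

Doubling the RTT halves the ceiling, which is why the default window becomes the bottleneck on high-latency links long before the link itself does.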

>> > before that, it uses the system default values, which are often badly
>> > optimized for sat links. So, it's better to tweak them on each client in
>>They're often not even adequate for broadband to distant points.  The
>>theoretical minimum RTT for halfway around the planet is about 133ms.
>> > order to have it perform better. The main principles are:
>> >          - enable "Window Scaling" (usually enabled by default)
>> >          - enable "Time Stamping"
>>Only the above two relate to RFC1323.

DL> That's true, but I didn't mean to say all the options were part of RFC 1323! I just
DL> meant to say that they can play some role in the case of sat links.
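
The "about 133ms" figure quoted above for halfway around the planet can be reproduced with simple arithmetic: half the Earth's circumference, there and back, at the speed of light. A rough sketch in Python, assuming an equatorial circumference of about 40,075 km and propagation at the vacuum speed of light along the surface (real links are slower, so this is only a lower bound):

```python
EARTH_CIRCUMFERENCE_KM = 40_075.0   # approximate equatorial circumference
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum


def min_rtt_ms(distance_km: float) -> float:
    """Round-trip time in ms for a signal travelling distance_km each way at c."""
    return 2 * distance_km / SPEED_OF_LIGHT_KM_S * 1000


halfway_km = EARTH_CIRCUMFERENCE_KM / 2
print(round(min_rtt_ms(halfway_km), 1))  # roughly 133.7 ms
```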

>> >          - enable "Selective ACKs" (SACK)
>>Although the original "long fat pipe" package included SACK, the original
>>SACK proposal was broken and hence explicitly *not* included in RFC1323.
>>A reworked SACK mechanism was later published as RFC2018.  In general,
>>SACK support is less "mature", but fortunately it's the least important of
>>the three, especially if packet loss is low.
>> >          - enable "Path MTU Discovery"
>>This has absolutely nothing to do with RFC1323, although bulk-transfer
>>efficiency on *all* links is better when the MTU can be chosen correctly
>>rather than using an arbitrary "conservative" value.  But then you have to
>>worry about broken firewalls not forwarding the needed ICMP packets.
>> >          - set the TCP receive window size to a higher value [max bandwidth
>> > in bytes/sec * max latency in sec]
>>It's actually worse than that.  Although that's sufficient to avoid
>>window-limited rates in the absence of packet loss, whenever a packet is
>>dropped it takes *two* RTTs (plus the time to trigger fast retransmit) to
>>get it through, and hence any window size smaller than twice the
>>delay-bandwidth product (and then some) will diminish the effectiveness of
>>fast retransmit.

DL> You may be right. I noticed some problems in case of packet loss,
DL> especially with WinXP clients, but couldn't figure them out. I'll have to
DL> dig into the fast retransmit documentation to fully understand this point.
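
The sizing rule under discussion can be written out as a small calculation: the [bandwidth * latency] product gives the window needed to keep the link full without loss, and twice that is the floor suggested above for fast retransmit to stay effective. A sketch in Python, where the 1 Mbit/s downlink and 600 ms RTT are purely illustrative assumptions and the function names are made up for this example:

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: data in flight during one round trip."""
    return bandwidth_bps * rtt_ms // 8000  # /8 for bits->bytes, /1000 for ms->s


def min_window_with_loss_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Twice the BDP: a lower bound, since recovering a dropped packet takes two RTTs."""
    return 2 * bdp_bytes(bandwidth_bps, rtt_ms)


# Assumed example link: 1 Mbit/s downlink, 600 ms round-trip time.
print(bdp_bytes(1_000_000, 600))                   # 75000
print(min_window_with_loss_bytes(1_000_000, 600))  # 150000
```

Both values exceed 65535 bytes, so on such a link neither window could be advertised without window scaling.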

>>Note that this amount of buffer space is needed *per connection* even
>>though the only real requirement is for that amount of *total* outstanding
>>data.  This can eat up RAM pretty quickly.
>>Also note that similar issues apply to the *send* buffer size, although
>>upstream speeds are usually slower.
>> >          - disable "Black Hole Detection" (MS Windows only)
>>This has nothing to do with RFC1323.  It's actually a workaround for PMTUd
>>failures due to blocked ICMP.  The only reason I could see for its having
>>anything to do with this is if the timeout for calling the path
>>"broken" is too short for a satlink.
>> >          - set "Max duplicate ACKs" = 2 (3 on Win98 only)
>>Again that has nothing to do with RFC1323, but instead represents the fast
>>retransmit threshold.  RFC2581 recommends 3.  Lower values reduce the
>>amount of send buffer needed to avoid window stalls after dropped packets,
>>but increase the risk of unnecessary retransmissions.  Also note that this
>>only affects *send* performance.

DL> You're right in principle, but I discussed this point with a sat
DL> technician, who recommended that I reduce the retransmit threshold, since 2
DL> RTTs is already quite long on a sat link, and the risk of having
DL> packets still arriving later than that is pretty low. I don't remember the
DL> reason he gave me for the Win98 exception.

>> > On FreeBSD, you can adjust a few things too:
>>But I wouldn't recommend doing this to m0n0wall, since it's rarely a TCP
>>endpoint and often can't afford the RAM.
>> > If you are using FreeBSD's traffic shaping capabilities, you must adjust the
>> > size of the queues too, in order to avoid packet drops when the queue is
>> > full. You can set each download queue to the TCP receive window size, and
>> > each upload queue to the TCP sendspace. The same goes for the main pipes
>> > (96Kbytes and 24Kbytes in our case).
>>But the queues aren't what's filling up.  The extra data that one has to
>>accommodate is literally "up in the air" (or at least the vacuum). If the
>>traffic shaper is just dealing with packets, it shouldn't care.  If it's
>>trying to be clever enough to watch TCP SEQ and ACK numbers but not clever
>>enough to take large RTTs into account, then it's broken.  In no case
>>should it ever be necessary to buffer significant data *in the router*.
>>In fact, excessive buffering in routers simply increases the overall
>>delay-bandwidth product (by increasing latency) and thus requires *more*
>>buffering at the endpoints.
>>Theoretically the same argument would apply to the socket buffers, but the
>>problem is that the receiver can't offer window unless it can commit to
>>receiving that amount of data regardless of application behavior. The
>>send-side buffer is needed because it can't be certain that the data has
>>been delivered until it gets the end-to-end acknowledgment.

DL> My argument might not be relevant to m0n0wall's traffic shaping; I haven't
DL> studied it enough to tell. On our gateway, we use DUMMYNET + IPFW2 + NATD
DL> (static firewall) with a principle of pipe sharing, whatever the bandwidth
DL> is at a given moment, without setting any absolute value for each pipe or
DL> queue, since I observed that setting an absolute value increased the
DL> latency by approximately 200 to 250 ms. Each client obtains a fair share
DL> of the whole pipes (upload and download), depending on how many clients are
DL> using the link simultaneously, with a weight depending on port numbers. So
DL> we can sometimes have one client using the full link at its best capacity,
DL> and a whole TCP window can get stuck in any queue before TCP stops
DL> transmitting more. That's why we need to set a whole TCP window size for
DL> each queue in order to avoid packet drops. I did some experimental
DL> observations by setting various values and looking at packet drops, and I
DL> found that some drops did occur when the queue size was under 75% of the
DL> TCP window, and disappeared above this value. I supposed the difference
DL> between 75% and 100% was because [MAX ... * MAX ...] is overestimated.

>> > Another kind of problem that can arise is MTU/MSS miscalculations, since
>> > the TCP header is 4 bytes longer than usual when using the RFC 1323
>> > extensions. They fill the 6th optional TCP header line, thus producing
>> > headers of 44 bytes (20 IP + 24 TCP) instead of 40 usually (20 IP + 20
>> > TCP). It can create problems when using VPNs or any kind of encapsulation.
>>I don't know where you get that particular number.  There's an option for
>>window scaling, but it appears only in the initial SYN segments.  Ditto
>>for the option *enabling* SACK.  The timestamps option adds *12* bytes
>>(including padding) to every segment.  SACKs add a variable amount, but
>>for most application protocols (simplex or half-duplex) tend to appear in
>>otherwise empty packets.

DL> You're probably right. I did observe 44-byte headers, but I don't remember
DL> checking whether all headers had this size.
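
The 12-byte figure for timestamps can be checked with a little arithmetic: the RFC 1323 timestamps option itself is 10 bytes, and TCP options are padded so that the header stays a multiple of 4 bytes. A small sketch in Python of the resulting header size and MSS, assuming IPv4 without IP options and a 1500-byte MTU (the MTU and the helper names are illustrative assumptions):

```python
IP_HEADER_BYTES = 20        # IPv4 header without options
TCP_BASE_HEADER_BYTES = 20  # TCP header without options
TIMESTAMPS_OPTION = 10      # kind (1) + length (1) + two 4-byte timestamp values


def tcp_header_bytes(option_sizes: list[int]) -> int:
    """TCP header size with options padded up to a multiple of 4 bytes."""
    raw = sum(option_sizes)
    padded = (raw + 3) // 4 * 4
    return TCP_BASE_HEADER_BYTES + padded


def mss_bytes(mtu: int, option_sizes: list[int]) -> int:
    """Payload bytes left in one packet after the IP and TCP headers."""
    return mtu - IP_HEADER_BYTES - tcp_header_bytes(option_sizes)


print(tcp_header_bytes([TIMESTAMPS_OPTION]))  # 32, i.e. 12 bytes more than the bare header
print(mss_bytes(1500, []))                    # 1460
print(mss_bytes(1500, [TIMESTAMPS_OPTION]))   # 1448
```

With timestamps on every segment, the combined IP and TCP headers come to 52 bytes, which is the number to allow for when sizing MTU/MSS across a VPN or other encapsulation.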

>>                                         Fred Wright

DL> Thanks for the clarifications :-)

DL> --
DL> Didier Lebrun
DL> Le bourg - 81140 - Vaour (France)
DL> tel: (AM and evenings)
DL> mailto:dl at vaour dot net (MIME, ISO latin 1)
DL> http://didier.quartier-rural.org/
