 From:  Fred Wright <fw at well dot com>
 To:  m0n0wall at lists dot m0n0 dot ch
 Subject:  Re: [m0n0wall] accessing netbsd.org from behind m0n0wall
 Date:  Thu, 9 Sep 2004 22:23:46 -0700 (PDT)
On Wed, 8 Sep 2004, Manuel Kasper wrote:
> On 08.09.2004 01:54 +0200, Frederick Page wrote:
> > Same here: also have a Soekris net4801, m0n0wall 1.1,
> > DSL-connection. Browser (tried IE and Firebird on XP, Firebird on
> > OpenBSD and lynx on Linux) just hangs and keeps waiting forever.
> I can reproduce this with my m0n0wall at home (FreeBSD
> client/PPPoE/ADSL) too. The problem doesn't seem to be that MSS
> clamping is not working, but rather that NetBSD sends packets larger
> than [MSS + 40 bytes], which are then fragmented and the fragments
> blocked by ipfilter for some reason.

This doesn't appear to be a NetBSD problem at all.  See below.

> Turning off timestamps in the FreeBSD client (sysctl
> net.inet.tcp.rfc1323=0) makes it work.

Though not a very flavorful workaround. :-)

> This is probably related:
> <http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=20461>

It *could* be related, although the case there is different and the
explanation is poor.  Looking at that case:

1) The client (platform and MTU unknown) sends an MSS of 1445, which
(with the set of options present) would be correct for an MTU of 1497.  If
the "less than 1488" is really correct, then that would indicate a bug in
the *client* (non-NetBSD) TCP.

2) The server (NetBSD with unknown MTU) is sending an MSS of 1460, which,
given the timestamps, would correspond to an MTU of 1512.  That's most
likely incorrect, but it would have no practical effect unless the
sender's MTU was larger than 1500.

3) The only way #2 could have a bearing on the problem here is if the same
apparently wrong calculation were used for the send MSS, which is
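
The figures in items 1 and 2 come from adding the per-packet overhead back onto the advertised MSS. Here is a minimal sketch of that arithmetic. It assumes IPv4 without IP options and the timestamp option in its usual padded form (NOP, NOP, 10-byte option, so 12 bytes). The function names are made up for illustration.

```python
IP_TCP_OVERHEAD = 40     # 20-byte IPv4 header + 20-byte TCP header (assumed, no IP options)
TIMESTAMP_OVERHEAD = 12  # RFC 1323 timestamp option (10 bytes) padded with 2 NOPs


def overhead(timestamps: bool = True) -> int:
    """Bytes of header carried by every data packet."""
    return IP_TCP_OVERHEAD + (TIMESTAMP_OVERHEAD if timestamps else 0)


def implied_mtu(mss: int, timestamps: bool = True) -> int:
    """MTU an advertised MSS corresponds to if all overhead was subtracted."""
    return mss + overhead(timestamps)


def options_aware_mss(mtu: int, timestamps: bool = True) -> int:
    """Inverse of implied_mtu(): the MSS to advertise for a given MTU."""
    return mtu - overhead(timestamps)


if __name__ == "__main__":
    print(implied_mtu(1445))          # client in the PR: 1497
    print(implied_mtu(1460))          # NetBSD server in the PR: 1512
    print(options_aware_mss(1500))    # what a 1500-byte MTU should yield: 1448
```

Without timestamps, the same 1460 maps to exactly 1500 (`implied_mtu(1460, timestamps=False)`), the Ethernet MTU.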

The way MSS is supposed to work is that each TCP determines both a send
MTU and a receive MTU (which may be different), and then from that
determines a send MSS and a receive MSS by subtracting the expected TCP/IP
overhead (including any options that will be present in data packets) from
the respective MTUs.  The local receive MSS is sent to the peer during
connection setup.  The lesser of the peer's receive MSS and the local send
MSS is used as the actual MSS for sending.  The main difference between
send and receive MTUs is that the latter is not supposed to take Path MTU
into account, due to the possibility of asymmetric MTUs (PMTU is only
determinable in the outgoing direction).
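
The bookkeeping described above can be modeled in a few lines. This is a minimal sketch, not any particular stack's code. The class and method names are hypothetical. It assumes IPv4 with no IP options and a 12-byte (padded) timestamp option on every data packet.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed fixed overhead: 20-byte IPv4 header + 20-byte TCP header.
BASE_OVERHEAD = 40


@dataclass
class TcpEndpoint:
    """Hypothetical model of one TCP's MSS bookkeeping."""
    interface_mtu: int               # MTU of the local link
    path_mtu: Optional[int] = None   # from PMTU Discovery (outgoing only)
    option_bytes: int = 12           # options on every data packet (padded timestamps)

    def receive_mss(self) -> int:
        # Advertised to the peer in the SYN.  Deliberately ignores PMTU,
        # since PMTU only describes the outgoing direction.
        return self.interface_mtu - BASE_OVERHEAD - self.option_bytes

    def local_send_mss(self) -> int:
        # Outgoing limit: the local MTU, further reduced by PMTU if known.
        mtu = self.interface_mtu
        if self.path_mtu is not None:
            mtu = min(mtu, self.path_mtu)
        return mtu - BASE_OVERHEAD - self.option_bytes

    def effective_send_mss(self, peer_receive_mss: int) -> int:
        # The lesser of the peer's receive MSS and the local send MSS.
        return min(peer_receive_mss, self.local_send_mss())


if __name__ == "__main__":
    client = TcpEndpoint(interface_mtu=1492)  # e.g. a PPPoE link
    server = TcpEndpoint(interface_mtu=1500)  # Ethernet, PMTU not yet known
    mss = server.effective_send_mss(client.receive_mss())
    print(mss, mss + BASE_OVERHEAD + server.option_bytes)  # 1440 1492
```

When each side does its own subtraction correctly, the server's full-sized packets come out at exactly the client's 1492-byte MTU. No middlebox has to intervene.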

Any "MSS clamping" sticking its nose into the middle of this had better be
prepared to understand the implications of options like timestamps.  And
had better be prepared to track any future additions to TCP that affect
packet overhead.  One of many reasons why MSS clamping is a poor
substitute for properly working PMTU Discovery.
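
To make the point about options concrete, here is a hypothetical clamping routine. It is not m0n0wall's or any real firewall's code. It assumes IPv4 with no IP options and follows the convention above that the advertised MSS should already exclude options carried on data packets. A clamp that ignored the timestamp option would write 1452 for a 1492-byte PPPoE link. A sender that then adds 12 bytes of timestamps on top of the MSS, as NetBSD apparently did in Manuel's test, would emit 1504-byte packets.

```python
import struct

TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MSS, TCPOPT_TIMESTAMP = 0, 1, 2, 8
IP_TCP_OVERHEAD = 40     # IPv4 + TCP basic headers, no IP options assumed
TIMESTAMP_OVERHEAD = 12  # 10-byte timestamp option + 2 NOPs of padding


def clamp_mss(options: bytes, link_mtu: int) -> bytes:
    """Hypothetical MSS clamp for the option bytes of a SYN.

    Lowers the MSS option (if present) so that a data packet, including
    a timestamp option when the SYN offers one, fits in link_mtu.  A real
    middlebox would also have to fix up the TCP checksum; that is omitted.
    """
    out = bytearray(options)
    mss_offset = None
    has_timestamps = False
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated TCP option")
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            raise ValueError("malformed TCP option")
        if kind == TCPOPT_MSS and length == 4:
            mss_offset = i + 2
        elif kind == TCPOPT_TIMESTAMP and length == 10:
            # Only a hint: timestamps are used only if both SYNs carry them,
            # so counting them here errs on the conservative side.
            has_timestamps = True
        i += length
    if mss_offset is None:
        return bytes(out)  # no MSS option to clamp
    limit = link_mtu - IP_TCP_OVERHEAD
    if has_timestamps:
        limit -= TIMESTAMP_OVERHEAD
    (current,) = struct.unpack_from("!H", out, mss_offset)
    if current > limit:
        struct.pack_into("!H", out, mss_offset, limit)
    return bytes(out)


if __name__ == "__main__":
    # MSS 1460, NOP, NOP, timestamps (TSval=12345, TSecr=0)
    syn = struct.pack("!BBH", TCPOPT_MSS, 4, 1460) + bytes([1, 1, 8, 10]) \
        + struct.pack("!II", 12345, 0)
    clamped = clamp_mss(syn, link_mtu=1492)
    print(struct.unpack_from("!H", clamped, 2)[0])  # 1440, not 1452
```

Even this sketch only knows about the options that exist today. Any future option that adds per-packet overhead would need another case here.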

While Wayne's case has no packet trace, there are a number of anomalies
that can be surmised:

1) The fact that packets were being fragmented at all indicates a lack of
PMTU Discovery on NetBSD's part.  This could be due to lack of the feature
(unlikely), the feature's being disabled (possible), or "black-hole
recovery" working around broken PMTUd.  It's worth noting that Iljitsch's
case had no DFs on the traffic from the server, either, so maybe
NetBSD.org has a brain-dead firewall blocking ICMP errors.  Or a
brain-dead firewall stripping off DF bits.

2) The fact that the packets came out too large indicates that no MSS
clamping took place on m0n0wall, at least not with the correct MSS value.

3) When the oversized packet gets fragmented (presumably by the ISP's PPP
server), it gets fragmented in a *very* strange way:
On 7 Sep 2004, Wayne Marshall wrote:
> Return packets blocked from www.netbsd.org (
> Sep  7 08:52:36 bl0ck ipmon[68]: \
>   08:52:35.742487 ng0 @0:16 b -> \
>   PR tcp len 20 (1476) frag 1456@24 IN
Note the "1456@24", which means a 1456-byte partial IP payload beginning
at offset 24.  Apparently the fragmenter made the *second* fragment the
full-sized one, and the *first* one minimal.  Usually, a simple-minded
fragmenter makes all but the *last* fragment as large as possible, and a
smarter one tries to equalize the fragment sizes (to minimize possible
further fragmentation).  Although this other method is legal, it makes no
sense.  The ISP isn't by any chance the German T-DSL, is it? :-)

4) The fragment is getting blocked by IPFilter in spite of the "allow
fragments".  I have a suspicion that anomaly #3 is contributing to this,
since the TCP header is 32 bytes in this case, and only 24 are
(apparently) being included in the first fragment.  While this is
perfectly legal (in fact, in a really pathological case a TCP header could
be spread over as many as 8 fragments), I wouldn't be at all surprised if
IPFilter just threw up its hands in disgust over it.  Note that the
complete *basic* TCP header (the first 20 bytes) is included in the
initial fragment, and that contains everything IPFilter actually pays
attention to, but it wouldn't be the first time it got overzealous about
blocking something.

5) If the problem is IPFilter's dislike of the fragment organization, I
would have expected the initial fragment to be rejected as well.  But
perhaps there's another rule that's letting it through even if the
stateful filter doesn't like it.
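
The fragment arithmetic in items 3 and 4 can be checked mechanically. The sketch below is hypothetical. The ipmon "LEN@OFFSET" format is inferred only from the log line quoted above. The 1492-byte PPPoE MTU for the ISP's side is an assumption the log does not show. It also assumes exactly two fragments, as item 3 describes, and no IP options, which puts the original datagram at 20 + 24 + 1456 = 1500 bytes.

```python
import math
import re
from typing import List, Tuple

IP_HEADER = 20  # assumes no IP options
FRAG_UNIT = 8   # fragment offsets are carried in 8-byte units


def parse_ipmon_frag(field: str) -> Tuple[int, int]:
    """Parse ipmon's 'frag LEN@OFFSET' notation into (length, offset)."""
    m = re.fullmatch(r"frag (\d+)@(\d+)", field)
    if m is None:
        raise ValueError(f"not an ipmon fragment field: {field!r}")
    return int(m.group(1)), int(m.group(2))


def simple_fragments(payload_len: int, mtu: int) -> List[Tuple[int, int]]:
    """Fragment an IP payload the 'simple-minded' way: every fragment but
    the last is as large as the MTU allows.  Returns (length, offset) pairs."""
    max_chunk = (mtu - IP_HEADER) // FRAG_UNIT * FRAG_UNIT
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(max_chunk, payload_len - offset)
        frags.append((length, offset))
        offset += length
    return frags


def holds_tcp_header(first_fragment_len: int, tcp_header_len: int) -> bool:
    """True if the first fragment carries the complete TCP header."""
    return first_fragment_len >= tcp_header_len


# Anomaly #4's worst case: a 60-byte TCP header cut into 8-byte pieces.
MAX_TCP_HEADER = 60
WORST_CASE_HEADER_FRAGMENTS = math.ceil(MAX_TCP_HEADER / FRAG_UNIT)  # 8

if __name__ == "__main__":
    length, offset = parse_ipmon_frag("frag 1456@24")
    payload = offset + length           # 1480 bytes, so a 1500-byte datagram
    tcp_header = 20 + 12                # basic header + padded timestamps
    # With two fragments starting at 0, the first one's length is `offset`.
    print(holds_tcp_header(offset, tcp_header))       # observed split: False
    split = simple_fragments(payload, mtu=1492)       # assumed PPPoE MTU
    print(split, holds_tcp_header(split[0][0], tcp_header))
```

A simple-minded fragmenter would have produced 1472@0 and 8@1472. That first fragment holds the entire 32-byte TCP header, which is why the observed 24@0 split looks so odd.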

Note that *none* of this has anything to do with packet-size calculations
performed by NetBSD, even though there's other evidence that those may
*also* be wrong.

On 8 Sep 2004, Wayne Marshall wrote:
> The remaining puzzle (only to me, I suppose) is why a similar
> firewall with user-ppp, pppoe, and OpenBSD+PF does not block on
> the netbsd server.  The FAQ for ipfilter, section X, number 17
> notes:
>   ...ipfilter doesn't support RFC1323 window size extensions.
> Is ipfilter somehow missing a capability that plays a role here?

Actually, that statement is out of date, but in any case it refers to a
different aspect of RFC1323 (window scaling) that's not relevant here.

The "long fat pipe" extensions as originally proposed included three
largely independent enhancements:  window scaling, timestamps, and
selective acknowledgments.  But the original RFC1072 SACK scheme was
sufficiently broken that it was officially withdrawn, and RFC1323 "went to
press" without it.  Later, a reworked version appeared as RFC2018.

					Fred Wright