On Wed, 8 Sep 2004, Manuel Kasper wrote:

> On 08.09.2004 01:54 +0200, Frederick Page wrote:
>
> > Same here: also have a Soekris net4801, m0n0wall 1.1,
> > DSL-connection. Browser (tried IE and Firebird on XP, Firebird on
> > OpenBSD and lynx on Linux) just hangs and keeps waiting forever.
>
> I can reproduce this with my m0n0wall at home (FreeBSD
> client/PPPoE/ADSL) too. The problem doesn't seem to be that MSS
> clamping is not working, but rather that NetBSD sends packets larger
> than [MSS + 40 bytes], which are then fragmented and the fragments
> blocked by ipfilter for some reason.

This doesn't appear to be a NetBSD problem at all. See below.

> Turning off timestamps in the FreeBSD client (sysctl
> net.inet.tcp.rfc1323=0) makes it work.

Though not a very flavorful workaround. :-)

> This is probably related:
> <http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=20461>

It *could* be related, although the case there is different and the
explanation is poor. Looking at that case:

1) The client (platform and MTU unknown) sends an MSS of 1445, which
(with the set of options present) would be correct for an MTU of 1497.
If the "less than 1488" is really correct, then that would indicate a
bug in the *client* (non-NetBSD) TCP.

2) The server (NetBSD with unknown MTU) is sending an MSS of 1460,
which, given the timestamps, would correspond to an MTU of 1512. That's
most likely incorrect, but it would have no practical effect unless the
sender's MTU was larger than 1500.

3) The only way #2 could have a bearing on the problem here is if the
same apparently wrong calculation were used for the send MSS, which is
possible.

The way MSS is supposed to work is that each TCP determines both a send
MTU and a receive MTU (which may be different), and then from that
determines a send MSS and a receive MSS by subtracting the expected
TCP/IP overhead (including any options that will be present in data
packets) from the respective MTUs.
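As a rough check of the arithmetic in points 1) and 2), here is a minimal Python sketch. It assumes a 20-byte IPv4 header without options, a 20-byte basic TCP header, and 12 bytes for the timestamp option (10 bytes plus 2 of padding), and it follows the accounting used in this message, where options present in data packets are subtracted along with the headers. The constant and function names are made up for illustration, and the 1492 figure is the usual PPPoE MTU, assumed here rather than taken from the thread.

```python
IP_HEADER = 20         # IPv4 header without IP options (assumption)
TCP_HEADER = 20        # basic TCP header
TIMESTAMP_OPTION = 12  # RFC 1323 timestamps: 10 bytes plus 2 bytes padding


def mss_for_mtu(mtu, option_bytes=0):
    """MSS left over after subtracting header and option overhead."""
    return mtu - IP_HEADER - TCP_HEADER - option_bytes


def mtu_for_mss(mss, option_bytes=0):
    """MTU that a given MSS would imply under the same accounting."""
    return mss + IP_HEADER + TCP_HEADER + option_bytes


# Point 1): an MSS of 1445 with timestamps implies an MTU of 1497.
print(mtu_for_mss(1445, TIMESTAMP_OPTION))  # 1497

# Point 2): an MSS of 1460 with timestamps implies an MTU of 1512.
print(mtu_for_mss(1460, TIMESTAMP_OPTION))  # 1512

# What the same accounting would give for an Ethernet MTU of 1500 and
# for a PPPoE MTU of 1492 (both MTU values are assumptions here).
print(mss_for_mtu(1500, TIMESTAMP_OPTION))  # 1448
print(mss_for_mtu(1492, TIMESTAMP_OPTION))  # 1440
```

The 1512 result is what makes the server's advertised MSS look wrong: it only fits an MTU larger than Ethernet's 1500.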
The local receive MSS is sent to the peer during connection setup. The
lesser of the peer's receive MSS and the local send MSS is used as the
actual MSS for sending. The main difference between send and receive
MTUs is that the latter is not supposed to take Path MTU into account,
due to the possibility of asymmetric MTUs (PMTU is only determinable in
the outgoing direction).

Any "MSS clamping" sticking its nose into the middle of this had better
be prepared to understand the implications of options like timestamps.
And had better be prepared to track any future additions to TCP that
affect packet overhead. One of many reasons why MSS clamping is a poor
substitute for properly working PMTU Discovery.

While Wayne's case has no packet trace, there are a number of anomalies
that can be surmised:

1) The fact that packets were being fragmented at all indicates a lack
of PMTU Discovery on NetBSD's part. This could be due to lack of the
feature (unlikely), the feature's being disabled (possible), or
"black-hole recovery" working around broken PMTUd. It's worth noting
that Iljitsch's case had no DFs on the traffic from the server, either,
so maybe NetBSD.org has a brain-dead firewall blocking ICMP errors. Or a
brain-dead firewall stripping off DF bits.

2) The fact that the packets came out too large indicates that no MSS
clamping took place on m0n0wall, at least not with the correct MSS
value.

3) When the oversized packet gets fragmented (presumably by the ISP's
PPP server), it gets fragmented in a *very* strange way:

-------------------------------------------------------------------------
On 7 Sep 2004, Wayne Marshall wrote:

> Return packets blocked from www.netbsd.org (204.152.190.12):
>
> Sep 7 08:52:36 bl0ck ipmon[68]: \
> 08:52:35.742487 ng0 @0:16 b 204.152.190.12 -> 209.180.174.155 \
> PR tcp len 20 (1476) frag 1456@24 IN
-------------------------------------------------------------------------

Note the "1456@24", which means a 1456-byte partial IP payload beginning
at offset 24. Apparently the fragmenter made the *second* fragment the
full-sized one, and the *first* one minimal. Usually, a simple-minded
fragmenter makes all but the *last* fragment as large as possible, and a
smarter one tries to equalize the fragment sizes (to minimize possible
further fragmentation). Although this other method is legal, it makes no
sense. The ISP isn't by any chance the German T-DSL, is it? :-)

4) The fragment is getting blocked by IPFilter in spite of the "allow
fragments". I have a suspicion that anomaly #3 is contributing to this,
since the TCP header is 32 bytes in this case, and only 24 are
(apparently) being included in the first fragment. While this is
perfectly legal (in fact, in a really pathological case a TCP header
could be spread over as many as 8 fragments), I wouldn't be at all
surprised if IPFilter just threw up its hands in disgust over it. Note
that the complete *basic* TCP header (the first 20 bytes) is included in
the initial fragment, and that contains everything IPFilter actually
pays attention to, but it wouldn't be the first time it got overzealous
about blocking something.

5) If the problem is IPFilter's dislike of the fragment organization, I
would have expected the initial fragment to be rejected as well. But
perhaps there's another rule that's letting it through even if the
stateful filter doesn't like it.
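
The fragment numbers in anomalies 3) and 4) can be worked through with a short Python sketch. It reads the "1456@24" field from the ipmon line as length@offset, as described above, and assumes a 20-byte IP header (the "len 20" in the log), a 32-byte TCP header (20 basic bytes plus 12 for timestamps), and that the datagram was split into exactly two fragments. The helper name `parse_frag` and the variable names are made up for illustration; this is not how ipmon or IPFilter represent fragments internally.

```python
import re


def parse_frag(field):
    """Split an ipmon-style 'LENGTH@OFFSET' fragment field into two ints."""
    match = re.fullmatch(r"(\d+)@(\d+)", field)
    if match is None:
        raise ValueError(f"not a LENGTH@OFFSET field: {field!r}")
    return int(match.group(1)), int(match.group(2))


IP_HEADER = 20    # "len 20" in the log line
TCP_HEADER = 32   # 20-byte basic header + 12 bytes of timestamp option

length, offset = parse_frag("1456@24")

# Total size of the logged (second) fragment; the log shows "(1476)".
fragment_total = IP_HEADER + length

# The first fragment carries IP payload bytes 0..offset-1, so this is how
# much of the TCP header it holds and how much spills into the second one.
header_in_first = min(offset, TCP_HEADER)
header_spill = TCP_HEADER - header_in_first

# Size of the original datagram, assuming exactly two fragments.
original_total = IP_HEADER + offset + length

# Fragment offsets are in 8-byte units, so a maximum-size 60-byte TCP
# header could be spread over at most ceil(60 / 8) fragments.
max_header_fragments = -(-60 // 8)

print(fragment_total, header_in_first, header_spill,
      original_total, max_header_fragments)
```

Under those assumptions the first fragment still holds the full 20-byte basic TCP header, with 8 bytes of the timestamp option spilling into the second fragment, and the reassembled datagram comes to 1500 bytes.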

Note that *none* of this has anything to do with packet-size
calculations performed by NetBSD, even though there's other evidence
that those may *also* be wrong.

On 8 Sep 2004, Wayne Marshall wrote:

> The remaining puzzle (only to me, I suppose) is why a similar
> firewall with user-ppp, pppoe, and OpenBSD+PF does not block on
> the netbsd server. The FAQ for ipfilter, section X, number 17
> notes:
>
> ...ipfilter doesn't support RFC1323 window size extensions.
>
> Is ipfilter somehow missing a capability that plays a role here?

Actually, that statement is out of date, but in any case it refers to a
different aspect of RFC1323 (window scaling) that's not relevant here.

The "long fat pipe" extensions as originally proposed included three
largely independent enhancements: window scaling, timestamps, and
selective acknowledgments. But the original RFC1072 SACK scheme was
sufficiently broken that it was officially withdrawn, and RFC1323 "went
to press" without it. Later, a reworked version appeared as RFC2018.

Fred Wright