Luc Naus said:
> Hi all,
> I have installed Fred's workaround and it seems to be working fine.
> Any ideas what could cause the original problem?
I'm afraid not. I looked at tcpdumps made when the pptp connection was
"in trouble", but I couldn't find specific problems (other than: this
should not be happening). So I decided to change tactics and see whether
changing some configuration parameters would make any difference.
The reasons for picking these two options were:
1) These are options that are set for incoming (vpn) pptp connections.
Apparently they are necessary to fix packet loss problems when
making a pptp vpn connection from Windows XP. I simply thought: if it
works in that situation it might also work in this situation (although
I have to admit that the problems are different).
2) They looked like they could be related to problems I had seen.
> Fred: what exactly does your hotfix do, what changes in the traffic due
> to your changes?
This is what the manual says about the first of the two options I added:
This option causes mpd to adjust outgoing TCP data so that the requested
segment size is not greater than the amount allowed by the interface MTU.
This is necessary in many setups to avoid problems caused by routers that drop
ICMP Datagram Too Big messages. Without these messages, the originating
machine sends data, it passes the rogue router then hits a machine that has an
MTU that is not big enough for the data. Because the IP Don't Fragment option
is set, this machine sends an ICMP Datagram Too Big message back to the
originator and drops the packet. The rogue router drops the ICMP message and
the originator never gets to discover that it must reduce the fragment size or
drop the IP Don't Fragment option from its outgoing data.
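The clamping the manual describes can be sketched roughly like this (a toy illustration, not mpd's actual code; the function and constant names are mine):

```python
# Hypothetical sketch of MSS clamping as the manual describes it.
# mpd rewrites the MSS option in TCP SYN segments so that a full-sized
# segment plus headers never exceeds the tunnel interface MTU.

IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options

def clamp_mss(advertised_mss: int, link_mtu: int) -> int:
    """Return the MSS to write back into the SYN."""
    max_mss = link_mtu - IP_HEADER - TCP_HEADER
    return min(advertised_mss, max_mss)

# A host on 1500-byte Ethernet advertises MSS 1460; if the tunnel MTU
# is only 1400, the clamped value is 1360, so segments always fit and
# the connection never depends on ICMP "Datagram Too Big" getting back.
print(clamp_mss(1460, 1400))  # -> 1360
print(clamp_mss(1460, 1500))  # -> 1460
```

The point is that clamping sidesteps path-MTU discovery entirely, which is why it helps behind routers that eat the ICMP messages.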
I picked this one because in one case I saw a lot of fragmentation occurring
from modem -> m0n0wall. However, when I think it over, that doesn't make
any sense because this option is about outgoing (m0n0wall->modem) data.
Fred Wright said this about it:
> I doubt that the first one would have anything to do with *this*
> symptom, since if the MSS is a problem then you'd mainly have connections
> that don't work at all.
So my guess is that this option is not the "cure". It probably only doesn't
hurt.
The manual says this about the second option:
Enable delayed ACK's. This can improve throughput on reliable links.
Default is on.
I'm afraid I picked this one for the wrong reasons as well :-(
What I saw in some tcpdumps was TCP outgoing acks being lost (or not being
sent anyway). So I thought: ack problem, option related to acks -> try it.
However, this option affects GRE level acks, so it's on a different level.
When this option is enabled (the default), GRE level acks will be piggybacked
to data packets going to the modem. The GRE module will wait a while for
data packets to become available, and will send the acks on their own if
none arrive before the ack timeout. I also had a peek at the source
code and noticed that when data _is_ available, but the send window is full,
the data packet is discarded but no ack is sent either (it will be sent
eventually when the ack timeout occurs).
When this option is disabled (what I did), GRE level acks are sent immediately,
without being piggybacked to data packets.
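The difference between the two modes, including the full-window corner case I noticed in the source, can be modelled like this (a toy model for illustration only; the class and method names are mine, not mpd's netgraph internals):

```python
# Toy model of the two GRE ack strategies described above.

class GreSender:
    def __init__(self, delayed_ack: bool, window: int = 2):
        self.delayed_ack = delayed_ack
        self.window = window       # max unacked data packets in flight
        self.in_flight = 0
        self.pending_ack = None    # peer sequence number we owe an ack for
        self.sent = []             # what actually went out to the modem

    def receive(self, seq: int):
        """Peer data arrived; we owe it a GRE-level ack."""
        if self.delayed_ack:
            self.pending_ack = seq          # hold it, hope to piggyback
        else:
            self.sent.append(("ack", seq))  # send a bare ack immediately

    def send_data(self, payload):
        if self.in_flight >= self.window:
            # Window full: the data packet is discarded, and note that
            # no ack goes out either -- the pending ack is stuck until
            # the ack timeout fires.
            return
        ack = self.pending_ack if self.delayed_ack else None
        self.pending_ack = None
        self.in_flight += 1
        self.sent.append(("data", payload, ack))

    def ack_timeout(self):
        """Fallback timer: flush a pending ack on its own."""
        if self.pending_ack is not None:
            self.sent.append(("ack", self.pending_ack))
            self.pending_ack = None
```

With delayed_ack=False the ack leaves in receive() regardless of window state; with delayed_ack=True and a full send window, the peer sees nothing until ack_timeout() runs, which is exactly the kind of timing-dependent behaviour where a race-condition bug could hide.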
I don't have a clue why this is better, but at least it's simpler. It could
simply be a bug in the delayed-ack code that has to deal with all kinds of
interesting race conditions that simply are not present when acks are sent
immediately. Or it could be a bad interaction between the GRE module and
whatever KPN is using (it seems that this problem occurs mostly in Holland).
I'll try just disabling delayed-ack to see whether that's enough to fix
the problem. If that's the case then maybe we can convince Manuel to