On Mon, 27 Sep 2004, Manuel Kasper wrote:
> On 27.09.2004 12:08 +0200, Vincent Fleuranceau wrote:
> > Of course, in my current setup (1.1b2) I've simply removed the
> > <shellcmd> entries. Should I consider using them again in place of
> > the WebGUI keepalive option?
> As I reported, the webGUI auto-establishment option is broken at the
> moment. However, the tunnel should be (re-)established as soon as a
> packet is sent. You can still try the <shellcmd> stuff - in that
> case, the only difference would be obey/claim.
Yes, it looks like you used Justin's version, which I knew wouldn't work
(at least not reliably) just by inspection. :-)
> > Has the 'obey' -> 'claim' change affected my setup (like Chris
> > Buechler), even if I use only m0n0wall on both sides of the tunnel?
> Who knows? Try changing it and see what happens...
That aspect seems to be understood now. Assuming the current setting is
kept (which still makes sense for the lifetime), people should be warned
about the effect on mismatched PFS groups.
On Mon, 27 Sep 2004, Vincent Fleuranceau wrote:
> I've just realized that the keepalive code in vpn.inc generates only
> *one* ping command when the config is saved and racoon restarted. The
> "_bg" suffix made me think it was a sort of loop in the background...
Exactly. And it's not guaranteed to happen at an effective time. Until
racoon is up enough to have registered with PF_KEY, pings fall on deaf
ears. Although doing it "after" starting racoon would appear to be
adequate, in reality racoon (like just about any active daemon) forks
itself off soon after starting, and thus creates a race between its
startup and the continued execution of the script. PHP's slothfulness may
not be sufficient to ensure that the ping comes late enough. :-)
> So, I've set up the <shellcmd> stuff and it works again as expected.
> To give you an idea, on my net4501 (with PPPoE on WAN) the tunnel is
> fully functional after approx. 80 seconds from the moment I reboot the
> remote m0n0wall.
> As mentioned in my previous post, I use the following commands (modified
> version of Fred Wright's <shellcmd> kludge):
> ping every 5 seconds x 24 times
> (-> wake up phase, during 2 minutes, to be sure...)
> ping every 60 seconds forever
> (-> keep alive)
Yes, you changed all the settings to make it more aggressive, which I
don't think is really desirable. First of all, I was trying to avoid
needing configuration options for the various delays, and instead was
trying to come up with a "one size fits all" set of delays. Secondly, the
whole "faster doesn't hurt" philosophy isn't really true when you consider
large numbers of units.
I use a 10-second interval between the initial pings, which is not only a
compromise between time quantization and network traffic, but is also
intended to be longer than the typical time to perform the IKE exchange.
Reducing it to 5 seconds means that racoon will often see a second request
before it's finished dealing with the first. Although it *should* handle
that correctly, I see no reason to ask for trouble just to shave a few
seconds off the time to establish the tunnel after booting.
The one-minute time for the high-rate pings was chosen to be long enough
to cover startup delays with a working network. Unless there's some case
where it takes longer than that for IKE to be usable on a good net
connection, I see no reason to increase it. The combination of the
doubled high-rate period and the halved high-rate interval quadruples the
amount of traffic generated at boot time, which could get to be pretty
significant if a bunch of m0n0walls were all rebooting at the same time
(e.g. after an enterprise-wide power failure).
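To make the "quadruples" arithmetic concrete, here is a rough sketch that counts boot-time pings per unit. It assumes one ping per interval over the whole high-rate period (so 60 s at 10 s gives 6 pings, and 120 s at 5 s gives 24 pings, matching "every 5 seconds x 24 times"); the fleet size is a purely hypothetical number for illustration.

```python
def boot_pings(period_s: int, interval_s: int) -> int:
    """Pings sent during the high-rate phase, assuming one per interval."""
    return period_s // interval_s


original = boot_pings(60, 10)     # 1-minute phase, 10 s interval -> 6
aggressive = boot_pings(120, 5)   # 2-minute phase, 5 s interval -> 24

ratio = aggressive / original     # doubled period x halved interval -> 4.0

# Hypothetical fleet rebooting together after a site-wide power failure.
FLEET_SIZE = 500
print(FLEET_SIZE * original, FLEET_SIZE * aggressive)  # 3000 12000
```

The per-unit difference is small; it is the simultaneous reboot of many units that turns it into a noticeable burst.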
The slower ongoing pings are intended to bring up the tunnel later if the
net connection isn't working at boot time. They also recover from cases
where SAs expire during a network outage. Aside from the general traffic
issue, there's also the problem that every unsuccessful attempt will
generate log entries. You *really* don't want racoon log entries showing
up once a minute. 10 minutes seems like a reasonable compromise between
log bloatage and the delay in reestablishing the tunnel after a protracted
network outage (which would almost certainly be much longer than 10
minutes in any case where this matters).
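The log-versus-latency tradeoff can be put in numbers with a small sketch. It assumes that every keepalive ping sent while the tunnel cannot come up produces at least one racoon log entry, and that in the worst case an outage ends just after a ping, so recovery waits one full interval.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def failed_attempts_per_day(interval_s: int) -> int:
    # Assumption: each unsuccessful keepalive ping yields at least one log entry.
    return SECONDS_PER_DAY // interval_s


def worst_case_recovery_delay(interval_s: int) -> int:
    # If the network comes back just after a ping, the next attempt is
    # a full interval away.
    return interval_s


for interval in (60, 600):
    print(interval, failed_attempts_per_day(interval),
          worst_case_recovery_delay(interval))
# 60 1440 60
# 600 144 600
```

A one-minute interval means roughly 1440 log entries per day of outage, versus 144 at ten minutes, at the cost of at most a ten-minute wait once the network returns.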
In your version you also increased the initial sleep to tune it to your
particular configuration, but that's just the sort of thing I was trying
to avoid. The only reason I included the initial sleep is because sending
the first ping immediately is so unlikely to work that it's not worth
doing. Adding a sleep with the same duration as the ping interval
essentially just makes the ping behave as if the delay were "top-tested"
rather than "bottom-tested".
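The "top-tested" versus "bottom-tested" point can be illustrated by computing ping times for both loop shapes. This is a sketch of the timing only, not of the actual <shellcmd> lines; the function names are hypothetical.

```python
def bottom_tested(initial_sleep: int, interval: int, count: int) -> list[int]:
    """Optional initial sleep, then ping followed by sleep on each pass."""
    t = initial_sleep
    times = []
    for _ in range(count):
        times.append(t)   # ping happens first ...
        t += interval     # ... then the delay
    return times


def top_tested(interval: int, count: int) -> list[int]:
    """Sleep before every ping."""
    t = 0
    times = []
    for _ in range(count):
        t += interval     # delay happens first ...
        times.append(t)   # ... then the ping
    return times


print(bottom_tested(0, 10, 6))   # [0, 10, 20, 30, 40, 50]
print(bottom_tested(10, 10, 6))  # [10, 20, 30, 40, 50, 60]
print(top_tested(10, 6))         # [10, 20, 30, 40, 50, 60]
```

With the initial sleep equal to the interval, the two loop shapes produce identical schedules, and the almost-certainly-useless ping at t=0 is skipped. Making the initial sleep longer than the interval merely tunes it to one particular setup.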
The one currently unsolved problem with the ongoing ping is locating it to
kill it when shutting the tunnel down. But it's clear that any ping-based
approach needs to be indefinitely ongoing to be fully effective.
The other problem with the ping approach is that it's permissible to
create a tunnel that doesn't contain any of the m0n0wall's IPs, but it's
not possible for m0n0wall to send a ping with such a source address. The
remote end isn't a problem, since getting a response from the ping isn't
required. In fact, pinging a nonexistent remote IP reduces the network
traffic by avoiding the useless response. But that doesn't work for the
source address of the ping, at least not with the normal ping client.