 From:  Fred Wright <fw at well dot com>
 To:  m0n0wall dash dev at lists dot m0n0 dot ch
 Subject:  Re: [m0n0wall-dev] Re: IPsec auto-establishment broken
 Date:  Mon, 27 Sep 2004 19:18:07 -0700 (PDT)
On Mon, 27 Sep 2004, Justin Ellison wrote:
> On Mon, 2004-09-27 at 17:17, Fred Wright wrote:
> > Exactly.  And it's not guaranteed to happen at an effective time.  Until
> > racoon is up enough to have registered with PF_KEY, pings fall on deaf
> > ears.  Although doing it "after" starting racoon would appear to be
> > adequate, in reality racoon (like just about any active daemon) forks
> > itself off soon after starting, and thus creates a race between its
> > startup and the continued execution of the script.  PHP's slothfulness may
> > not be sufficient to ensure that the ping comes late enough. :-)
> Few things here.  First, I knew that the race condition could become an
> issue, but it worked every time for me on my net4501, which should be
> slow enough hardware-wise to bank that racoon's forking would finish
> before the ping executed.  The one condition I didn't test was loading
> the racoon.conf file up with a bunch of different tunnels, which I would
> assume would slow racoon down a bit.

It's not clear that slow hardware is the worst case.  If racoon's
initialization involves more I/O-related delays than the script, then the
script might be more likely to "win" on faster hardware.

Although I didn't try your pinger here, when I was experimenting with
things in <shellcmd> it appeared that they weren't getting executed until
a few seconds after racoon's startup, and I knew that a single <shellcmd>
ping didn't work, so I didn't expect the ping in the script to work any
better.

In any case, depending on that sort of race isn't desirable.
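
One way to avoid betting on timing is to repeat the trigger until it is
seen to succeed, rather than firing it once and hoping racoon is
listening.  The following is only a rough Python sketch of that idea, not
anything m0n0wall actually does: retry_until and its trigger argument are
hypothetical names, and trigger stands for whatever check would show that
the tunnel came up (for example, a ping that got a reply).

```python
import time
from typing import Callable


def retry_until(trigger: Callable[[], bool], attempts: int = 10,
                delay: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `trigger` until it returns True; return the number of calls made.

    `trigger` is a hypothetical callable (an assumption of this sketch).
    Raises TimeoutError if every attempt fails.
    """
    for n in range(1, attempts + 1):
        if trigger():
            return n
        if n < attempts:
            # Wait before the next try, but not after the final failure.
            sleep(delay)
    raise TimeoutError(f"trigger still failing after {attempts} attempts")
```

The point of the loop is that it does not matter which side "wins" the
startup race: an early attempt that falls on deaf ears is simply followed
by another one.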

> Also, there is some terminology confusion here.  Auto-establish means
> just that, when racoon is started, the tunnel is automatically
> established.  Auto-establish and keepalive are two completely different

Actually, the usual use of the term "keepalive" is for something yet
different:  *detecting* cases where connectivity is *lost* in order to
flag something as down.  Though this sometimes has the side effect of
defeating idle timeouts.

> things.  To be honest, I created it to fix the problem of a m0n0wall
> getting rebooted and having the SA issue that your kernel patch seems to
> fix.  Am I correct in that assumption?  Before, if you have network A
> with an IPSec tunnel to network B, and network B's m0n0wall was power
> cycled, the only way to re-establish the tunnel was by sending a packet
> from B to A.  Packets sent from A to B would fail to bring up the new
> tunnel.  If your kernel patch fixes this, then my patch more than likely
> is not needed.

No, the kernel fixes are for *different* problems related to rebooting.  
In the default "prefer older" mode, even if new SAs are properly
established, they don't take effect until the old ones have expired, thus
potentially causing an outage of up to the SA lifetime.

The first part of the kernel fix was to make "prefer newer" work
correctly.  As it was, it changed the comparison sense of the timestamps
but not the order of checking the two lists of SAs that are separated by
state, so it would still prefer "mature" SAs to "dying" SAs.  That had
already been fixed in the "standard" IPsec, but not the FAST_IPSEC
version.  I just copied the fix from the former to the latter.

The second part was to fix the "blink" problem (a momentary loss of
connectivity) at SA switchover.  If a given SA is installed on the sending
side before the receiving side, then it leads to lost packets in the
interim.  One would think that the duration of the "blink" would only be a
fraction of a second, but due to the rate-limited IKE sending, it actually
winds up being about 5 seconds in practice.  In "prefer older" mode, this
doesn't matter, since the new SAs don't get used for a while after being
installed, but "prefer newer" introduces this trouble.  My fix for this
was to implement a "holdoff" time, where new SAs have to be in existence
for a minimum amount of time before being preferred to older SAs.  The
current 30-second value was chosen to be long enough to cover up the time
skew in installing new SAs, while not so long that it appreciably delays
the usability of a tunnel after a reboot.
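
As a rough illustration of the holdoff rule (a Python sketch, not the
kernel code): the SA class, pick_outbound_sa, and the fallback used when
no SA has yet passed the holdoff are all assumptions made for the example.
Only the 30-second value comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

HOLDOFF_SECONDS = 30  # value from the text; the rest is illustrative


@dataclass(frozen=True)
class SA:
    spi: int        # identifier, used here only to tell SAs apart
    created: float  # installation time, in seconds


def pick_outbound_sa(sas: Sequence[SA], now: float,
                     holdoff: float = HOLDOFF_SECONDS) -> Optional[SA]:
    """Prefer the newest SA that has existed for at least `holdoff` seconds."""
    if not sas:
        return None
    settled = [sa for sa in sas if now - sa.created >= holdoff]
    if settled:
        return max(settled, key=lambda sa: sa.created)
    # Assumption: if nothing has passed the holdoff yet, a younger SA is
    # still never preferred to an older one, so take the oldest.
    return min(sas, key=lambda sa: sa.created)
```

With an old SA created at t=0 and a new one at t=1000, a lookup at t=1005
still returns the old SA, and a lookup at t=1030 switches to the new one.
An endpoint that has only the new SA uses it immediately.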

None of the preference stuff has any effect unless there's a way to get
new SAs established after a reboot in the first place, and the
non-rebooted end has no way to know that its SAs have become "widowed" (I
used to call it "orphaned" but "widowed" seems more appropriate for an SA
that's lost its mate, not its parent :-)).  Thus, it behooves the rebooted
endpoint to have some mechanism to reinitiate the tunnel.

> > The other problem with the ping approach is that it's permissible to
> > create a tunnel that doesn't contain any of the m0n0wall's IPs, but it's
> > not possible for m0n0wall to send a ping with such a source address. 
> Manuel and I were talking about this issue earlier today.  I originally
> thought that netcat could send packets without binding to an address,
> but my test proved otherwise.  Manuel's response:
> "Mmmm, I guess there are some "hacker" tools around that can send
> arbitrary IP packets (ipsend from ipfilter?). I can't help feeling
> that this is an ugly solution and really something that should be
> dealt with in racoon though."

I don't know if raw packets are permitted to spoof a foreign source IP,
though I'm pretty sure sending with BPF could.  But there's also the issue
of whether such packets get injected at the right point to get redirected
to the tunnel by the IPsec code.

Another possibility would be to spoof ACQUIRE messages through PF_KEY (and
thus not generate any "idle chatter" at all), although I suspect that might
have a variety of other problems.

> My gut feeling is that we're trying to kill a fly with a sheet of
> plywood.  We might fix the immediate problem, but who knows what else
> might get mucked up in the process.  
> I think this is where I have to show my newbness with racoon/FreeBSD:
> 1)  Surely we can't be the only people fighting this - are the racoon
> folks working on the problem?

Don't know.  I wouldn't expect implementing auto-establish to be *too*
hard, but I haven't had time to dig into the code yet.  FreeSWAN seems to
have that feature, since that's what SnapGear uses.  The SnapGear allows
you to specify a retry limit for establishing a tunnel, but "infinity" is
one of the options.

> 2)  Is racoon the most mature keying daemon for FreeBSD?  There's an
> ISAKMPD in ports - is that any better?

Don't know, but it might not even be for IKE.  IKE essentially took parts
of ISAKMP and parts of Oakley, and threw in a new "abbreviated" P2 mode.  
If you think IKE is complicated, note that it's the *simplified* KM
protocol. :-)

					Fred Wright