 From:  Dirk-Willem van Gulik <dirkx at webweaving dot org>
 To:  "Chad R. Larson" <clarson at eldocomp dot com>
 Cc:  Manuel Kasper <mk at neon1 dot net>, Justin Albstmeijer <justin at VLAMea dot nl>, "m0n0wall at lists dot m0n0 dot ch" <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Re: [m0n0wall] m0n0wall + net4501 keeps rebooting
 Date:  Tue, 30 Dec 2003 13:34:24 -0800 (PST)
On Tue, 30 Dec 2003, Chad R. Larson wrote:

> >It makes perfect sense to me now: at 100.0% CPU load (mostly interrupts),
> >that poor watchdogd process starves and doesn't get to tickle the watchdog
> >again. After about 30 seconds, the watchdog timer fires, the CPU is reset
> >- bingo.
> You could prove that by having the watchdog process syslog a message before
> rebooting.

It cannot - the reboot is entirely in hardware. Even the kernel does not
see what is coming.
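
To make the starvation argument concrete, here is a rough toy model of it, not the actual watchdogd or the hardware timer logic. The 30 second timeout comes from the figure quoted above; the 10 second tickle interval, the storm window and the name first_reset are assumptions made up for illustration.

```python
def first_reset(timeout_s, tickle_every_s, storm_start_s, storm_end_s, duration_s):
    """Return the second at which the hardware timer would reset the CPU,
    or None if it never fires within duration_s.

    Toy model: time advances in 1-second ticks. The userland daemon tries
    to tickle the timer every tickle_every_s seconds, but gets no CPU at
    all while the interrupt storm lasts (storm_start_s <= t < storm_end_s).
    """
    remaining = timeout_s
    for t in range(duration_s):
        starved = storm_start_s <= t < storm_end_s
        if not starved and t % tickle_every_s == 0:
            # The daemon got scheduled and reloaded the hardware counter.
            remaining = timeout_s
        # The hardware counts down regardless of what software is doing.
        remaining -= 1
        if remaining <= 0:
            # Reset happens in hardware; nothing gets a chance to log it.
            return t + 1
    return None


if __name__ == "__main__":
    # No storm: the daemon keeps up and the timer never fires.
    print(first_reset(30, 10, 0, 0, 300))
    # Storm from t=60 onwards: last tickle was at t=50, so reset at t=80.
    print(first_reset(30, 10, 60, 200, 300))
```

In this model a storm shorter than the slack left on the timer goes unnoticed, while anything that keeps userland off the CPU for a full timeout period ends in a reset with no software involved at any point.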

But if it is this reproducible, simply killing watchdogd and retesting
a time or three will help confirm it. Having said that, I've never seen it
in a real-life or realistic lab situation, even on the 486s.

> It =does= make sense.  That would be why polling helped--it stopped the
> interrupt storm.

And made userland healthy again - which you want, as other daemons have
accept() queues; even syslog could consume mbufs at a rapid clip.

> But I'd prefer the watchdog be left in.  It guarantees that userland
> processes are getting some time to run, which is its whole point.

And some level of health.