On Tue, 30 Dec 2003, Chad R. Larson wrote:
> >It makes perfect sense to me now: at 100.0% CPU load (mostly interrupts),
> >that poor watchdogd process starves and doesn't get to tickle the watchdog
> >again. After about 30 seconds, the watchdog timer fires, the CPU is reset
> >- bingo.
> You could prove that by having the watchdog process syslog a message before
It cannot - the reboot is entirely in hardware. Even the kernel does not
see what is coming.
But if it is this reproducible, a simple kill of watchdogd and a
retest or three will help confirm it. Having said that, I've never seen it
in a real live or realistic lab situation, even on the 486s.
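
To make the failure mode concrete, here is a rough model of the timer
logic discussed above. It is only a sketch, not the actual watchdogd or
hardware behaviour: the function name, the `horizon` parameter and the
assumption that the timer is armed at t=0 are all made up for
illustration, and the 30 second timeout is the approximate figure quoted
above.

```python
def watchdog_reset_time(tickle_times, timeout=30.0, horizon=120.0):
    """Return the time at which a (hypothetical) hardware watchdog would
    reset the CPU, or None if it never fires within `horizon` seconds.

    tickle_times: moments (in seconds) at which userland managed to
    tickle the watchdog. Assumption: the timer is armed at t=0.
    """
    deadline = timeout  # first expiry if nobody ever tickles the timer
    for t in sorted(tickle_times):
        if t >= deadline:
            break  # this tickle comes too late; the timer already fired
        deadline = t + timeout  # a tickle in time pushes the expiry out
    return deadline if deadline <= horizon else None


# Healthy userland: a tickle every 10 seconds, the timer never fires.
healthy = watchdog_reset_time(range(0, 121, 10))

# Interrupt storm starting after t=20: the process starves, no more tickles.
starved = watchdog_reset_time([0, 10, 20])
```

With these inputs `healthy` is None, while `starved` is 50.0, i.e. the
reset comes 30 seconds after the last successful tickle, without the
kernel or any userland process getting a chance to log anything.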
> It =does= make sense. That would be why polling helped--it stopped the
> interrupt storm.
And made userland healthy again, which you want, as other daemons have
accept() queues; even syslog could consume mbufs at a rapid clip.
> But I'd prefer the watchdog be left in. It guarantees that userland
> processes are getting some time to run, which is its whole point.
And some level of health.