On Tue, 30 Dec 2003, Chad R. Larson wrote:

> >It makes perfect sense to me now: at 100.0% CPU load (mostly interrupts),
> >that poor watchdogd process starves and doesn't get to tickle the watchdog
> >again. After about 30 seconds, the watchdog timer fires, the CPU is reset
> >- bingo.
>
> You could prove that by having the watchdog process syslog a message before
> rebooting.

It cannot: the reboot is entirely in hardware. Even the kernel does not see
what is coming. But if it is this reproducible, simply killing watchdogd and
retesting two or three times will help confirm it.

Having said that, I've never seen it in real life or in a realistic lab
situation, not even on the 486s.

> It =does= make sense. That would be why polling helped--it stopped the
> interrupt storm.

And it made userland healthy again, which you want: other daemons have
accept() queues too, and even syslog could consume mbufs at a rapid clip.

> But I'd prefer the watchdog be left in. It guarantees that userland
> processes are getting some time to run, which is its whole point.

And some level of health.

Dw
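The failure mode described above can be modelled as a toy simulation. A userland daemon pats a hardware timer whenever the scheduler lets it run. If it is starved for the whole timeout, the board resets without any software getting a chance to log anything. This is a hypothetical sketch, not watchdogd or the FreeBSD watchdog(4) interface. `HardwareWatchdog`, `simulate`, the 1-second pat interval and the 0.5-second time step are all assumptions. Only the roughly 30-second timeout comes from the thread.

```python
# Hypothetical discrete-time model of a hardware watchdog; not FreeBSD code.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HardwareWatchdog:
    timeout: float        # seconds without a pat before the board resets
    last_pat: float = 0.0

    def pat(self, now: float) -> None:
        self.last_pat = now

    def expired(self, now: float) -> bool:
        return now - self.last_pat >= self.timeout


def simulate(userland_runs: Callable[[float], bool],
             timeout: float = 30.0,
             pat_interval: float = 1.0,
             duration: float = 120.0,
             step: float = 0.5) -> Optional[float]:
    """Return the time of the hardware reset, or None if none occurs.

    userland_runs(t) says whether the (assumed) scheduler gives userland
    any CPU at time t; an interrupt storm is modelled as it returning False.
    """
    wd = HardwareWatchdog(timeout)
    next_pat = 0.0
    t = 0.0
    while t <= duration:
        # The daemon can only pat the timer when it actually gets to run.
        if t >= next_pat and userland_runs(t):
            wd.pat(t)
            next_pat = t + pat_interval
        # The reset is decided by the timer alone; no userland code runs here,
        # which is why a "syslog before reboot" message can never appear.
        if wd.expired(t):
            return t
        t += step
    return None
```

For example, `simulate(lambda t: t < 10)` models an interrupt storm starting at t = 10 s. The last pat happens at t = 9 s and the reset follows 30 s later, at 39.0. A healthy system (`lambda t: True`) never resets. The same holds for a system where polling still lets userland run briefly every few seconds, such as `lambda t: int(t) % 5 == 0`.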