 From:  Dirk-Willem van Gulik <dirkx at webweaving dot org>
 To:  Fred Wright <fw at well dot com>
 Cc:  m0n0wall at lists dot m0n0 dot ch
 Subject:  Re: [m0n0wall] m0n0wall + net4501 keeps rebooting
 Date:  Wed, 31 Dec 2003 01:54:24 -0800 (PST)
On Tue, 30 Dec 2003, Fred Wright wrote:

> > - removing watchdogd
> Since the current watchdog support obviously needs work, that's probably
> not a bad idea, although it doesn't solve the underlying problem of the
> system getting too bogged down to run processes.

Folks, there is _NOTHING_ in that watchdog; it enables a 31 second
hardware-countdown on the CPU by passing a single ioctl() to the kernel
which directly writes the value into a CPU register. It then sleeps for 15
seconds (using nanosleep()) and does it again.
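
To make the timing concrete, here is a rough model of that loop in Python.
It is only a sketch, not the actual watchdogd source: the function name and
the idea of feeding it a list of "stall" times are made up for illustration,
and it assumes the hardware fires only when the gap between two reloads is
strictly longer than the timeout.

```python
WATCHDOG_TIMEOUT_S = 31  # hardware countdown, from the description above
PET_INTERVAL_S = 15      # sleep between reloads, from the description above


def first_missed_pet(stalls, timeout=WATCHDOG_TIMEOUT_S, interval=PET_INTERVAL_S):
    """Return the index of the first reload that arrives too late, or None.

    `stalls` is a hypothetical input: for each loop iteration, the extra
    seconds the process was kept off the CPU beyond its normal sleep.
    """
    for i, stall in enumerate(stalls):
        gap = interval + stall  # time since the previous reload
        if gap > timeout:       # assumption: fires only past the timeout
            return i            # the hardware would have reset the box here
    return None


if __name__ == "__main__":
    print(first_missed_pet([0, 2, 5]))   # mildly loaded system
    print(first_missed_pet([0, 1, 20]))  # badly starved on the third pass
```

In this model the daemon has to be kept off the CPU for well over its normal
15 second sleep before the countdown can expire.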

If userland does not get enough CPU for 30+ seconds to do even the
bare basics - more than just the watchdog breaks. Fixing this symptom (of
an overloaded system) by removing the watchdog is only going to unearth
another problem; e.g. syslog, DNS timing problems, DHCP running amok, huge
listen() queues, mbuf starvation, etc.

Fixing the problem could be

->	not throw unrealistic loads at a 486 Soekris; again,
	those things are deployed by the many hundreds in
	commercial settings - and I've not seen this issue
	in real life when they handle multiple 100Mbits on
	several T1's. If you can afford more - perhaps an
	upgrade is in order :-)

->	make the kernel/hw go faster

->	let some traffic damping/packet dropping kick in
	if the machine gets overwhelmed.
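
For the last option, one common way to do this kind of damping is a token
bucket. The sketch below is purely illustrative and is not anything m0n0wall
or the kernel does today; the class name, the packets-per-second rate and the
burst size are all assumptions picked for the example.

```python
class TokenBucket:
    """Hypothetical packet limiter: admit `rate_pps` packets per second on
    average, with bursts of up to `burst` packets."""

    def __init__(self, rate_pps, burst):
        self.rate = float(rate_pps)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous decision

    def allow(self, now):
        """Return True to forward a packet arriving at time `now` (seconds),
        False to drop it."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_pps=2, burst=3)
    # Five packets arriving at the same instant: the burst is admitted,
    # the excess is dropped instead of queueing up behind it.
    print([bucket.allow(0.0) for _ in range(5)])
```

Dropping early like this sheds load before the box is too busy to run its
own processes.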


but shooting the messenger is not going to fix the fundamental issue; it just
gets you by until the next symptom gets too painful. And that one
may be a whole lot harder to debug than an obvious reboot.