 From:  krt <kkrrtt at gmail dot com>
 To:  Cheyenne Deal <deal dot cheyenne at gmail dot com>
 Cc:  m0n0wall <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Re: [m0n0wall] Lockups Using 1.3b2
 Date:  Fri, 23 Feb 2007 23:15:09 -0800
If you haven't already, you might want to put a monitor on the box.  You 
might be able to catch any last-ditch error messages spewed out before 
the box locked.

Two guesses for the hardware side (I won't cover any software issues, 
but it could very well be something wonky in the beta code):

1) Bad/misbehaving NIC(s).  Swap them out one at a time, and write down 
which one you swapped; a week is a long time to remember something like 
that ;-).  A bad NIC can take a box down.

2) Bad RAM.  This one's easy - get memtest86 going on it.  If it's a 
physical RAM issue, it could be that the second stick of memory isn't 
accessed frequently until about a week's worth of memory usage has 
occurred (state table changes, packet handling, etc. can do this).

You can probably speed up the time to failure by throwing more traffic 
through the box.

A high connection usage protocol like HTTP is good for testing a 
stateful inspection box.  Due to its nature, it's possible to generate 
a lot of TCP sessions in a short amount of time while moving a lot of 
data.  Thanks to the existence of multiple scriptable HTTP access 
methods, it's relatively trivial to automate a large connection 
count/bit transfer procedure between two machines.

1) Grab two machines.  One will be a client off one of the firewall's 
NICs, the other will be a server off the other firewall NIC.

2) The client machine is easy - if it doesn't already have an OS, just 
use a liveCD of some sort like Knoppix, Damn Small Linux or FreeSBIE. 
It has to have a scriptable web client method, like wget, lynx, links, etc.

3) On the web server side, have at least a dozen files of 2 MB or so 
each.  Nothing small, but you needn't make these terribly large either.

4) Write a little script that loops through a get routine for each of 
the files.  The files should just be discarded on the client side, as 
they'll be written over and over and over again.  It's best to just dump 
them to /dev/null, as you don't really need to thrash a physical disk 
just to toss away data.  Basically, the client should be grabbing file 
1, then 2, then 3, etc. and looping back to grab file 1, 2, etc. all 
over again.
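
For what it's worth, here is a rough sketch of the client loop from step 4, 
in Python rather than wget since most live CDs ship it.  The file names 
(file1.bin through file12.bin) and the server address in the usage line are 
made up for illustration; substitute whatever you actually put on the web 
server.  Nothing is written to disk, the data is just read and dropped.

```python
#!/usr/bin/env python3
"""Fetch a fixed set of files over HTTP in a loop, discarding the data."""
import http.client
import itertools
import sys
import time
import urllib.request

CHUNK = 64 * 1024  # read size in bytes; an arbitrary choice


def fetch_and_discard(url, timeout=30):
    """Download one URL, throw the body away, and return the byte count."""
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total


def hammer(base_url, names, passes=None):
    """Request file 1, 2, 3, ... in order, then start over.

    passes=None loops until interrupted; an integer stops after that many
    full passes.  urllib opens a new connection for every request, so each
    fetch is a separate TCP session for the firewall to track.
    Returns (bytes_received, failed_requests).
    """
    counter = itertools.count() if passes is None else range(passes)
    total = 0
    failures = 0
    for n in counter:
        for name in names:
            url = f"{base_url.rstrip('/')}/{name}"
            try:
                total += fetch_and_discard(url)
            except (OSError, http.client.HTTPException) as exc:
                # Log the time so a lockup can be matched to a timestamp.
                failures += 1
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} pass {n} {url}: {exc}", file=sys.stderr)
    return total, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical layout: file1.bin .. file12.bin in the server's web root.
    files = [f"file{i}.bin" for i in range(1, 13)]
    hammer(sys.argv[1], files)
```

Run it on the client as something like "python3 hammer.py http://192.0.2.10" 
(example address) and stop it with Ctrl-C.  Starting several copies at once 
should raise the number of concurrent sessions if one loop isn't enough load.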

Cheyenne Deal wrote:
> I have been using the 1.3b2 live cd and it has worked flawlessy (Good Job)
> but every week I have to restart it. Its running on a P1, 192mb ram,
> 2 Intel FE 10/100 SP NIC's. Do you have any Ideas on whats causing the
> lockups?