 From:  "Aaron Cherman" <aaronc at morad dot ab dot ca>
 To:  <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Re: [m0n0wall] Version 1.22 freeze
 Date:  Mon, 12 Jun 2006 08:10:47 -0600
> This issue has been brought up before. Monowall/FreeBSD seems to have a 
> bug that may cause a sporadic total OS freeze/lockup. I'm bringing this up 
> again because I have an installation running monowall version 1.22, which 
> freezes solid every once in a while.
> Initially I used a SOEKRIS net 4501 hardware platform, but the system 
> froze approx. every 2-3 hours. I replaced the hardware with a SOEKRIS net 4801 
> (replaced the PSU too) and have just seen a lockup after 7 days of uptime. 
> There's nothing in the setup out of the ordinary.

First, I'd like to say I'm damn happy that I'm not the only one still seeing 
this.  And my write-up here is long.  Sorry.

I have done some testing over the last few months with regard to this 
problem.  I currently have 4 m0n0wall boxes in service on various parts of 
one very large LAN.  The one that causes me problems is the one with the 
highest traffic load.  The only special thing about this unit is that it has 
12 VLANs configured on OPT1.  It serves approx. 200 clients to the Internet 
(including our WISP traffic).  DHCP on one VLAN hands out 30 addresses, with 
around 30 more assigned statically.  Another VLAN, used for network 
management traffic, has approx. 100 nodes behind it, but their traffic is 
mostly local and does not pass through the m0n0wall.  Through-traffic 
rarely exceeds 4 Mbps in one direction.

I have tried a number of different hardware platforms in this position.  My 
original hardware was a Lex Systems CV860A, 800 MHz with Realtek NICs.  At 
first I suspected the NICs, so under the guidance of several mailing list 
members I purchased the same hardware with a 1 GHz processor and Intel NICs. 
Same issue.  I then purchased higher quality CF cards (128 MB), but the 
problem still occurred.  After that I went to an old generic tower PC (P3 
with 256 MB of RAM, using an old hard drive).  Of course it happened again. 
My next try was the CD and floppy configuration.  You guessed it, same 
thing.  Meanwhile, my other units, all CV860As with Realtek NICs, have been 
running great with 90+ days of uptime.

This particular unit would lock up every 6-12 days, with no indication of 
why: no messages on the console and nothing in the syslogs.  Every lock-up 
left a completely unresponsive box.  I could not ping any interface, the 
console would not respond, and even with the Ethernet cable unplugged the 
link lights for the port stayed on.  Weird.  A hard reboot was the only 
option.  Then it occurred to me that the box seemed to last longer after a 
config change was made.  So I started making a config change every couple of 
days, just a small one like increasing or decreasing the number of log 
entries shown.  This got me up to 28 days.  Wow!  I haven't seen that in a 
long time.

I also seem to remember this fun little PITA showing up after I upgraded to 
1.2.  I also remember reading someone's post about this issue.  He said he 
encountered the same thing when upgrading to 1.2, and he ended up rebuilding 
his configuration from scratch on a box already running 1.2.  So I decided 
to try something else.  I burned a CD with 1.11 on it and rebooted my 
machine from it, using my existing config file.  I have now been running 
for 11 days with no config changes at all.  I know, 11 days, wahoo.  It's 
not much in the real world, but it is for me, right now.  I am in the 
process of building my config from scratch with 1.22 on one of the CV860As 
(1 GHz, 512 MB RAM, 128 MB CF card, Intel NICs).  I figure if I get up to 
20+ days of uptime with the 1.11 CD/floppy, I will swap the new 1.22 box in 
and see how far it gets.

This has been my experience so far.  Thanks for reading.  And I still love