 From:  "Stefan Hegnauer" <stefan dot hegnauer at gmx dot ch>
 To:  <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Kernel panic under load, 1.3b14 on Alix 2c3
 Date:  Sat, 30 Aug 2008 18:04:48 +0200
After several unintended reboots over the last couple of days I started to
investigate. It seems that for some reason the kernel panics under moderate
load, which in turn leads to a reboot.

How to replicate:
The easiest way is to run several instances of nmap against a few (4-6)
targets on the WAN / internet side of m0n0. The instances can all run from
the same PC (Windows XP behind a WLAN AP in my case):
	nmap -v -A -PN -p1-65535 -T insane <your.target.org>
Less aggressive timings crash m0n0 as well; it just takes longer until
m0n0 reboots.
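
For anyone who wants to script this, here is a minimal sketch of how the
parallel scans could be started from Python. The flags are the ones from the
command line above; the function names and the target host names in the usage
note are placeholders of mine, not something I actually ran in this form.

```python
import subprocess

# Flags copied from the nmap command line above.
NMAP_FLAGS = ["-v", "-A", "-PN", "-p1-65535", "-T", "insane"]


def build_nmap_cmd(target):
    """Return the argument list for one nmap instance against `target`."""
    return ["nmap", *NMAP_FLAGS, target]


def launch_scans(targets):
    """Start one nmap process per target; all of them run concurrently."""
    return [subprocess.Popen(build_nmap_cmd(t)) for t in targets]


def wait_all(procs):
    """Block until every scan has exited and return the exit codes."""
    return [p.wait() for p in procs]
```

Usage would be something like
wait_all(launch_scans(["host1.example.org", "host2.example.org"])), with 4-6
hosts you are allowed to scan in place of the example names.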

An out-of-the-box installation of m0n0wall 1.3b14 embedded on my Alix 2c3
(only rule change: allow any to any from LAN to WAN; PPPoE on the WAN side in
my case) will crash within minutes.

On the serial port of my Alix 2c3 I caught the following output. The first
two lines are probably the most interesting ones:

	ipf_nattable_max reduced to 86282
	panic: kmem_malloc(4096): kmem_map too small: 80642048 total allocated
	Uptime: 13m23s
	Cannot dump. No dump device defined.
	Automatic reboot in 15 seconds - press a key on the console to abort
	Rebooting...
	PC Engines ALIX.2 v0.99
	640 KB Base Memory
	261120 KB Extended Memory

	01F0 Master 848A SanDisk SDCFB-64                        
	Phys C/H/S 490/8/32 Log C/H/S 490/8/32

	BIOS drive C: is disk0
	BIOS 640kB/261120kB available memory

	FreeBSD/i386 bootstrap loader, Revision 1.1
	(root at mb63 dot neon1 dot net, Sat Aug 23 21:46:54 CEST 2008)
....


I also uploaded vmstat to m0n0 (see
http://m0n0.ch/wall/list/showmsg.php?id=344/93) and ran it during another
trial. This is what it printed:

$ /tmp/vmstat -c 1000 -w 1
 procs      memory      page                   disk   faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr ad0   in   sy  cs us sy id
 0 2 0   21220 201252   32   0   0   0  27   0   0 2271   94 2298  1  2 97
 0 2 0   21220 201244    4   0   0   0   0   0   0 2139  117 2070  0  0 100
 ...
 0 2 0   21220 201068    0   0   0   0   0   0   0 3905  111 4417  0 10 90
--> starting first instance of nmap
 0 2 0   21220 200760    0   0   0   0   0   0   0 3904  111 4361  0 20 80
 0 2 0   21220 200456    0   0   0   0   0   0   0 3900  111 4402  0 19 81
 0 2 0   21220 200124    0   0   0   0   0   0   0 4080  184 4680  0 19 81
 0 2 0   21220 199760    0   0   0   0   0   0   0 4241  111 4861  0 14 86
 0 2 0   21220 199392    0   0   0   0   0   0   0 4296  111 4971  0 14 86
 0 2 0   21220 199016    0   0   0   0   0   0   0 4272  111 4932  0 12 88
 ...
 0 2 0   21220 136512    1   0   0   0   0   0   0 5097 1017 6058  0 33 67
 0 2 0   21220 136512    0   0   0   0   0   0   0 5117  138 5978  1 30 69
 0 2 0   21220 136512    0   0   0   0   0   0   0 5103  116 6016  0 31 69
--> kernel panic

As I see it, the CPU was not overloaded (it was still about 70% idle), and
although there was around 130MB of free memory, the kernel ran out of
space in kmem_map.
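
To put numbers on that, here is a small sketch that redoes the arithmetic
from the panic message and from the vmstat lines just before the first nmap
instance and just before the panic. It assumes the "fre" column is in KB,
which is how I read it; the function and variable names are mine.

```python
KMEM_MAP_BYTES = 80642048  # "total allocated" from the panic message
PAGE_SIZE = 4096           # size of the failed kmem_malloc() request

# vmstat lines copied from the output above.
LINE_BEFORE = " 0 2 0   21220 201068    0   0   0   0   0   0   0 3905  111 4417  0 10 90"
LINE_AT_PANIC = " 0 2 0   21220 136512    0   0   0   0   0   0   0 5103  116 6016  0 31 69"


def parse_vmstat(line):
    """Extract free memory (assumed KB) and idle CPU percentage from one line."""
    fields = line.split()
    return {"fre_kb": int(fields[4]), "idle_pct": int(fields[-1])}


before = parse_vmstat(LINE_BEFORE)
at_panic = parse_vmstat(LINE_AT_PANIC)

kmem_map_mib = KMEM_MAP_BYTES / 2**20          # size of kmem_map in MiB
kmem_map_pages = KMEM_MAP_BYTES // PAGE_SIZE   # the same size in 4 KB pages
fre_drop_kb = before["fre_kb"] - at_panic["fre_kb"]

print(f"kmem_map: {kmem_map_mib:.1f} MiB ({kmem_map_pages} pages)")
print(f"free at panic: {at_panic['fre_kb'] / 1024:.1f} MiB, idle {at_panic['idle_pct']}%")
print(f"free memory consumed during the scan: {fre_drop_kb / 1024:.1f} MiB")
```

So kmem_map is only about 77 MiB, free memory fell by about 63 MiB during the
scan, and roughly 133 MiB was still free when the panic hit. That suggests
(but does not prove) that the limit being hit is the size of kmem_map, not
the physical memory.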

Does anyone have a clue what to try next? As I understand it, dropping
connections under heavy load would be perfectly OK, but a kernel panic
certainly is not. Am I too picky?

-Stefan