 From:  Ryan Grove <ryan at wonko dot com>
 To:  m0n0wall at lists dot m0n0 dot ch
 Subject:  dnsmasq eats CPU, kills network access
 Date:  Tue, 30 Sep 2003 22:45:25 -0700 (Pacific Daylight Time)
For the last few weeks I've been battling an intermittent loss of
connectivity roughly every two days while using m0n0wall. It happened
again today, and this time I had the presence of mind to do some
digging to see what was going on.

What I found was that dnsmasq was pegging the CPU. While this was
happening, none of my LAN machines could access the Internet. Here's the
output from top showing the runaway process:

===

last pid:   509;  load averages:  1.04,  1.02,  1.00  up 2+08:23:03    22:30:04
16 processes:  2 running, 13 sleeping, 1 zombie

Mem: 4992K Active, 3568K Inact, 4948K Wired, 12K Cache, 4592K Buf, 37M Free
Swap:


  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  266 nobody    83  10   948K   724K RUN     24.7H 95.07% 95.07% dnsmasq
   69 root      10   0  1448K  1212K nanslp   0:34  0.00%  0.00% ipmon
   59 root       2   0  1432K  1084K select   0:33  0.00%  0.00% dhclient
   77 root       2   0   980K   708K select   0:10  0.00%  0.00% syslogd
  271 root       2  10  1780K  1456K select   0:05  0.00%  0.00% dhcpd
   33 root      10   0   880K   536K nanslp   0:02  0.00%  0.00% watchdogd
  103 root      10   0   944K   660K nanslp   0:01  0.00%  0.00% ez-ipupdate
   96 root       2   0  2220K  1084K accept   0:00  0.00%  0.00% mini_httpd
  505 root      -6  10  2372K  2020K piperd   0:00  0.00%  0.00% php
  110 root      10   0  1096K   796K nanslp   0:00  0.00%  0.00% msntp
  108 root      10   0  1324K   824K wait     0:00  0.00%  0.00% sh
  121 root       3   0  1328K   852K ttyin    0:00  0.00%  0.00% sh
  508 root      10  10  1324K   828K wait     0:00  0.00%  0.00% sh
  507 root      -6   0  2224K  1188K piperd   0:00  0.00%  0.00% mini_httpd
  509 root      57  10  1860K   920K RUN      0:00  0.00%  0.00% top

===

Does anyone have any idea what might be causing this? As I've said, it
seems to happen every two days or so, and doesn't go away until I reboot
m0n0wall (I can't even kill the process). I'm running pb16r500 on a
net4501, for what it's worth.

In case anyone's curious, I also saved a dump from status.cgi while the
problem was happening. Email me if you'd like a peek.

-- 
Ryan Grove
ryan at wonko dot com
http://wonko.com/