For the last few weeks I've been battling an intermittent loss of connectivity approximately every two days or so while using m0n0wall. It happened again today, and I had the presence of mind this time to do some digging to try and see what was going on.

What I found was that dnsmasq was pegging the CPU. While this was happening, none of my LAN machines could access the Internet. Here's the output from top showing the runaway process:

===
last pid:   509;  load averages:  1.04,  1.02,  1.00    up 2+08:23:03  22:30:04
16 processes:  2 running, 13 sleeping, 1 zombie
Mem: 4992K Active, 3568K Inact, 4948K Wired, 12K Cache, 4592K Buf, 37M Free
Swap:

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  266 nobody    83   10   948K   724K RUN     24.7H 95.07% 95.07% dnsmasq
   69 root      10    0  1448K  1212K nanslp   0:34  0.00%  0.00% ipmon
   59 root       2    0  1432K  1084K select   0:33  0.00%  0.00% dhclient
   77 root       2    0   980K   708K select   0:10  0.00%  0.00% syslogd
  271 root       2   10  1780K  1456K select   0:05  0.00%  0.00% dhcpd
   33 root      10    0   880K   536K nanslp   0:02  0.00%  0.00% watchdogd
  103 root      10    0   944K   660K nanslp   0:01  0.00%  0.00% ez-ipupdate
   96 root       2    0  2220K  1084K accept   0:00  0.00%  0.00% mini_httpd
  505 root      -6   10  2372K  2020K piperd   0:00  0.00%  0.00% php
  110 root      10    0  1096K   796K nanslp   0:00  0.00%  0.00% msntp
  108 root      10    0  1324K   824K wait     0:00  0.00%  0.00% sh
  121 root       3    0  1328K   852K ttyin    0:00  0.00%  0.00% sh
  508 root      10   10  1324K   828K wait     0:00  0.00%  0.00% sh
  507 root      -6    0  2224K  1188K piperd   0:00  0.00%  0.00% mini_httpd
  509 root      57   10  1860K   920K RUN      0:00  0.00%  0.00% top
===

Does anyone have any idea what might be causing this? As I've said, it seems to happen every two days or so, and doesn't go away until I reboot m0n0wall (I can't even kill the process). I'm running pb16r500 on a net4501, for what it's worth.

In case anyone's curious, I also saved a dump from status.cgi while the problem was happening. Email me if you'd like a peek.

--
Ryan Grove
ryan at wonko dot com
http://wonko.com/