 From:  mtnbkr <waa dash m0n0wall at revpol dot com>
 To:  Manuel Kasper <mk at neon1 dot net>
 Cc:  m0n0wall <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Re: [m0n0wall] 1.3.b11 lockups
 Date:  Fri, 16 May 2008 15:52:34 -0400
Manuel Kasper wrote:
> On 16.05.2008, at 20:01, mtnbkr wrote:
>>     3648 active
> I can't really see anything out of the ordinary in your vmstat/ipfstat 
> output; however, given the strict ruleset that you described, I find 
> 3648 active connections quite a lot. You could try "ipfstat -ls" to see 
> the full list of active connections - maybe you'll see something odd 
> there. Of course if you have lots of users it could also be quite 
> normal. ;)

Yeah, that does seem a bit high, especially since there are fewer than 
300-400 users on campus.  I'll take a look and see.

You know, I was just about to post a follow-up to my last post... The 
thought that came to mind was that things have been pretty stable over 
there, and that I may have solved the problem a while back without 
getting positive feedback from the resolution(s). The issue just sort of 
quietly faded away and I kept thinking that I was in a holding pattern, 
just waiting for it to happen again...

What I THINK happened was the following:

The email server on the DMZ is configured with djb's dnscache (part of 
his djbdns package).  Running dnscache on the email server with a 
rather large lookup cache cuts down the number of DNS queries generated 
by RBL checks etc.

What I THINK may have caused the state table overflows, kernel panics 
and reboots in the past was a combination of the following factors:

- The djbdns cache on the email server was FAR too small
- The dnscache program on the email server was configured and allowed to
   make dns requests from the internal dns server
- The internal dns server's lookup cache was ALSO configured to be FAR
   too small

So, what I seem to recall is that with a lot of email coming in, there 
was a FLURRY of DMZ-to-Internal DNS requests, each followed by an 
internal-to-Internet DNS request to fulfill the email server's 
request, so each DNS lookup traversed the m0n0wall twice.
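
As a rough back-of-the-envelope sketch of why that could fill the state 
table: the number of state entries held at any moment is roughly the 
rate of new lookups crossing the firewall, times how long each entry 
lives, times how many times each lookup crosses. Every number below 
(query rate, cache miss ratio, state lifetime) is made up for 
illustration and was not measured on this box.

```python
def estimated_states(queries_per_sec, cache_miss_ratio,
                     traversals_per_lookup, state_lifetime_sec):
    """Rough steady-state count of firewall state entries caused by DNS.

    Little's law: entries in the table = arrival rate * time each
    entry stays in the table. Only cache misses leave the mail server.
    """
    lookups_crossing = queries_per_sec * cache_miss_ratio
    return lookups_crossing * traversals_per_lookup * state_lifetime_sec


# Hypothetical numbers, for illustration only (not from this network).
QUERIES_PER_SEC = 100      # RBL and other lookups during a mail burst
STATE_LIFETIME_SEC = 60    # assumed lifetime of one UDP state entry

# Tiny caches, lookups chained DMZ -> internal -> Internet (2 crossings).
chained_small_cache = estimated_states(QUERIES_PER_SEC, 0.75, 2,
                                       STATE_LIFETIME_SEC)

# Larger cache, dnscache going straight out (1 crossing).
direct_large_cache = estimated_states(QUERIES_PER_SEC, 0.25, 1,
                                      STATE_LIFETIME_SEC)

print("chained, small cache:", chained_small_cache)
print("direct, large cache: ", direct_large_cache)
```

The point is only the shape of the formula: a poor cache hit rate and 
the double traversal multiply each other, so fixing either one shrinks 
the number of states considerably.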

That was an oversight (and a dumb move) on my part when I was in a rush, 
so I docked myself a day's pay.   lol

The box has been pretty stable lately and I am pretty sure that the only 
reboots recently have been due to extended power outages.

> Well, I have to admit that I don't know what to suggest at this point. 
> If the problem is really related to some odd traffic that occurs only 
> once in a while and somehow messes up ipfilter, one way of hopefully 
> getting a bit closer to finding out what it is would be to capture all 
> traffic on the LAN interface of your m0n0wall. Wireshark has a ring 
> buffer feature that allows it to capture indefinitely while consuming a 
> fixed amount of disk space. Then when the problem occurs again, you 
> could correlate the time of the kernel panic with the traffic just 
> before it happened, and hopefully discover something extraordinary. It 
> could be a lot of data to sift through, though...

Yeah, that is a good idea. Wireshark is a great tool.

BTW, I think the output of the firewall states page of m0n0wall was 
actually key in steering me towards a dns issue being the cause.

Sorry for bothering you today with this (old), apparently solved issue.

This post should probably have been posted as a response/follow-up to my 
March posts so that that thread can be followed to a conclusion if 
others experience a similar situation.

Paypal donation should reach you before this email does.   :)

Bill Arlofski
Reverse Polarity, LLC