 From:  "Lee Sharp" <leesharp at hal dash pc dot org>
 To:  "Jonathan De Graeve" <Jonathan dot DeGraeve at imelda dot be>
 Cc:  "m0n0wall" <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Re: [m0n0wall] Version 1.22 freeze
 Date:  Wed, 19 Jul 2006 14:53:40 -0500
From: "Jonathan De Graeve" <Jonathan dot DeGraeve at imelda dot be>

> > On a fresh power on, everything works normal.  I am not sure how the
> > problem develops as I am not aware until we get calls.  At that point,
> > a new client will get an IP via DHCP fine.  However, an http request
> > will just time out.  Clients already authenticated (Like the business
> > center PC, often enough) will never time out, even with the idle
> > timeout set.  They, and systems with "allowed IP addresses" will work
> > fine.

> Which means something is holding it up, which CPU type/memory
> (performance wise)?

All systems have 256 MB and a strong CPU.  All system web consoles are 
responsive, and all logs viewable.  You cannot tell anything is wrong from 
the console.

> Can you confirm if this only happens on systems with more than
> 50 concurrent user logins?

No.  However, now that local managers know a reboot fixes it, it is very 
hard to get "before" snapshots. :-(

> If the http doesn't work anymore: the mini_httpd still seems to run,
> even if you wait 10 seconds the page doesn't show up?

IE, Firefox, and Opera all time out.  The error we get called on is "the 
inter net is down."

> You can try to raise the max number of concurrent sessions from the
> default of 16 to 32 for mini_httpd in the config.

How is this done?

> Also you can try to kill the minicron process ('/usr/local/bin/minicron 60
> /var/run/minicron.pid /etc/rc.prunecaptiveportal') and start it back up
> manually using exec, with 300 (= 300 sec) instead of 60.  If there is a
> huge number of users and the RADIUS server is slow, the timeout values
> can be a little too high, so that m0n0wall isn't able to update all
> accounting within a 60 sec interval, especially when the RADIUS server is
> configured with a delay before it answers with an Access-Reject packet.
> You can change this behaviour by changing the config and adding a
> 'hidden' option key to the captiveportal section:
> <croninterval>300</croninterval>
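
For anyone following along, the manual restart described above might look roughly like the sketch below.  The minicron command line is the one quoted in the message; the helper names (minicron_cmd, restart_minicron) are made up for illustration, and it is an assumption that /var/run/minicron.pid holds the pid of the running instance.

```shell
# Sketch only: restart the captive portal pruning job with a longer interval.
# The binary, pid file and script paths are the ones quoted in the message.
INTERVAL=300                      # suggested value instead of the default 60
PIDFILE=/var/run/minicron.pid     # assumed to contain the running minicron's pid

# Hypothetical helper: print the minicron command line for a given interval.
minicron_cmd() {
    echo "/usr/local/bin/minicron $1 $PIDFILE /etc/rc.prunecaptiveportal"
}

# Hypothetical helper: stop the running instance (if any), then start a new one.
restart_minicron() {
    if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")"
    fi
    $(minicron_cmd "$INTERVAL")
}

# Only print what would be run; call restart_minicron on the box to apply it.
minicron_cmd "$INTERVAL"
```

A restart done this way would presumably not survive a reboot, which is where the <croninterval> key in the config comes in.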

AFAIK, none of the systems doing this are using RADIUS, only a splash 
page acceptance authentication.  Would this still make a difference?

> Since for the rest everything is still accessible there isn't a problem
> with buffers.

Confusing, isn't it? :-)

> > How stable is it?  This problem only occurs in heavily used production
> > systems.  If you wish, I can give you access to the two locations that
> > have done this the most.

> You are talking about heavily used: how many users at min. ?

All are in hotels.  From 0 to 10 users can come on any time.  Sometimes many 
more.  Mostly porn or music downloads.  Light VPN outbound use from clients 
in the LAN from 5:30 to about 8:00 then all porn. :-)  Business users...

> Well, it contains bugfixes that ain't incorporated into 1.22, it should
> be more stable ;)

I can try it, or wait to see it break again to gather more information.  PS: 
1.23b1 doesn't have a "both" checkbox, does it? ;-)