 From:  "Bernie O'Connor" <Bernie dot OConnor at sas dot com>
 To:  "Lee Sharp" <leesharp at hal dash pc dot org>, "Jonathan De Graeve" <Jonathan dot DeGraeve at imelda dot be>
 Cc:  "m0n0wall" <m0n0wall at lists dot m0n0 dot ch>
 Subject:  RE: [m0n0wall] Version 1.22 freeze
 Date:  Fri, 21 Jul 2006 08:42:48 -0400
Could it have anything to do with DNS?  The client failures seem to depend on DNS working to
resolve the original http request.  Since status.php still works, does diagnostics/ping-traceroute
to a known www address resolve correctly?  This is an interesting problem to solve because you're
working without a known 'computer-agent' that you can rely on to test the internal link during a
failure.  In the wi-fi world, it's always a struggle to determine remotely, from a client machine,
whether the client or the infrastructure is at fault.  Is it possible to script something you
can run with exec.php to simulate the client http experience (dns lookup, get the web page, see if
captive portal captures request)?  Or perhaps the hotel has a computer attached to the network for
which we could develop a diagnostic set that captures the failure like Jonathan is asking, or even
just a set of instructions that the hotel person could execute to gather diagnostic information?
All this could lead to developing some self-test mechanism that m0n0wall can run itself to test
its functionality.
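
To make the three steps above concrete (dns lookup, get the web page, see whether the request gets
captured), here is a rough sketch.  It is only an illustration under assumptions: it is Python 3,
so it would have to run on a machine attached to the hotel network rather than through exec.php,
the names probe() and classify() are made up for this example, and it assumes the captive portal
shows up as an http redirect to a different host.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Keep urllib from following redirects so the redirect itself can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def classify(status, location, requested_host):
    """Interpret an http answer from the client's point of view (hypothetical helper)."""
    if status in (301, 302, 303, 307, 308) and location:
        target = urllib.parse.urlparse(location).hostname
        if target and target != requested_host:
            # Assumption: a redirect to another host is how the portal captures a request.
            return "redirected (possibly captive portal): " + location
        return "redirected within site"
    if 200 <= status < 300:
        return "page served"
    return "http error %d" % status


def probe(url, timeout=10):
    """Run the three steps in order and report the first one that fails."""
    host = urllib.parse.urlparse(url).hostname
    if not host:
        return "bad url: " + url
    try:
        addr = socket.gethostbyname(host)            # step 1: dns lookup
    except OSError as exc:
        return "dns failed: %s" % exc
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        resp = opener.open(url, timeout=timeout)     # step 2: get the web page
        status, location = resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as exc:            # unfollowed redirects arrive here
        status, location = exc.code, exc.headers.get("Location")
    except OSError as exc:                           # timeouts, refused connections
        return "dns ok (%s), http failed: %s" % (addr, exc)
    # step 3: decide whether the page was served or the request was captured
    return "dns ok (%s), %s" % (addr, classify(status, location, host))
```

Calling probe("http://www.example.com/") from a freshly connected machine and logging the returned
string would at least separate "dns is dead" from "dns works but http times out".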

Seems like we have at least 2 types of failures being discussed: Lee's, where he can still get
information with status.php, and the others, where nothing works -- the console is dead, no http
services at all.  Are there any new reports from anyone with the 'completely dead' type problem?
Not that we can help - just curious :)

bernie 

-----Original Message-----
From: Lee Sharp [mailto:leesharp at hal dash pc dot org] 
Sent: Thursday, July 20, 2006 12:50 AM
To: Jonathan De Graeve
Cc: m0n0wall
Subject: Re: [m0n0wall] Version 1.22 freeze

From: "Jonathan De Graeve" <Jonathan dot DeGraeve at imelda dot be>

>>> If the http doesn't work anymore: the mini_httpd still seems to run, 
>>> even if you wait 10 seconds the page doesn't show up?

>>IE, Firefox, and Opera all timeout.  The error we get called on is 
>>"the inter net is down."

>Is it possible to do a sniff on a newly connected machine giving me the 
>TCP sniffs to actually see what's happening?

I am almost never on site, so this is not easy.  Few random hotel guests have sniffers with them. :-)

>>AFAIK, all of the systems doing this are not using RADIUS, but only a 
>>splash page acceptance authentication.  Would this still make a 
>>difference?

>This is actually handfull information and GREAT news for me (sorry ;) ) 
>this means it isn't related to the radius subroutines I wrote which is 
>REALLY good news to me.
>The bad news is, if it doesn't have something to do with the max httpd 
>proc it is something in the code related to local user authentication 
>which then must be broken somewhere. (so I need to check all that code 
>because I didn't write the local user manager stuff)

>You are really 100% sure this only happens on systems WITHOUT radius right?

The problems only occur in heavy-use hotels, and our paid Radius sites don't have near the use. 
They are set to "no authentication."

>>> Well, it contains bugfixes that ain't incorporated into 1.22, it 
>>> should be more stable ;)

>>I can try it, or wait to see it break again to gather more information.

>Since you are talking about having more than 1 site with this behaviour 
>I would suggest upgrading 1 to 1.23b1 and leaving the other to gather 
>more information on this issue.

Since it happens so rarely, I will leave it "broken" and try to gather more information.  Had one
today, but it was rebooted before I was called.

                                            Lee