 From:  Dan DeRemer <dderemer at atnetplus dot com>
 To:  "waa dash m0n0wall at revpol dot com" <waa dash m0n0wall at revpol dot com>, Manuel Kasper <mk at neon1 dot net>
 Cc:  m0n0wall <m0n0wall at lists dot m0n0 dot ch>
 Subject:  RE: [m0n0wall] 1.3.b11 lockups
 Date:  Tue, 20 May 2008 08:52:55 -0400
This is somewhat unrelated, but it should be considered when dealing with ALIX boards: most ALIX boards I've
purchased came with BIOS version 0.98. The latest revision is 0.99, which resolves some PCI bus and
serial port issues. I know for a fact that the 1.2 releases of pfSense do not work on ALIX boards
until you upgrade to 0.99. You can find the BIOS releases for the ALIX2/ALIX3 here:
http://pcengines.ch/alix2.htm

Also, here is the changelog:

ALIX tinyBIOS revision history
------------------------------

v0.99   pd 071210       Setup: changed description from Etherboot to PXE boot

v0.98j  pd 071206       Setup: add late PCI init option to support
                        FPGA based miniPCI cards that take a long time
                        to wake up... (symptom: no interrupt assigned)

v0.98h  pd 071203       Change to BIOS controlled PXE module

v0.98g  pd 071126       Fix CS5536 serial port flow control

v0.98f  pd 071126       Disable audio section

v0.98e  pd 071125       Serial console: allow Int 14 init to disable interrupt.
                        Setup: add UDMA option

v0.98d  pd 071115       Setup: change HDD slave to V (avoid accidental change)
                        Setup: add MFGPT workaround option

v0.98c  pd 071113       Alternate version without MFGPT reset

v0.98b  pd 071101       Fix UART initialization

v0.98   pd 071031       Skip DLL status check

v0.97   pd 071026       Back to 400 MHz DRAM clock for ALIX.3*2

v0.96   pd 071025       Always do HDD wait if enabled

v0.95   pd 071024       Use 333 MHz DRAM clock for ALIX.3*2

v0.94   pd 071023       Force MFGPT timer reset (undocumented MSR 5140002B per
                        workaround in AMD Linux driver)

                        Fixed a bug in PCI BIOS find device function

                        Auto detect DRAM clock to set correct refresh interval

v0.93   pd 071021       Add port 92 reset support

                        Setup: add 19200 baud option

v0.92   pd 071003       Add HDD wait option; adds some delay to allow
                        detection of conventional HDDs.

                        Disable power management for various CS5536 devices
                        to avoid MFGPT / interrupt issues.

                        MFGPT issues: please observe AMD CS5536 data book
                        section 5.16.3, incorrect initialization sequence
                        will HANG the system.

v0.90   pd 070925       Remap audio and USB interrupts to offload regular
                        PCI interrupts.

                        IRQ7 is no longer routed to the LPC bus; it is used
                        as the default interrupt for the MFGPT high-resolution timer.

                        Implement BIOS setup. Press S during memory test
                        to enter.

                        Add UMB (upper memory block) support.
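Because the changelog mixes plain releases (v0.98) with lettered revisions (v0.98b through v0.98j), these strings cannot be compared as decimal numbers. A minimal Python sketch for ordering them, e.g. to check whether a board still needs the 0.99 upgrade, assuming every version has the form v<major>.<minor> optionally followed by one lowercase letter. That format is inferred from this changelog only, and tinybios_key and needs_upgrade are hypothetical helpers, not part of any PC Engines tool.

```python
import re

# Assumed format, inferred from the changelog above: "v0.98", "v0.98j", "v0.99".
# Hypothetical helper, not a PC Engines tool.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)([a-z]?)$")


def tinybios_key(version: str) -> tuple:
    """Return a sortable key; a bare release sorts before its lettered revisions."""
    m = _VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized tinyBIOS version: {version!r}")
    major, minor, letter = m.groups()
    # "" < "b" < "j", so v0.98 < v0.98b < v0.98j < v0.99
    return (int(major), int(minor), letter)


def needs_upgrade(installed: str, required: str = "v0.99") -> bool:
    """True if the installed version is older than the required one."""
    return tinybios_key(installed) < tinybios_key(required)


if __name__ == "__main__":
    versions = ["v0.99", "v0.98j", "v0.98", "v0.98b", "v0.97"]
    print(sorted(versions, key=tinybios_key))
    print(needs_upgrade("0.98"))
```

Keeping the letter as a separate tuple element means an empty suffix sorts first, which matches the changelog dates (v0.98 on 071031, v0.98b on 071101).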

ALIX / tinyBIOS quirks
----------------------

A20 gate

        A20 gate is always "open"; this favors performance over support for
        broken legacy code.

HDD master / slave

        To reduce boot time, slave drives are not detected by default.
        Change the option in setup if required.

HDD wait

        Hard disk drives need more time to wake up; enable HDD wait in
        setup if necessary.

LPT IRQ

        IRQ7 is intentionally unmapped to allow use for MFGPT high speed
        timer.

PXE boot

        Use setup to enable, or press N during memory test to select
        network boot for this startup.

Reboot

        Best method to reboot ALIX.2 / ALIX.3 boards is to use either port 92
        or the dedicated reset registers in CS5536.

RTC wake-up

        One customer reported strange behavior on ALIX.1C; set the wake-up
        time to 999999 if problems occur.

UMB

        To support UMB (upper memory block), unused shadow RAM between
        C000 and E000 is left readable and writable.

Open issues
-----------

HDD support

        tinyBIOS does not yet support large HDDs (larger than about 40 GB).

PCI boot ROMs

        Not handled correctly by tinyBIOS.

PCI bridges

        tinyBIOS support for PCI bridges is questionable; if in doubt, send a
        PCI dump and, if possible, sample hardware to PC Engines.

Xmodem upload

        The flash loader has not been ported yet.

VGA

        ALIX.1C tinyBIOS does not support video. Use Award BIOS for this.

Flash layout for ALIX                                           pd 070921
---------------------

The layout is controlled by the batch files used to build the BIOS,
for example lx3.bat.

00000 - 0FFFF   Config block (only first few bytes used, but the flash device
                has 64KB erase blocks)

10000 - 3FFFF   unused

40000 - 47FFF   unused / video BIOS (future use)

48000 - 5FFFF   unused

60000 - 6FFFF   PXE BIOS

70000 - 77FFF   SMI module

78000 - 78FFF   unused, space for runtime copy of config block

79000 - 7FFFF   tinyBIOS core

Memory layout for ALIX
----------------------

00000 - 9FFFF   RW      base 640K RAM

A0000 - BFFFF   -       unused / VGA memory

C0000 - C7FFF   RO      unused / video BIOS

C8000 - DFFFF   -       unused

E0000 - EFFFF   RW      PXE BIOS

F0000 - F7FFF   RW      SMI module

F8000 - F8FFF   RO      runtime copy of config block

F9000 - FFFFF   RO      tinyBIOS core
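Comparing the flash layout with the memory layout, every BIOS component listed in both tables appears in memory at its flash offset plus 0x80000: the PXE BIOS at flash 60000 sits at E0000, the SMI module at 70000 sits at F0000, and so on down to the tinyBIOS core. The Python sketch below checks that relationship against the two tables. The constant FLASH_TO_MEM_OFFSET is an inference from these tables, not something PC Engines states here, and the sketch says nothing about how the BIOS actually copies or shadows the regions.

```python
# Ranges copied from the "Flash layout" and "Memory layout" tables above.
# FLASH_TO_MEM_OFFSET is inferred by comparing the two tables (assumption).
FLASH_TO_MEM_OFFSET = 0x80000

FLASH_REGIONS = [
    (0x40000, 0x47FFF, "unused / video BIOS (future use)"),
    (0x60000, 0x6FFFF, "PXE BIOS"),
    (0x70000, 0x77FFF, "SMI module"),
    (0x78000, 0x78FFF, "runtime copy of config block"),
    (0x79000, 0x7FFFF, "tinyBIOS core"),
]

MEMORY_REGIONS = [
    (0xC0000, 0xC7FFF, "RO", "unused / video BIOS"),
    (0xE0000, 0xEFFFF, "RW", "PXE BIOS"),
    (0xF0000, 0xF7FFF, "RW", "SMI module"),
    (0xF8000, 0xF8FFF, "RO", "runtime copy of config block"),
    (0xF9000, 0xFFFFF, "RO", "tinyBIOS core"),
]


def flash_to_memory(offset: int) -> int:
    """Map a flash offset to the memory address implied by the two tables."""
    return offset + FLASH_TO_MEM_OFFSET


def check_layouts() -> bool:
    """True if every flash region lands exactly on its matching memory region."""
    for (fs, fe, _), (ms, me, _, _) in zip(FLASH_REGIONS, MEMORY_REGIONS):
        if (flash_to_memory(fs), flash_to_memory(fe)) != (ms, me):
            return False
    return True


for fs, fe, name in FLASH_REGIONS:
    print(f"flash {fs:05X}-{fe:05X} -> memory "
          f"{flash_to_memory(fs):05X}-{flash_to_memory(fe):05X}  {name}")
print("layouts consistent:", check_layouts())
```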

PCI Interrupt map
-----------------

Note that ALIX.2A / ALIX.3A boards have a different mapping; please ask
for specific BIOS images for these boards.

PCI dev AD line Int map         Description

00      ..      -               unused

08      AD11    INTA            Geode LX host bridge (crypto)

10..40  12..18  -               unused

48      AD19    INTB            LAN1 (right)

50      AD20    INTC            LAN2 (middle)

58      AD21    INTD            LAN3 (left)

60      AD22    INTA, INTB      miniPCI 1

68      AD23    -               unused

70      AD24    INTC, INTD      miniPCI 2

78      AD25    INTA .. INTD    Geode CS5536

80..F8  ..      -               unused
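The PCI dev column appears to hold the combined device/function number (devfn = device << 3 | function), which is why it steps by 8. Read that way, every row that lists an AD line follows AD line = 10 + device number, and LAN1 at 48 is device 9, which in the bus:device.function notation used by tools such as lspci would read 00:09.0 on bus 0. A small Python sketch that decodes the column and checks that pattern; pci_slot and pci_func mirror the Linux kernel's PCI_SLOT and PCI_FUNC macros, while the devfn reading of the column and the AD relationship are assumptions drawn from this table only.

```python
# Decode devfn the same way as the Linux PCI_SLOT / PCI_FUNC macros.
def pci_slot(devfn: int) -> int:
    return (devfn >> 3) & 0x1F


def pci_func(devfn: int) -> int:
    return devfn & 0x07


# Rows from the table above that list an AD line: (PCI dev, AD line, description)
ROWS = [
    (0x08, 11, "Geode LX host bridge (crypto)"),
    (0x48, 19, "LAN1 (right)"),
    (0x50, 20, "LAN2 (middle)"),
    (0x58, 21, "LAN3 (left)"),
    (0x60, 22, "miniPCI 1"),
    (0x68, 23, "unused"),
    (0x70, 24, "miniPCI 2"),
    (0x78, 25, "Geode CS5536"),
]


def idsel_line(devfn: int) -> int:
    # Pattern observed in this table only: AD line = 10 + device number.
    return 10 + pci_slot(devfn)


for devfn, ad, desc in ROWS:
    print(f"{devfn:02X} -> 00:{pci_slot(devfn):02x}.{pci_func(devfn)}  "
          f"AD{idsel_line(devfn)}  {desc}")
```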

Interrupt map
-------------

IRQ0    timer
IRQ1    KBD (LPC)
IRQ2    cascade
IRQ3    COM1 serial (internal / LPC)
IRQ4    COM2 serial (LPC)
IRQ5    audio (CS5536)
IRQ6    FDC (LPC)
IRQ7    spare, used for MFGPT high resolution timer

IRQ8    RTC
IRQ9    PCI INTA
IRQ10   PCI INTB
IRQ11   PCI INTC
IRQ12   PCI INTD
IRQ13   floating point
IRQ14   IDE HDD
IRQ15   USB (CS5536)
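Combining the PCI Interrupt map with the Interrupt map gives the IRQ each PCI device ends up on and shows which devices share a line. A Python sketch, assuming the ALIX.2 / ALIX.3 mapping above (not the ALIX.2A / ALIX.3A variant). The CS5536 is left out because the Interrupt map routes its audio and USB functions to IRQ5 and IRQ15; the dictionary names are illustrative only.

```python
from collections import defaultdict

# From the "Interrupt map" table: PCI interrupt line -> IRQ.
PCI_INT_TO_IRQ = {"INTA": 9, "INTB": 10, "INTC": 11, "INTD": 12}

# From the "PCI Interrupt map" table (ALIX.2 / ALIX.3 layout).
# Geode CS5536 omitted: its audio and USB use IRQ5 and IRQ15 per the IRQ table.
DEVICE_INTS = {
    "Geode LX host bridge (crypto)": ["INTA"],
    "LAN1 (right)": ["INTB"],
    "LAN2 (middle)": ["INTC"],
    "LAN3 (left)": ["INTD"],
    "miniPCI 1": ["INTA", "INTB"],
    "miniPCI 2": ["INTC", "INTD"],
}


def irqs_for(device: str) -> list:
    """IRQs a device's PCI interrupt pins resolve to."""
    return [PCI_INT_TO_IRQ[line] for line in DEVICE_INTS[device]]


def sharing_by_irq() -> dict:
    """Map each IRQ to the list of devices whose pins land on it."""
    shared = defaultdict(list)
    for device in DEVICE_INTS:
        for irq in irqs_for(device):
            shared[irq].append(device)
    return dict(shared)


for irq, devices in sorted(sharing_by_irq().items()):
    print(f"IRQ{irq}: {', '.join(devices)}")
```

On this mapping each LAN port shares its IRQ with a miniPCI interrupt line (for example LAN1 and miniPCI 1 on IRQ10), which is worth keeping in mind when a miniPCI card is installed.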

Dan DeRemer
IT Specialist
AtNetPlus, Inc.
www.AtNetPlus.com/address

Keep Connected + Keep Secure + Keep Working

-----Original Message-----
From: mtnbkr [mailto:waa dash m0n0wall at revpol dot com]
Sent: Friday, May 16, 2008 3:53 PM
To: Manuel Kasper
Cc: m0n0wall
Subject: Re: [m0n0wall] 1.3.b11 lockups

Manuel Kasper wrote:
> On 16.05.2008, at 20:01, mtnbkr wrote:
>
>>     3648 active
>
> I can't really see anything out of the ordinary in your vmstat/ipfstat
> output; however, given the strict ruleset that you described, I find
> 3648 active connections quite a lot. You could try "ipfstat -ls" to see
> the full list of active connections - maybe you'll see something odd
> there. Of course if you have lots of users it could also be quite
> normal. ;)

Yeah, that does seem a bit high, especially since there are fewer than
300-400 users on campus.  I'll take a look and see.

You know, I was just about to post a follow-up to my last post... The
thought that came to mind was that things have been pretty stable over
there, and that I may have solved the problem a while back without ever
getting confirmation that the fix(es) worked. The issue just sort of
quietly faded away and I kept thinking that I was in a holding pattern,
just waiting for it to happen again...



What I THINK happened was the following:

The email server on the DMZ is configured with djb's dnscache (part of
his djbdns package).  Running dnscache on the email server with a
rather large lookup cache reduces the number of DNS queries generated by
RBL lookups, etc.

What I THINK may have caused the state table overflows, kernel panics
and reboots in the past was a combination of the following factors:

- The djbdns cache on the email server was FAR too small
- The dnscache program on the email server was configured and allowed to
   send DNS requests to the internal DNS server
- The internal DNS server's lookup cache was ALSO configured to be FAR
   too small

So, what I seem to recall is that with a lot of email coming in, there
was a FLURRY of DMZ-to-internal DNS requests, each followed by an
internal-to-Internet DNS request to fulfill the email server's
request, with each DNS lookup traversing the m0n0wall twice.

That was an oversight (and a dumb move) on my part when I was in a rush,
so I docked myself a day's pay.   lol

The box has been pretty stable lately and I am pretty sure that the only
reboots recently have been due to extended power outages.


> Well, I have to admit that I don't know what to suggest at this point.
> If the problem is really related to some odd traffic that occurs only
> once in a while and somehow messes up ipfilter, one way of hopefully
> getting a bit closer to finding out what it is would be to capture all
> traffic on the LAN interface of your m0n0wall. Wireshark has a ring
> buffer feature that allows it to capture indefinitely while consuming a
> fixed amount of disk space. Then when the problem occurs again, you
> could correlate the time of the kernel panic with the traffic just
> before it happened, and hopefully discover something extraordinary. It
> could be a lot of data to sift through, though...

Yeah, that is a good idea. Wireshark is a great tool.

BTW, I think the output of the firewall states page of m0n0wall was
actually key in steering me towards a dns issue being the cause.

Sorry for bothering you today with this old, apparently solved issue.

This should probably have been posted as a follow-up to my March posts,
so that others who experience a similar situation can follow that
thread to its conclusion.


Paypal donation should reach you before this email does.   :)


--
Bill Arlofski
Reverse Polarity, LLC