 From:  "David Burgess" <apt dot get at gmail dot com>
 To:  "Monowall Support List" <m0n0wall at lists dot m0n0 dot ch>
 Subject:  Re: io errors -- trouble?
 Date:  Thu, 9 Aug 2007 16:59:09 -0600
On 8/9/07, David Burgess <apt dot get at gmail dot com> wrote:
> I'm troubleshooting a mono 1.3b2/3 box that was suspected of giving us
> intermittent outages on a small percentage of our LAN clients. I
> mentioned it in another of my posts yesterday, but I'm starting a new
> thread today as the problem-solving process has taken a new direction.
> The hardware as I'm testing it, 8 months old:
> sempron 1.6 GHz
> 256 MB RAM
> WAN - nvidia onboard GBE (nve driver)
> LAN - intel pro 1000 gt pci (em)
> I've tested the RAM thoroughly with no errors and now I'm running
> iperf through it on a gigabit connection.
> If I run iperf in full duplex mode with a single connection each way,
> things go fine. But if I run it with any parallel connections I get
> errors. For example, if I do a 10-second full duplex test:
> iperf -c -i 10 -d -t 10 -P 8
> I get in/out errors on my WAN interface. This is after a fresh reboot
> and a single 10-second run on iperf as quoted above:
> Media    1000baseTX <full-duplex>
> In/out packets  448772/535650 (377.81 MB/613.10 MB)
> In/out errors   73/0
> It doesn't actually appear to matter how many connections I use beyond
> 1, a ten-second test will typically produce 25-100 in/out errors on
> the WAN interface.
> I realise that we're talking about less than 1% of packets in error
> here, as pointed out by cmb and probably others on the list in the
> past, but I'm looking for anything here, as we've definitely had
> transient problems in our network. I'm also concerned that I've never
> seen errors on this mono before while deployed, and yes, I do check.
> FWIW, our firewall states table had typically 5000-6000 entries in it
> when the problem began occurring.
> So should I suspect this nic? Any other theories out there? All
> suggestions appreciated.
> db
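
To put a number on the "less than 1%" figure in the quoted post, here is a minimal sketch that uses only the counters from the status page (73 in errors against 448772 in packets). The name error_rate_percent is just an illustrative helper, not anything from m0n0wall. It works out to roughly 0.016% of inbound packets.

```python
def error_rate_percent(errors, packets):
    """Return errors as a percentage of packets (0.0 if no packets were seen)."""
    if packets == 0:
        return 0.0
    return 100.0 * errors / packets


# Counters copied from the interface status output quoted above.
in_packets = 448772
in_errors = 73

rate = error_rate_percent(in_errors, in_packets)
print(f"inbound error rate: {rate:.4f}%")
```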

Things are not looking good for my Intel NIC. First, a minor
correction to my first post: em0 (the Intel card) was actually the WAN
interface, and it is the card that showed the errors described above.

Now the card no longer shows up on the interfaces status page at all,
and the interfaces assign page offers only nve0 as an assignable
interface in the dropdown menu.

The following is a relevant snippet from the system log at boot time:

Aug 9 22:04:05 kernel: pci4: <ACPI PCI bus> on pcib4
Aug 9 22:04:05 kernel: em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> at device 8.0 on pci4
Aug 9 22:04:05 kernel: em0: failed to enable memory mapping!
Aug 9 22:04:05 kernel: em0: Unable to allocate bus resource: memory
Aug 9 22:04:05 kernel: em0: Allocation of PCI resources failed
Aug 9 22:04:05 kernel: device_attach: em0 attach returned 6
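
For anyone reading the log along with me: the trailing number in "attach returned 6" is, as far as I know, an errno value, and 6 is ENXIO on FreeBSD. Below is a small sketch of pulling that out of the lines above. The regex, the attach_failures helper and the trimmed LOG string are my own illustration, and Python's errno table reflects the machine running the script rather than the firewall itself.

```python
import errno
import re

# Kernel messages copied from the boot log above (timestamps trimmed).
LOG = """\
kernel: em0: failed to enable memory mapping!
kernel: em0: Unable to allocate bus resource: memory
kernel: em0: Allocation of PCI resources failed
kernel: device_attach: em0 attach returned 6
"""

# Assumed message shape: "device_attach: <device> attach returned <code>".
ATTACH_RE = re.compile(r"device_attach: (\w+) attach returned (\d+)")


def attach_failures(text):
    """Return (device, code, errno name) for each failed attach in text."""
    results = []
    for match in ATTACH_RE.finditer(text):
        device = match.group(1)
        code = int(match.group(2))
        # Look the code up in the host's errno table; unknown codes are labelled.
        results.append((device, code, errno.errorcode.get(code, "unknown")))
    return results


for device, code, name in attach_failures(LOG):
    print(f"{device}: attach failed with errno {code} ({name})")
```

This only tells me which device failed and with what code. It says nothing about whether the cause is the card, the slot or the BIOS.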

If I got this all the time I would look for a kooky BIOS setting, but
considering the card appeared to be functioning right up until the
last reboot, I'm going to say it is probably bad.

Nevertheless I'm still looking for two cents from anybody that cares
to throw it in, as I've had a long two days of troubleshooting and
lots of complicating factors, so I'm not really eager to close on any
conclusion just yet.

Any hypotheses gratefully considered.