 From:  Fred Wright <fw at well dot com>
 To:  m0n0wall at lists dot m0n0 dot ch
 Subject:  RE: [m0n0wall] Load Balancing Again...
 Date:  Sat, 29 May 2004 16:12:25 -0700 (PDT)
On Sat, 29 May 2004, Mark Spieth wrote:

> The biggest problem with round robin DNS isn't how it distributes the IP
> addresses but caching when a machine goes down. If you have 3 machines
> in DNS with the same A record and one fails, you would probably be able
> to fix the machine before your TTL expired in DNS and everyone sees that
> you removed the failed machine. This is why the round robin method
> within ipnat is ideal so if a machine fails you just remove it from the
> round robin on the monowall box and no DNS updates are required.

I agree that DNS round-robin isn't the best approach to high availability
(though neither is the simple round-robin forwarder), but I was mostly
taking issue with the claim that it necessarily works poorly for
distributing requests among *working* servers.

But note that when an intelligent client has a list of IP addresses to
try, it will try additional addresses when the first one fails, so it will
actually "get past" the failed server, albeit at a cost of additional
delay.  Contrast this with the round-robin forwarder, where the client
only sees a single IP and thus only tries once.  Unless the forwarder
either keeps track of the "upness" of the servers (and note that there's a
bit of a race there) or acts as a full-fledged proxy doing its own
retries, it will actually do worse than round-robin DNS (at least with
intelligent clients).
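
To make the "intelligent client" behavior concrete, here is a rough
Python sketch of a client walking a list of addresses until one accepts
the connection.  The function name connect_any and the 192.0.2.x
addresses in the usage note are made up for illustration; the sketch
only assumes that a failed attempt raises OSError.

```python
import socket


def connect_any(addresses, timeout=3.0, connect=socket.create_connection):
    """Try each (host, port) pair in order and return (address, connection)
    for the first one that succeeds.

    `connect` defaults to socket.create_connection; it is a parameter only
    so the retry logic can be exercised without a real network.
    """
    errors = []
    for addr in addresses:
        try:
            return addr, connect(addr, timeout)
        except OSError as err:
            # This server is down or unreachable: remember why and move on
            # to the next address.  The delay spent here is the cost of
            # "getting past" the failed server.
            errors.append((addr, err))
    raise OSError("all addresses failed: %r" % (errors,))
```

Called as connect_any([("192.0.2.10", 80), ("192.0.2.11", 80)]), it
returns a connection to the second address if the first refuses or times
out.  A client behind a plain round-robin forwarder has only one address
in that list, so its loop has nothing left to fall back on.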

Short-term server outages (e.g. from crashes) would typically only last a
few minutes, so the temporary extra delay in round-robin connection
attempts is probably not a major concern.  Longer-term outages can get
around the DNS caching problem by temporarily reassigning the failed IP
as an alias on (or perhaps remapping it via NAT to) a working server (not
exactly "balanced", but adequate as a workaround).  *Really* long-term
outages could use the same trick as a stopgap while waiting for
the DNS removal to trickle through the system.

					Fred Wright