On Sat, 29 May 2004, Mark Spieth wrote:
> The biggest problem with round robin DNS isn't how it distributes the IP
> addresses but caching when a machine goes down. If you have 3 machines
> in DNS with the same A record and one fails, you would probably be able
> to fix the machine before your DNS TTL expired and everyone saw that
> you had removed the failed machine. This is why the round robin method
> within ipnat is ideal: if a machine fails, you just remove it from the
> round robin on the monowall box and no DNS updates are required.
I agree that DNS round-robin isn't the best approach to high availability
(though neither is the simple round-robin forwarder), but I was mostly
taking issue with the claim that it necessarily works poorly for
distributing requests among *working* servers.
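To make the distribution point concrete, here is a rough Python sketch. It is my own illustration, not how BIND or any particular resolver actually behaves. It assumes the DNS server rotates the address list by one position per query and that each client simply uses the first address it gets back:

```python
from collections import Counter


def rotated_answers(records, n_queries):
    """Yield the A-record list as a round-robin DNS server might return it,
    rotating the order by one position per query (an assumption; real
    servers may shuffle randomly instead)."""
    for i in range(n_queries):
        k = i % len(records)
        yield records[k:] + records[:k]


servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # documentation addresses
# Count which server each client would contact first.
first_choice = Counter(answer[0] for answer in rotated_answers(servers, 300))
print(first_choice)  # each address is first in 100 of the 300 answers
```

With all servers up, the load splits evenly. Resolver caching makes this lumpier in practice, since everyone behind one cache sees the same order until the TTL expires.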
But note that when an intelligent client has a list of IP addresses to
try, it will try additional addresses when the first one fails, so it will
actually "get past" the failed server, albeit at the cost of additional
delay. Contrast this with the round-robin forwarder, where the client
only sees a single IP and thus only tries once. Unless the forwarder
either keeps track of the "upness" of the servers (and note that there's a
bit of a race there) or acts as a full-fledged proxy doing its own
retries, it will actually do worse than round-robin DNS (at least with
intelligent clients).
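A toy Python model of the comparison above, as a sketch under stated assumptions: the 3-second connect timeout is a made-up number, "up" is known instantly, and the forwarder is the naive kind that does no health tracking and no retries of its own:

```python
# Hypothetical per-attempt connect timeout, in seconds.
CONNECT_TIMEOUT = 3.0


def client_with_address_list(addresses, up, timeout=CONNECT_TIMEOUT):
    """An "intelligent" client: try each DNS address in order and fall back
    to the next one when a connection attempt times out.
    Returns (server reached or None, delay spent on failed attempts)."""
    delay = 0.0
    for addr in addresses:
        if addr in up:
            return addr, delay
        delay += timeout  # this attempt times out before moving on
    return None, delay


def client_behind_forwarder(backends, up, query_index, timeout=CONNECT_TIMEOUT):
    """The client sees only the forwarder's single IP. A naive round-robin
    forwarder with no health tracking picks the next backend blindly, so a
    dead backend means the one and only attempt fails after the timeout."""
    backend = backends[query_index % len(backends)]
    if backend in up:
        return backend, 0.0
    return None, timeout


servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
up = {"192.0.2.1", "192.0.2.3"}  # 192.0.2.2 has crashed

for i in range(len(servers)):
    rotated = servers[i:] + servers[:i]  # round-robin DNS answer order for query i
    print("DNS RR:   ", client_with_address_list(rotated, up))
    print("forwarder:", client_behind_forwarder(servers, up, i))
```

With one of three servers down, every DNS client still connects (one of them a timeout late), while one in three forwarded connections simply fails. A forwarder that tracked backend health would close that gap, but only as quickly as it notices the failure.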
Short-term server outages (e.g. from crashes) would typically only last a
few minutes, so the temporary extra delay in round-robin connection
attempts is probably not a major concern. Longer-term outages can get
around the DNS caching problem by temporarily reassigning the failed IP
as an alias on (or perhaps remap it via NAT to) a working server (not
exactly "balanced", but adequate as a workaround). *Really* long-term
outages could use that reassignment as a stopgap while waiting for
the DNS removal to trickle through the system.
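The alias/NAT workaround amounts to remapping addresses rather than removing them. A rough Python sketch follows. The host names web1..web3 and the round-robin choice of takeover host are made up for illustration; on a real box this would be an interface alias or a NAT rule, not a Python dict:

```python
from collections import Counter


def reassign_failed_ips(ip_owner, up_hosts):
    """Return a new IP -> host map in which every IP owned by a down host
    is handed (as an alias or via NAT) to a working host, chosen round-robin."""
    healthy = sorted(up_hosts)
    if not healthy:
        raise RuntimeError("no working servers to take over the failed IPs")
    remapped = dict(ip_owner)
    i = 0
    for ip, host in sorted(ip_owner.items()):
        if host not in up_hosts:
            remapped[ip] = healthy[i % len(healthy)]
            i += 1
    return remapped


ip_owner = {"192.0.2.1": "web1", "192.0.2.2": "web2", "192.0.2.3": "web3"}
remapped = reassign_failed_ips(ip_owner, up_hosts={"web1", "web3"})
print(remapped)
# Every address still sitting in resolver caches now reaches a live host,
# but web1 answers for two of the three IPs, so the load is no longer even.
print(Counter(remapped.values()))
```

No DNS change is needed for this step, so stale cached records stop mattering immediately. The DNS removal can then follow at its own pace.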
Fred Wright