Imminent Death of the Net Predicted
To improve our resilience, GTM is configured to return two A records for each DNS request for hostname www.<domain>.co.uk, with equal weighting between the three sites. This is to enable some web browsers to fail over more quickly to another site if one site fails for some reason. We don't try to do anything clever like sending the user to the site that will give them the quickest response, since most of our users are in the UK and geography's not much of an issue.
For some time now, we've been noticing higher traffic at site 1 and lower traffic at site 3, and the difference has been slowly increasing. This is odd, since the load should be equally balanced between the three sites.
Further analysis of log files then showed that the problem was only affecting Windows Vista users (30% of our total traffic). Other users showed the same performance and traffic at all three sites.
Googling for Vista network performance issues turned up a big red herring about TCP window scaling, which Vista is the first version of Windows to implement, and which can cause performance issues with some routers. This was still hard to use as an explanation, given that users on site 1 had good performance, but users coming to site 3 web servers through site 1, using the same router, had poor performance.
So as an experiment, we took site 3 out of the DNS pool altogether for a day. All DNS lookups now returned the addresses for sites 1 & 2. Suddenly, site 2 was just as bad as site 3 had been -- its total number of pages to Vista users went down rather than up, even though its total traffic was up by nearly 50%.
This suggested strongly that for some reason Vista was preferring site 1 to site 2 or 3, and site 2 to site 3, when choosing an IP address from the round-robin A records presented to it. Some more Googling eventually found RFC3484, which defines default address selection for IPv6, but part of which is back-ported to IPv4. Vista is apparently the first major client OS to implement it, specifically section 6 rule 9. That specifies that the selection of an address from multiple A records is no longer random; instead, the destination address which shares the most prefix bits with the source address is selected, presumably on the basis that it's in some sense "closer" in the network.
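To make the rule concrete, here is a minimal Python sketch of longest-matching-prefix ordering. It is a simplification: the real RFC3484 algorithm applies several other rules first, and this is not Vista's code. The function names and the addresses (taken from the reserved documentation ranges) are made up for illustration.

```python
import ipaddress

def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    # The highest set bit of the XOR marks the first position where they differ.
    return 32 - diff.bit_length()

def order_destinations(source, destinations):
    """Sort destinations so the one sharing the most prefix bits comes first.

    sorted() is stable, so destinations with equal scores keep the order
    in which the DNS server returned them.
    """
    return sorted(destinations,
                  key=lambda d: common_prefix_len(source, d),
                  reverse=True)

if __name__ == "__main__":
    # Hypothetical client and server addresses, not our real ones.
    answers = ["203.0.113.5", "192.0.2.5", "198.51.100.200"]
    print(order_destinations("198.51.100.7", answers))
```

The point to notice is that the ordering depends only on the bit patterns of the addresses, so every client with the same source address makes the same choice every time, whatever order the DNS server shuffles the records into.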
Now, this may well make sense in IPv6 (I don't know enough about it to comment), but it's an insane algorithm to use in IPv4. First, the Internet is not laid out that way. As any comic artist can tell you, Europe does have a nice block from 220.127.116.11 to 18.104.22.168, but it also has chunks from 193-195 and 212-213, plus there's lots of geographically random stuff between 128 and 172.
But second, and more important, very few Windows client PCs actually have public IP addresses. If you're behind a NAT gateway, the DNS client in your Windows PC doesn't know the IP address you're using on the Internet, just the local network address you're using in one of the ranges specified by RFC1918. Now, in theory, that could be in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16, but in practice nearly all home routers allocate addresses in the 192.168 range. As it happens, that shares two prefix bits with our site 1 address, one bit with our site 2 address and zero bits with our site 3 address, so any Vista PC on a home network will always prefer site 1 over sites 2 or 3, and site 2 over site 3. This explains the difference in traffic volumes. A user with a slow and dodgy connection may have pages time out, at which point their browser sends them to another IP address, so those users who have inherently worse performance are much more likely to find their way to site 3. Also, the few remaining dialup users actually have public IP addresses, which may well be in the European range from 22.214.171.124 to 126.96.36.199, which shares the most prefix bits with site 3 and thus is more likely to go to site 3. These factors explain the poor performance we saw at site 3.
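The NAT effect is easy to reproduce. The sketch below is illustrative only: the three "site" addresses are stand-ins from the reserved documentation ranges, not our real ones, and `preferred_site` is a made-up helper that applies nothing but the longest-prefix comparison.

```python
import ipaddress

def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

# Stand-in site addresses (documentation ranges), in DNS answer order.
SITES = ["203.0.113.1", "198.51.100.1", "192.0.2.1"]

def preferred_site(source, sites=SITES):
    """Pick the site sharing the most prefix bits with the source address."""
    return max(sites, key=lambda s: common_prefix_len(source, s))

if __name__ == "__main__":
    # Typical private addresses handed out by home routers.
    for source in ["192.168.0.2", "192.168.1.10", "192.168.178.23"]:
        scores = {site: common_prefix_len(source, site) for site in SITES}
        print(source, "->", preferred_site(source), scores)
```

With these stand-in addresses, every 192.168.x.x client scores the sites 4, 5 and 8 bits respectively and so picks the same one. The client's real public address never enters into the comparison, because the PC behind the NAT gateway never sees it.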
So we're going to have to take a slight hit to our resilience and reduce the number of A records we return for a DNS lookup to one instead of two. This will be affecting other large multi-site websites as well -- for example, www.google.com returns three IP addresses in different ranges. And Microsoft have broken the Internet. Again. Although, to be fair, they did have some help this time from the IETF.
(I found this from a discussion on the Debian mailing list about the implementation of RFC3484 in glibc in Debian Etch. They eventually backed it out and only used section 6 rule 9 for destination addresses on the same subnet, which seems like a much better way to do it.)
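As a rough illustration of that same-subnet restriction, here is a Python sketch of the idea as described above. It is not glibc's actual code; the function names are invented, and it assumes the client knows its own interface address and netmask (written here as `192.168.1.10/24`).

```python
import ipaddress

def same_subnet_prefix_score(source_iface, dest):
    """Prefix-match score that only counts for destinations on our own subnet.

    source_iface is an address with netmask, e.g. "192.168.1.10/24".
    Off-subnet destinations all score 0, so none is preferred over another.
    """
    iface = ipaddress.IPv4Interface(source_iface)
    d = ipaddress.IPv4Address(dest)
    if d not in iface.network:
        return 0
    diff = int(iface.ip) ^ int(d)
    return 32 - diff.bit_length()

def order_destinations_same_subnet(source_iface, destinations):
    """Stable sort: on-subnet addresses first, everything else in DNS order."""
    return sorted(destinations,
                  key=lambda d: same_subnet_prefix_score(source_iface, d),
                  reverse=True)

if __name__ == "__main__":
    # Stand-in public addresses from the documentation ranges.
    answers = ["203.0.113.1", "192.0.2.1", "198.51.100.1"]
    print(order_destinations_same_subnet("192.168.1.10/24", answers))
```

Because all three public addresses are off the client's subnet, they tie and come back in whatever order the DNS server returned them, so round-robin works again.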