Yes, that's 'fastest', not closest.
A while back I published a blog post entitled Finding the closest data center using GeoIP and indexing, which described how we automatically determine the AWS regional data center to be used for storing encrypted server backups.
New and improved
Since the original publication, Amazon built new regional data centers in Oregon, Sao Paulo and Tokyo, so the indexes needed to be updated.
While adding support for the new regions I decided to take it a step further and add some improvements.
Improvement #1: Automatic association (distance)
The method originally used to automatically associate countries/states with data centers was somewhat lacking and needed to be improved.
We are now using the Haversine formula, which is used to determine great-circle distances between two points on a sphere from their longitudes and latitudes.
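As a rough illustration of the distance step, here is a minimal Python sketch of the Haversine formula and a nearest-data-center lookup. It is not the project's actual code: the function names, the Earth radius constant and the approximate data center coordinates are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical, approximate (lat, lon) coordinates, for illustration only.
DATA_CENTERS = {
    "Singapore": (1.35, 103.82),
    "California": (37.35, -121.96),
}


def closest_data_center(lat, lon, data_centers=DATA_CENTERS):
    """Return the name of the data center with the smallest great-circle distance."""
    return min(data_centers,
               key=lambda name: haversine_km(lat, lon, *data_centers[name]))


# Sydney, Australia (approximate coordinates)
print(closest_data_center(-33.87, 151.21))
```

By distance alone, a point in Sydney is associated with Singapore, which is the kind of result that Improvement #2 revisits.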
Improvement #2: Incorporated worldwide underwater cables (latency)
Originally we relied on user feedback about connection latency to tweak the indexes. This didn't scale very well, so we needed a more systematic way to adjust them.
Using Greg's Cable Map, we could mash up the automatic associations and tweak the index overrides based on expected latency.
It turns out that this was a crucial part of the equation: a user might be physically closer to data center X, yet in practice have a faster connection to data center Y. For example, Australia was previously allocated to Singapore but has been moved to California because the pipe is much fatter (see the visual map below).
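The idea of layering latency-based overrides on top of the distance-based associations can be sketched like this. The dictionary names, country codes and lookup function are hypothetical and do not reflect the project's actual index format; the values are the AWS region identifiers for Singapore (ap-southeast-1) and Northern California (us-west-1).

```python
# Distance-based associations (illustrative subset, not the real index).
AUTO_INDEX = {
    "AU": "ap-southeast-1",  # Australia is geographically closest to Singapore
    "MY": "ap-southeast-1",  # Malaysia
}

# Manual overrides chosen from expected latency along the cable routes.
OVERRIDES = {
    "AU": "us-west-1",  # fatter pipe to California
}


def resolve_data_center(country_code):
    """Prefer a latency override; otherwise fall back to the distance-based entry."""
    return OVERRIDES.get(country_code, AUTO_INDEX[country_code])


print(resolve_data_center("AU"))  # us-west-1
print(resolve_data_center("MY"))  # ap-southeast-1
```

Keeping the overrides separate from the automatic associations means the distance index can be regenerated when new regions are added without losing the manual latency tweaks.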
Improvement #3: Open source
We originally published only the indexes, but have now open sourced the whole project on GitHub in the hope that others might find it useful and that it makes collaboration easier.
Putting it all together
The screenshot below plots countries/states against their associated AWS regional data centers, and overlays the worldwide underwater cables for reference:
Want to zoom in? Toggle active and future cables? Check out the live mashup.