We are about to release the TurnKey Linux Backup and Migration (TKLBAM) mechanism, which aims to be the simplest way ever to back up a TurnKey appliance across all deployments (VM, bare-metal, Amazon EC2, etc.). It can also restore a backup anywhere, which essentially amounts to appliance migration or upgrade.
Note: We'll be posting more details really soon. In this post I just want to share an interesting issue we solved recently.
Backups need to be stored somewhere - preferably somewhere that provides unlimited, reliable, secure and inexpensive storage. After exploring the available options, we decided on Amazon S3 for TKLBAM's storage backend.
Amazon have four data centers, called regions, spanning the world: Northern California (us-west-1), Northern Virginia (us-east-1), Ireland (eu-west-1) and Singapore (ap-southeast-1).
The problem: Which region should be used to store a server's backups, and how should it be determined?
One option was to require the user to specify the region during backup, but we quickly decided against polluting the user interface with potentially confusing options, and opted for a solution that automatically determines the best region.
The map below plots the countries/states with their associated Amazon region:
Generated automatically using Google Maps API from the indexes.
The solution: Determine the location of the server, then look up the closest Amazon region to the server's location.
Part 1: GeoIP
This was the easy part. The TurnKey Hub is developed using Django, which ships with GeoIP support in contrib. Despite being totally new to geo-location, I had part 1 up and running within a few minutes.
When TKLBAM is initialized and a backup is initiated, the Hub is contacted to get authentication credentials and the S3 address for backup. The Hub performs a lookup on the client's IP address and determines the country/state.
In a nutshell, adding GeoIP support to your Django app is simple: install MaxMind's C library and download the appropriate dataset. Then, once you update your settings.py file, you're all set.
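# settings.py: where to find the GeoIP dataset and the C library
GEOIP_PATH = "/volatile/geoip"
GEOIP_LIBRARY_PATH = "/volatile/geoip/libGeoIP.so"

# In a view: look up the location of the client's IP address
from django.contrib.gis.utils import GeoIP
ipaddress = request.META['REMOTE_ADDR']
g = GeoIP()
g.city(ipaddress)
# The returned dictionary includes, among other fields:
# 'longitude': -74.497703,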
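As a rough sketch of how the lookup result might be turned into a region, the snippet below maps a GeoIP city record to an index key and then to an Amazon region. It assumes the record is a dictionary with `country_code` and `region` fields, as returned by Django's `GeoIP.city()`. The function names, the `US-NJ` key format, the sample index entries and the fallback region are all hypothetical, not the Hub's actual code.

```python
def location_key(record):
    """Turn a GeoIP city record into an index key such as 'US-NJ' or 'DE'.

    The key format is an assumption made for this sketch.
    """
    if not record or not record.get("country_code"):
        return None  # the lookup failed or the address is unknown
    country = record["country_code"].upper()
    region = record.get("region")
    # Only US locations are resolved down to state level.
    if country == "US" and region:
        return "%s-%s" % (country, region.upper())
    return country


def lookup_region(record, index, default="us-east-1"):
    """Return the region for a record, or a (hypothetical) default."""
    return index.get(location_key(record), default)


# Made-up index entries and a made-up record, for illustration only.
index = {"US-NJ": "us-east-1", "IE": "eu-west-1"}
record = {"country_code": "US", "region": "NJ"}
print(lookup_region(record, index))
```

Falling back to a default region keeps backups working even when an IP address cannot be located.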
Part 2: Indexing
This part was a little more complicated.
Now that we have the server's location, we can look up the closest region. The problem is creating an index of each and every country in the world, as well as each US state, and associating each of them with its closest Amazon region.
Creating the index manually would have been painstaking, boring and error-prone, so I devised a simple automated solution:
1. Generate a mapping of country and state codes to their coordinates (latitude and longitude).
2. Generate a reference map of the server farms' coordinates.
3. Using a simple distance-based calculation, determine the closest region to each country/state, and finally output the index files.
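The steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it uses the haversine great-circle formula as one reasonable distance calculation, the region coordinates are approximate values chosen for the example, and the country/state coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate (latitude, longitude) per region; illustrative values only.
REGIONS = {
    "us-west-1": (37.35, -121.96),
    "us-east-1": (38.95, -77.45),
    "eu-west-1": (53.35, -6.26),
    "ap-southeast-1": (1.35, 103.82),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def closest_region(coords, regions=REGIONS):
    """Return the name of the region nearest to the given coordinates."""
    return min(regions, key=lambda name: haversine_km(coords, regions[name]))


def build_index(locations, regions=REGIONS):
    """Map every country/state code to its closest region."""
    return {code: closest_region(coords, regions)
            for code, coords in locations.items()}


# Hypothetical centre coordinates for a few country/state codes.
locations = {
    "US-NJ": (40.1, -74.5),
    "US-CA": (37.2, -119.4),
    "DE": (51.2, 10.4),
    "AU": (-25.3, 133.8),
}
for code, region in sorted(build_index(locations).items()):
    print(code, region)
```

Because the index is computed once and written out, the Hub only needs a dictionary lookup at backup time, and individual entries can still be adjusted by hand afterwards.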
I was also planning on incorporating data about internet connection speeds and trunk lines between countries, and adding weight to the associations, but decided that was overkill.
More importantly, we need your help to tweak the indexes, as you have better knowledge and experience of your own connection latency and speed. Please let us know if you think we should associate your country/state with a different Amazon region.
We updated the indexes to include the new AWS regions (Oregon, Sao Paulo, Tokyo), tweaked the automatic association to use the haversine formula, and added overrides based on underwater internet cables. Lastly, we've open-sourced the whole project on GitHub (check out the live map mashup