TurnKey Linux Virtual Appliance Library

Finding the closest APT package archive using GeoIP and indexing

In preparation for TurnKey's upcoming release based on Ubuntu Lucid 10.04 LTS, we are knocking off todo list items. One of them is code-named auto-apt-archive. As you can guess from its name, the objective is to configure the closest APT package archive mirror, automatically, without user intervention. It does this by leveraging a new GeoIP service provided by the TurnKey Hub.

Using the closest archive is usually much faster, lessens the load on Ubuntu's main package archive (which has been the default up until now), and in certain circumstances is cheaper (for example, bandwidth within Amazon EC2 regions is free).

BTW, TurnKey EC2 builds already include a similar optimization, which uses ec2metadata to determine the instance's region and construct the URL of the region-specific Ubuntu APT archive.

The new auto-apt-archive solution will replace the old ad hoc Amazon EC2 solution, and will be included in all TurnKey builds, whether bare-metal, virtual machine, VPS or cloud deployments.

So how does it work?

Firstly, you might recall a post I made last month, with the somewhat similar title Finding the closest data center using GeoIP and indexing. The GeoIP implementation details are similar, so I won't repeat them here.

For those interested in how auto-apt-archive works, it goes something like this:

On firstboot, auto-apt-archive is called by an inithook, which contacts the Hub requesting the closest Ubuntu APT package archive, and updates APT sources lists accordingly.
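
The "updates APT sources lists" step could look roughly like the sketch below. It is not the actual auto-apt-archive code: the function name retarget_sources and the regular expression are assumptions made for illustration. It rewrites only active deb/deb-src entries that point at an Ubuntu archive host and leaves comments and other hosts (such as security.ubuntu.com) untouched.

```python
import re

# Assumed pattern: matches Ubuntu archive base URLs such as
# http://archive.ubuntu.com/ubuntu or http://us.archive.ubuntu.com/ubuntu/
_ARCHIVE_RE = re.compile(
    r"http://(?:[a-z0-9-]+\.)*archive\.ubuntu\.com/ubuntu/?"
)


def retarget_sources(sources_text, archive_url):
    """Return sources.list text with Ubuntu archive URLs set to archive_url."""
    out = []
    for line in sources_text.splitlines():
        stripped = line.lstrip()
        # Only touch active entries; commented-out lines are left alone.
        if stripped.startswith(("deb ", "deb-src ")):
            # A callable replacement avoids any backslash interpretation.
            line = _ARCHIVE_RE.sub(lambda m: archive_url, line)
        out.append(line)
    return "\n".join(out) + "\n"
```

In a real inithook the result would be written back to the sources list file, followed by an apt-get update.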

The Hub looks up the requesting IP address using GeoIP to find the associated country code, which is then used in the archive URL.
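
To make the lookup step concrete, here is a toy stand-in for a GeoIP database. It is only a sketch: the table, the function name country_code, and the country assignments are invented for illustration (the networks are reserved documentation ranges), and the Hub's real GeoIP backend is not shown here.

```python
import ipaddress

# Toy table: reserved documentation networks (RFC 5737) mapped to
# made-up country codes. A real GeoIP database is far larger.
_TOY_GEOIP = [
    (ipaddress.ip_network("192.0.2.0/24"), "il"),
    (ipaddress.ip_network("198.51.100.0/24"), "de"),
    (ipaddress.ip_network("203.0.113.0/24"), "au"),
]


def country_code(ip, table=_TOY_GEOIP):
    """Return the country code for ip, or None if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, cc in table:
        if addr in network:
            return cc
    return None
```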

Ubuntu have implemented a wildcard domain configuration for the archive mirrors, making the URL construction really simple. In the case that there is no local APT archive in your country, you will be routed to Ubuntu's main package archive. When one does become available, you'll automatically be routed there.

http://$CC.archive.ubuntu.com/ubuntu
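
As a minimal sketch, the URL template above can be filled in like this. The function name ubuntu_archive_url is hypothetical, and falling back to the main archive when no usable country code is available is an assumption, not something the Hub is confirmed to do.

```python
import re


def ubuntu_archive_url(cc=None):
    """Build the Ubuntu archive URL for a two-letter country code."""
    # Accept only two ASCII letters so nothing unexpected ends up in the host.
    if cc and re.fullmatch(r"[A-Za-z]{2}", cc):
        return "http://%s.archive.ubuntu.com/ubuntu" % cc.lower()
    # Assumed fallback: Ubuntu's main package archive.
    return "http://archive.ubuntu.com/ubuntu"
```
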
What about Amazon EC2, you ask? Well, the Hub checks whether the IP address is associated with an Amazon EC2 instance it launched, and if so, returns the region-specific archive URL.
http://$REGION.archive.ubuntu.com/ubuntu
In the future, when we add more Cloud deployment options to the Hub which have local APT package archives, they will be automatically supported as well.
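
The order of checks described above (EC2 instance first, then country) might be sketched as follows. Everything here is illustrative: closest_archive, the ec2_regions dict standing in for the Hub's records of instances it launched, and the geoip_lookup callable are all assumptions.

```python
def closest_archive(ip, ec2_regions, geoip_lookup):
    """Pick an Ubuntu archive URL for the requesting ip.

    ec2_regions:  dict of IP address -> EC2 region name (hypothetical
                  stand-in for the Hub's record of launched instances).
    geoip_lookup: callable returning a country code, or None if unknown.
    """
    # Instances launched by the Hub get the region-specific archive.
    region = ec2_regions.get(ip)
    if region is not None:
        return "http://%s.archive.ubuntu.com/ubuntu" % region
    # Everyone else is mapped by country code.
    cc = geoip_lookup(ip)
    if cc:
        return "http://%s.archive.ubuntu.com/ubuntu" % cc.lower()
    # Assumed fallback when the lookup yields nothing.
    return "http://archive.ubuntu.com/ubuntu"
```

Supporting another cloud would then mean adding one more check ahead of the country lookup.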
  
And lastly, don't forget that Debian appliances are in the works, so Debian APT package archives are also supported. Debian haven't implemented wildcard DNS, so the Hub looks up the best archive in an index (similar to the Amazon region indexes), and returns the archive URL.
http://ftp.$CC.debian.org/debian
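
An index lookup of this kind could be as simple as the sketch below. The index entries, the default mirror, and the name debian_archive_url are all made up for illustration; the Hub's real index is not reproduced here.

```python
# Hypothetical index: requesting country code -> country code of the
# Debian mirror to use. Entries are invented for this example.
DEBIAN_INDEX = {"de": "de", "at": "de", "us": "us"}
DEFAULT_MIRROR_CC = "us"  # assumed default when a country is not indexed


def debian_archive_url(cc, index=DEBIAN_INDEX, default=DEFAULT_MIRROR_CC):
    """Return the Debian archive URL chosen for country code cc."""
    mirror_cc = index.get((cc or "").lower(), default)
    return "http://ftp.%s.debian.org/debian" % mirror_cc
```

Tweaking the mapping for a given country is then a one-line change to the index.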
 
Just as with the previous GeoIP/index post, we need your help to tweak the indexes and mapping logic, as you have better knowledge of your own connection latency and mirror speed. If you think we should associate your country/state with a different archive, please let us know.

Comments

For Debian you can just use this for now:

http://cdn.debian.net/
http://wiki.debian.org/DebianGeoMirror

At some point there will be a more sane mirror system along these lines:

http://lists.debian.org/20100906082622.GO25990@anguilla.noreply.org

How well would this work for a local mirror?

I am thinking about running my own ESX or OpenStack/DevStack server in the basement and thought it would probably be a good idea to host my own package mirror (although a local cache is also on my radar). Are there any good guides you've seen that would help implement this with TurnKey builds?

Jeremy's picture

I have used apt-cacher successfully

Although I think apt-cacher-ng (or similar) is the best/latest iteration of this floating about. There is also apt-proxy but I have never used it. Depending on your plans, a more generic caching proxy may be useful (like using Squid to cache any downloads).

Finally, if you want to mirror a whole repo, I recall reading about numerous scripts floating about that can set that up. However personally I think that is overkill...

This is from apt-cacher-ng:

Apt-Cacher NG has been designed from scratch as a replacement for apt-cacher, but with a focus on maximizing throughput with low system resource requirements. It can also be used as replacement for apt-proxy and approx with no need to modify clients' sources.list files.

I am actually having two conflicting thoughts. One is running a VM with Apt-Cacher NG and one is running a Raspberry Pi with a portable USB drive and running Apt-Mirror. With the latter I could take it to places that have no connection or a slow Wi-Fi connection with a small pipe and still have decent install speeds. Hmmmmm.

Alon Swartz's picture

Take a look at polipo

Polipo is a very lightweight, well-designed and generic caching proxy. I've been using it for caching both regular file downloads and APT packages, and it has been performing very nicely.
