Scaling web sites: a brief overview of tools and strategies

Monolithic server architecture

The easiest way to support additional capacity is to simply use a bigger server (the big iron approach). The disadvantage is that you pay for that capacity even when you're not using it, and in a traditional monolithic server architecture you can't change the amount of capacity of a single server without suffering some downtime.

Monolithic server architectures can in fact scale (up to a point), but at a prohibitive price. At the extreme end you get mainframe-type big iron machines.

Distributed server architecture

In a nutshell, by using a bunch of open source tools, it's possible to build a distributed website where capacity is added and removed on the fly on a modular basis.

Load balancer

The keystone in such an arrangement is the load balancer, which can operate either at the layer 4 (IP) level (e.g., the Linux Virtual Server project) or at the layer 7 (HTTP) level (e.g., pound, squid, varnish).

An L4 (layer 4) load balancer is more generic and potentially more powerful than an L7 (layer 7) load balancer in that it can scale further (in some configurations) and supports any TCP application (i.e., not just HTTP). On the other hand, an L7 load balancer can benefit from peeking inside the application layer and route requests based on more flexible criteria.

Note that sometimes you don't need a load balancer to scale. In principle, if your application is light enough not to create CPU bottlenecks (e.g., mostly cached content served anonymously) or IO bottlenecks (e.g., your dataset is small enough to fit in memory), you'll be able to max out the network connection with a single machine and a load balancer won't provide much benefit.

The general idea behind using an L7 load balancer / reverse proxy program such as pound is that it allows you to off-load CPU- or IO-intensive tasks to configurable back-ends.

Pound is a simple dedicated L7 load balancer. It has no caching capabilities (unlike varnish). Pound detects back-ends that have stopped functioning and stops routing requests to them. This lets you gracefully scale down capacity: you can remove back-ends and the system continues to run without interruption.
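For a sense of what that health-check-and-route loop boils down to, here's a minimal Python sketch of the idea (an illustration only, not how pound is actually implemented; the back-end addresses are made up):

    # Minimal sketch of what an L7 balancer's health checking amounts to:
    # probe each back-end, keep only the ones that answer, rotate between them.
    import itertools
    import urllib.error
    import urllib.request

    BACKENDS = ["http://10.0.0.11:8080", "http://10.0.0.12:8080"]  # hypothetical

    def alive(backend, timeout=2):
        """A back-end counts as healthy if it answers HTTP at all."""
        try:
            urllib.request.urlopen(backend + "/", timeout=timeout)
            return True
        except urllib.error.HTTPError:
            return True   # it answered, even if with an error status
        except OSError:
            return False  # refused, timed out, unreachable, etc.

    def healthy_backends():
        return [b for b in BACKENDS if alive(b)]

    # Rotate requests over whatever is currently alive; a back-end that is
    # removed (or dies) simply drops out of the rotation on the next check.
    rotation = itertools.cycle(healthy_backends())

Pound does this kind of thing natively and with far more care; the point is just that back-ends can come and go without clients noticing.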

Since an L7 load balancer still routes all of the network traffic (requests and responses both pass through it), you can scale this configuration as far as your network connection will allow (e.g., around 20 servers), but if you continue to grow you may eventually max out that connection (which supposedly runs at 250MB/s on EC2).
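As a back-of-the-envelope check on that figure (the per-back-end output below is an assumed number, not a measurement):

    # Rough arithmetic: how many back-ends before the balancer's own network
    # connection becomes the bottleneck? The per-back-end figure is assumed.
    balancer_throughput = 250.0   # MB/s, the EC2 figure quoted above
    per_backend_output = 12.5     # MB/s, assumption for illustration

    print(balancer_throughput / per_backend_output)  # -> 20.0 back-ends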

LVS offers a couple of configurations (IP tunneling and Direct Routing) that get around this limitation: the load balancer routes incoming requests, but the back-end nodes respond directly to the originator of the request. In this case the load balancer could be limited to 100MB/s while the output of the cluster goes beyond 1GB/s.

Of course the load balancer itself can become a single point of failure, and in really high availability scenarios where that matters you can set up the "heartbeat" system to implement automatic failover. I'm not sure heartbeat would work in EC2 (it assumes a local LAN), but since EC2 supports reassigning an elastic IP to a given machine I bet a similar arrangement could be made to work.
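As a rough sketch of what the EC2 variant of that failover might look like (using boto3; the instance ID and address are placeholders, and detecting that the primary is dead is left out entirely):

    # Sketch: a standby node claims the cluster's elastic IP when it decides
    # the primary is gone -- roughly what heartbeat does with ARP on a LAN.
    import boto3

    def take_over_elastic_ip(elastic_ip, my_instance_id):
        ec2 = boto3.client("ec2")
        # Repointing the elastic IP moves all traffic for the "virtual server"
        # over to this machine without any DNS change.
        ec2.associate_address(InstanceId=my_instance_id, PublicIp=elastic_ip)

    # take_over_elastic_ip("203.0.113.10", "i-0123456789abcdef0")  # placeholders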

Example architecture (e.g., large zope site):

pound -> bunch of varnish caches -> real web servers with zope -> mysql cluster (LVS offers instructions on how to build one)

Varnish caches accelerate the capacity of the web servers they use as back-ends because they can serve static content and cached dynamic content much more quickly than a typical Apache (assuming you hit the cache, of course).

Anyhow, the end result is a sort of virtual server sitting behind one IP address that can scale from one machine (e.g., L7 load balancer, web site and mysql database all integrated) to a cluster (behind that same IP address) that maxes out the network connection. How many machines the cluster can have depends on the application's bottlenecks (i.e., CPU or IO) and economics (i.e., the price/performance of a few big machines vs. a larger number of small machines).

Scaling further: DNS

When a single cluster isn't enough (e.g., you've maxed out the full network capacity) you can use DNS scaling tricks (e.g., round-robin across multiple A records) to spread the load between multiple clusters.
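You can see what that looks like from the client side with a few lines of Python (substitute your own load-balanced hostname for the example one):

    # Show the A records a client gets back for a hostname; with round-robin
    # DNS there will be several, one per cluster.
    import socket

    def a_records(hostname):
        infos = socket.getaddrinfo(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
        # Each result ends with an (address, port) pair; collect the distinct IPs.
        return sorted({sockaddr[0] for *_ignored, sockaddr in infos})

    print(a_records("www.example.com"))  # stand-in for your own hostname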

For extra performance you can point users to a geographically close cluster using the geolocation features of DNS servers such as PowerDNS (which is what Wikipedia uses).

BTW, PowerDNS is an open source DNS server that maintains compatibility with BIND zone formats while offering a slew of new features such as geolocation.
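The routing decision itself is simple; here's a toy sketch of it (the region table and addresses are invented, and a real setup would use PowerDNS with a GeoIP backend rather than hand-rolled Python):

    # Toy illustration of geo-aware DNS: hand back the address of the cluster
    # closest to the client's region. The table below is entirely made up.
    CLUSTERS = {
        "eu": "198.51.100.10",   # hypothetical European cluster
        "us": "203.0.113.10",    # hypothetical US cluster
    }
    DEFAULT = CLUSTERS["us"]

    def answer_for(client_region):
        """The A record a geo-aware DNS server might return for this client."""
        return CLUSTERS.get(client_region, DEFAULT)

    print(answer_for("eu"))  # -> 198.51.100.10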

You'll want to keep the DNS TTL low so that you can update the records more quickly, but at least some DNS clients misbehave and ignore your TTL, which limits your flexibility:

  1. old clients with cached entries won't see a new load balancing cluster

  2. when you remove a load balancing cluster you have to take into account the worst case expiration times for the cluster's DNS records before you can take it offline without letting clients suffer performance degradation.

    Alternatively you can update the records and take the cluster offline once your monitoring indicates it's no longer seeing any significant traffic (see the sketch below).
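Here's a sketch of that "drain before decommission" check, assuming you have some way of counting recent requests (the counting function is a stand-in for your monitoring):

    # Wait until the cluster being retired sees next to no traffic, then let
    # the caller shut it down. recent_request_count() is a stand-in for
    # whatever your monitoring exposes (log tailing, a stats endpoint, ...).
    import time

    QUIET_THRESHOLD = 5   # requests per polling interval that count as "idle"

    def recent_request_count():
        raise NotImplementedError("hook this up to your monitoring")

    def wait_until_drained(poll_seconds=60):
        while recent_request_count() >= QUIET_THRESHOLD:
            time.sleep(poll_seconds)
        return True   # safe to take the cluster offline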

In conclusion: after a certain point you're going to have to use DNS-based load scaling techniques.

I confirmed this by looking at a few examples of high-scale websites (Google, Yahoo and Wikipedia).

Google in particular seems to like using DNS to scale. I looked up www.google.com from 4 different servers around the world and received a different set of IP addresses EVERY time, all of which seem to be pretty close to the source of origin (i.e., in terms of ping times). Google's DNS servers themselves are a consistent set of IPs though; your location doesn't seem to matter.

Yahoo also serves you a different IP depending on where you are, and they use Akamai's DNS services (probably similar to PowerDNS geolocation) to do it.