A few days ago I noticed the average load on a newly setup VPS was too high relative to its workload. The server was noticeably sluggish. Load was high and a page the wiki it was running would occasionally take 10 seconds to load.
I didn't have too much time to kill but I decided I would at least try picking off the lowest hanging fruit before upgrading to a beefier, more expensive plan which would cost a few hundred dollars extra a year.
When I noticed swap usage was relatively high, I suspected the server was constantly moving memory in and out of swap.
One of the first things you should try when you're tuning the performance of a server is to reduce memory usage so that it doesn't use as much swap and can use more of your memory for disk buffering. Accessing buffered blocks from memory is a gazillion times faster than accessing them from disk.
So how do you reduce memory usage without reducing functionality? By tuning the trade off between memory and CPU time.
One example of this trade off in practice is pre-forking, which is a common technique used by many systems that can improve user-perceived performance. The principle is that by starting the process in advance of when it's needed then that bit of overhead is not experienced by the user who would otherwise need to wait for the process to start up before it can serve him.
The catch is that keeping spare processes around requires memory.
When you have a limited amount of real memory, then any performance gains from pre-forking can quickly evaporate due to reduced IO performance.
On this server, we didn't really have that much activity for things like mail or web, but many subsystems were configured (by default) to pre-fork at least 5 worker processes in advance.
This is a reasonable configuration for a dedicated server with lots of memory and hundreds of users, but not ideal for that cheap VPS you just setup for your workgroup.
I reduced memory consumption by over 200MB by simply decreasing the number of preforked worker processes for apache (20-25MBs per process), spamassassin (30MBs per process), and saslauthd. Average load dropped immediately by about 80%.
Total time spent picking this low hanging fruit: 5 minutes.