
Backups are hard, making sure you got them right - harder

According to Murphy's Law, everything that can go wrong eventually will go wrong.

This is true for backups on multiple levels. A backup is often our last line of defense when things go wrong, but so many things can go wrong with the backup itself that we usually don't find out about it until, well, horror of horrors, the backup fails.

On the surface, backups can fail for zillions of reasons.

If you're ahead of the game you can probably think of at least a dozen reasons why your backups will fail exactly when you need them the most. If you can't, you've most likely been lulled into a false sense of security. You don't even know what you don't know.

But these are just symptoms of a deeper underlying problem. Yes, backups are hard to get right, but the real stinger is that they're even harder to test. This is because restoring a system from backup into production is usually a labor-intensive, error-prone, time-consuming exercise in pain.

In the real world, it wouldn't matter that things go wrong if (and that's a big if) we could easily simulate the worst-case scenario on demand: restore our systems from backup and verify that everything works - or find out that it doesn't, fix it, and repeat.
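To make that restore-and-verify loop a bit more concrete, here is a minimal sketch in Python. It assumes the backup is a plain gzipped tar archive plus a manifest of SHA-256 checksums recorded at backup time. The function names (`create_backup`, `verify_backup`) and the manifest format are made up for illustration and are not taken from any particular backup tool.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def create_backup(source_dir: Path, archive: Path) -> dict[str, str]:
    """Hypothetical backup step: archive the files, return the manifest."""
    manifest = make_manifest(source_dir)
    with tarfile.open(archive, "w:gz") as tar:
        for name in manifest:
            tar.add(source_dir / name, arcname=name)
    return manifest


def verify_backup(archive: Path, manifest: dict[str, str]) -> list[str]:
    """Restore into a scratch directory and compare against the manifest.

    Returns a list of problems; an empty list means every file came back
    intact. The archive is assumed to be one we created ourselves.
    """
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        restored = make_manifest(Path(scratch))

    problems = []
    for name, digest in manifest.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != digest:
            problems.append(f"corrupt: {name}")
    return problems
```

Run something like `verify_backup` on a schedule and alert whenever the list it returns is not empty. Keep in mind that matching checksums only show the files came back intact, not that the restored system actually boots and works, which is the harder part.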

Bottom line? Few test their backups, and nobody tests them frequently enough.

Sure, if you have the resources (not many do), you can brute-force your way around the problem. For example, with the right setup you could take frequent bit-for-bit snapshots of the underlying storage media and ship them safely off site over a very high-bandwidth network connection, then pull them back the same way when you need to restore. But such brute-force backup strategies are hugely inefficient and often impractical due to cost.

In the words of Mat Kearney - what is a boy to do?


