TKLBAM hooks

This documentation page is based on a blog post by Liraz.

TKLBAM has a nifty, general-purpose hooks mechanism you can use to trigger useful actions on backup and restore.

Examples of hooks:

  • Cleaning up temporary files
  • Stopping/starting services to increase data consistency
  • Encoding/decoding data from non-supported databases
  • Using LVM to create/restore a snapshot of a fast changing volume

The hooks mechanism was originally developed so we could fix a few issues indirectly related to the usability of TKLBAM. In particular, our very first beta users reported that tklbam-restore would sometimes fail to find any backup volumes. When we investigated, the cause turned out to be a clock discrepancy. The obvious solution was to sync the clock before starting the restore, but the more I thought about it, the more the idea of hardwiring that ntpdate stuff rubbed me the wrong way, for a few reasons:

  • It's an auxiliary problem, not a core issue with TKLBAM's logic
  • I'm offline much of the time during development, so I need some way to turn this off, but I don't want to add more testing-specific code unless it's absolutely necessary.
  • It's OK if a specific server (e.g., pool.ntp.org) is the default, but there should be some way to configure it if a user, for example, wants to use an internal NTP server.

I tried to think of a clean way to achieve these simple goals (e.g., cli options, environment variables, configuration files), but everything I came up with was just so darn ugly.

Then I realized that a hooks mechanism would solve this problem in a simple, generic way.

Implementation

/etc/tklbam/hooks.d may contain executables (e.g., scripts) that tklbam runs before and after two operations (currently):

  1. backup
  2. restore

Two arguments are passed to the hooks:

  1. operation: restore/backup
  2. state: pre/post

Non-zero exit codes raise a HookError.
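To make the calling convention concrete, here is a minimal sketch of how a runner for such a hooks directory could work. It is not TKLBAM's actual implementation: the function name run_hooks, the sorted execution order, and the skipping of non-executable files are all assumptions made for illustration. Only the two arguments and the HookError on a non-zero exit code come from the description above.

```python
import os
import subprocess


class HookError(Exception):
    """Raised when a hook exits with a non-zero exit code."""


def run_hooks(hooks_dir, operation, state):
    """Run each executable in hooks_dir as: <hook> <operation> <state>.

    Returns the names of the hooks that ran. Running in sorted order is
    an assumption of this sketch, not documented TKLBAM behavior.
    """
    if not os.path.isdir(hooks_dir):
        return []

    ran = []
    for name in sorted(os.listdir(hooks_dir)):
        path = os.path.join(hooks_dir, name)

        # Assumption: anything that is not an executable file is ignored
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue

        exitcode = subprocess.call([path, operation, state])
        if exitcode != 0:
            raise HookError("%s %s %s failed with exit code %d"
                            % (path, operation, state, exitcode))
        ran.append(name)

    return ran
```

Under these assumptions, a backup would call run_hooks("/etc/tklbam/hooks.d", "backup", "pre") before it starts and run_hooks("/etc/tklbam/hooks.d", "backup", "post") once it finishes, so a failing pre hook stops the operation before any data is touched.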

Advantages

In one stroke, this solves the clock problem and also lets advanced users define their own hooks to take care of things TKLBAM doesn't (e.g., stopping IO intensive processes before backup, encoding/decoding unsupported databases, etc.)
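As a rough sketch of what such a user-defined hook might look like, the following stops a service before a backup and starts it again afterwards. The service name "myservice" is hypothetical, and using the "service" command to control it is an assumption about the system. The run parameter exists only so the logic can be exercised without touching a real service.

```python
import subprocess
import sys

# Hypothetical service name, used for illustration only
SERVICE = "myservice"

# Map (operation, state) to the action this hook should take
ACTIONS = {
    ("backup", "pre"): "stop",
    ("backup", "post"): "start",
}


def handle(operation, state, run=subprocess.call):
    """Return the exit code this hook should report for (operation, state)."""
    action = ACTIONS.get((operation, state))
    if action is None:
        # Nothing to do for this combination, so report success
        return 0

    # Assumption: the service is managed with the "service" command
    return run(["service", SERVICE, action])


def main(argv):
    if len(argv) != 3:
        sys.stderr.write("usage: %s <backup|restore> <pre|post>\n" % argv[0])
        return 2
    return handle(argv[1], argv[2])
```

To use this as a real hook, the file would end with a call to sys.exit(main(sys.argv)), be marked executable, and be placed in /etc/tklbam/hooks.d. Because a failed stop or start is returned as the exit code, it would surface as a HookError rather than pass silently.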

Example fixclock hook

#!/usr/bin/env python
import os
import sys
import executil
from string import Template

NTPSERVER = os.environ.get("NTPSERVER", "pool.ntp.org")

ERROR_TPL = """\
##########################
## FIXCLOCK HOOK FAILED ##
##########################

Amazon S3 and Duplicity need a UTC synchronized clock so we invoked the
following command::

    $COMMAND

Unfortunately, something went wrong...

$ERROR
"""

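def fixclock():
    # Sync the clock against NTPSERVER, using an unprivileged port (-u)
    command = "ntpdate -u " + NTPSERVER

    try:
        executil.getoutput(command)
    except executil.ExecError as e:
        msg = Template(ERROR_TPL).substitute(COMMAND=command,
                                             ERROR=e.output)

        # A non-zero exit code tells tklbam that the hook failed
        sys.stderr.write(msg)
        sys.exit(1)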

def main():
    op, state = sys.argv[1:]

    if op in ('restore', 'backup') and state == 'pre':
        fixclock()

if __name__ == "__main__":
    main()

For additional info, discussion and commentary please see the original blog post by Liraz.