TurnKey Linux Virtual Appliance Library

How does TKLBAM create such fast and small incrementals of MySQL?

jeremiah.snapp

I can hardly believe how well TKLBAM does its incremental backups, but I did a restore and, sure enough, everything is there. What really amazes me is how small the incremental is and how quickly it completes.

When I do a manual mysqldump of all my databases, I get an uncompressed file of about 300MB in about 15 seconds. An entire TKLBAM incremental backup of my 12GB server, including the MySQL databases, takes about 30 seconds. I know that an incremental only backs up changes since the last backup, but *still*, combining the time to calculate the differences *and* back up the databases *seems* like it should take longer.

Isn't the MySQL backup done with mysqldump? If so, is TKLBAM just looking at the changes between the previous mysqldump and the current one?

No matter how you've coded TKLBAM, I'm impressed. It's very satisfying to have such a simple but complete backup solution. I can hardly wait to see LVM incorporated so we can get point-in-time consistent backups. My guess is that LVM will be difficult to incorporate, so good luck with the work.

Here is the tklbam-backup log entry from my most recent incremental backup. If I'm reading it correctly, it took only 34 seconds to find 8.29MB of changed content, compress it to 2.50MB, and save it on the backup server.

ElapsedTime 34.41 (34.41 seconds)
SourceFiles 71459
SourceFileSize 8288062089 (7.72 GB)
NewFiles 3255
NewFileSize 10106497 (9.64 MB)
DeletedFiles 104
ChangedFiles 3395
ChangedFileSize 333924770 (318 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 6754
RawDeltaSize 8689094 (8.29 MB)
TotalDestinationSizeChange 2623269 (2.50 MB)
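The arithmetic behind that reading can be checked with a short Python sketch. It parses the statistics lines quoted above and assumes, as the post does, that `RawDeltaSize` is the uncompressed delta and `TotalDestinationSizeChange` is what was actually added to the backup target. The helper names `parse_stats` and `summarize` are hypothetical.

```python
import re

# Statistics block copied verbatim from the tklbam-backup log above.
LOG = """\
ElapsedTime 34.41 (34.41 seconds)
SourceFiles 71459
SourceFileSize 8288062089 (7.72 GB)
NewFiles 3255
NewFileSize 10106497 (9.64 MB)
DeletedFiles 104
ChangedFiles 3395
ChangedFileSize 333924770 (318 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 6754
RawDeltaSize 8689094 (8.29 MB)
TotalDestinationSizeChange 2623269 (2.50 MB)
"""

# Each line starts with a statistic name followed by a number.
STAT_RE = re.compile(r"^(\w+) (\d+(?:\.\d+)?)")


def parse_stats(text):
    """Map each statistic name to its leading numeric value."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def summarize(stats):
    """Return (compression ratio of the delta, fraction of source data in the delta)."""
    # Assumption: RawDeltaSize is pre-compression, TotalDestinationSizeChange is post.
    ratio = stats["RawDeltaSize"] / stats["TotalDestinationSizeChange"]
    changed_share = stats["RawDeltaSize"] / stats["SourceFileSize"]
    return ratio, changed_share


if __name__ == "__main__":
    ratio, share = summarize(parse_stats(LOG))
    print(f"compression ratio {ratio:.2f}, changed share of source {share:.3%}")
```

With these numbers the delta compresses roughly 3.3:1, and it amounts to only about 0.1% of the 7.72 GB of source data.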

Liraz Siri

How TKLBAM serializes MySQL database contents

You're right. Calculating incremental changes in a 300MB mysqldump would take longer and would be much less efficient than what TKLBAM does. TKLBAM converts mysqldump output on the fly into a special filesystem structure: databases are mapped to directories, tables are text files, and each row is a line. Duplicity then computes the differences with the rsync algorithm (through librsync, the library behind rdiff). This layout is simple and works well with that algorithm, because changing a few rows only alters a small region of the affected table files, so the delta stays small.
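To make the mapping concrete, here is a minimal Python sketch of the idea; it is not TKLBAM's mysql.py. It assumes a dump produced with `mysqldump --all-databases --skip-extended-insert`, so each `USE` line switches database and each `INSERT` line holds exactly one row. The function names `dump_to_tree`, `write_tree` and `changed_lines` are hypothetical, and `difflib` is swapped in for duplicity's rsync-style delta only to show that a one-row change touches one line.

```python
import difflib
import re
from collections import defaultdict
from pathlib import Path

# Assumption: dump made with --skip-extended-insert, one row per INSERT statement.
USE_RE = re.compile(r"^USE `([^`]+)`;$")
INSERT_RE = re.compile(r"^INSERT INTO `([^`]+)` VALUES (.*);$")


def dump_to_tree(dump_text):
    """Parse dump text into {database: {table: [row, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    db = None
    for line in dump_text.splitlines():
        use = USE_RE.match(line)
        if use:
            db = use.group(1)  # later INSERTs belong to this database
            continue
        insert = INSERT_RE.match(line)
        if insert and db is not None:
            table, row = insert.groups()
            tree[db][table].append(row)
    return tree


def write_tree(tree, root):
    """Databases become directories; tables become text files, one row per line."""
    for db, tables in tree.items():
        db_dir = Path(root) / db
        db_dir.mkdir(parents=True, exist_ok=True)
        for table, rows in tables.items():
            (db_dir / table).write_text("\n".join(rows) + "\n")


def changed_lines(old_tree, new_tree):
    """Count (removed, added) row lines per table; difflib stands in for rdiff."""
    keys = {(db, t) for tree in (old_tree, new_tree) for db in tree for t in tree[db]}
    result = {}
    for db, table in sorted(keys):
        old_rows = old_tree.get(db, {}).get(table, [])
        new_rows = new_tree.get(db, {}).get(table, [])
        matcher = difflib.SequenceMatcher(a=old_rows, b=new_rows, autojunk=False)
        removed = added = 0
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                removed += i2 - i1
                added += j2 - j1
        result[(db, table)] = (removed, added)
    return result


if __name__ == "__main__":
    before = (
        "USE `shop`;\n"
        "INSERT INTO `items` VALUES (1,'pen',2);\n"
        "INSERT INTO `items` VALUES (2,'ink',5);\n"
    )
    after = before.replace("(2,'ink',5)", "(2,'ink',6)")
    print(changed_lines(dump_to_tree(before), dump_to_tree(after)))
```

Because each row sits on its own line, editing one row in a large table produces a one-line change rather than a shifted 300MB blob, which is the property that keeps the real deltas small.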

You can inspect this file structure by running tklbam-backup --simulate. It will leave behind the /TKLBAM directory. Look inside /TKLBAM/myfs.
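If you want a quick overview of what `--simulate` left behind, a small Python walker can list every file under `/TKLBAM/myfs` with its line count. This is a hypothetical helper (`summarize_myfs` is not part of TKLBAM), and it deliberately makes no assumption about how deeply the directories are nested.

```python
import os
import sys


def summarize_myfs(root):
    """Yield (path relative to root, line count) for every file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                lines = sum(1 for _ in f)
            yield os.path.relpath(path, root), lines


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/TKLBAM/myfs"
    for rel, lines in summarize_myfs(root):
        print(f"{lines:>10}  {rel}")
```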

For the exact details you should read the mysql.py code.

Jeremiah


Very clever and, as I said before, impressive. I'm still amazed by the speed, given that a mysqldump is being performed and converted on the fly into this myfs structure. Nice work.
