Rick Kipfer

I'd like to start off by saying TKL is awesome!! TKLBAM has worked flawlessly during all of my tests, but now I think I may have broken it. :o)

I just tried a backup of a test database to see how TKLBAM would handle it: a database with a simple table of about 7 million records, a single field in each (about 1 GB with index).

I used SSH to run the tklbam-backup command.

Everything seemed to be going fine: I saw some pretty heavy CPU load for about 30 minutes, and then my PuTTY session died from a dropped connection (though my other SSH connection held fast, watching the load with htop). The load continued for a little while (so I'm pretty sure the dropped connection didn't end the process), then dropped off to minimal. And that was it! The Hub backup panel says my server has its "first backup in progress", but there is no tklbam process running on the machine at all now. The command tklbam-list gives...


root@lamp /# tklbam-list
# ID  SKPP  Created     Updated     Size (MB)  Label
   7  No    2013-01-13  2013-01-13  81.95      test
   8  No    2013-01-14  -           0.01       test
(Number 7 was the original 'test' server; number 8 was restored from a backup of number 7. What I basically did was add enough records to bring it up from 700,000 to 7,000,000 (100 MB to 1 GB in MySQL), and try a backup.)
Outgoing bandwidth shows about 220 MB going out at that time, then back to almost zero. That was 2 hours ago.

To my questions:

1) What happens when the SSH session that initiated the tklbam-backup command is broken or terminated? Does TKLBAM continue?

2) Is this something that just happens? TKLBAM sort of stops partway through? And when this happens, how does one recover their backup routine? Just initiate another full backup and go from there?

3) Is there something about the size of the database (a 1 GB table/index) that caused this? Does TKLBAM have enough grit to handle tables this size?

4) (Not related to this, but I'd like to know.) What happens when we run the tklbam-backup command while a backup job is already running? Does it recognize the current job and terminate? Or does it try to initiate and execute a whole new job while the other is running?


P.S. This is a test server only, not production, so nothing important is at risk...

Jeremy Davis

But I'll chuck my 2c in the ring...

1) Any 'normal' Linux app will die when the SSH session it was launched from dies. E.g. if you are copying a file within an SSH session and the session dies, the copy stops. AFAIK this is the case for most processes (other than daemons/services, obviously) because the process only exists within the SSH session; once that session ends, so do all its child processes. I have no reason to think that TKLBAM would be any different. Auto backups don't have that issue, as cron is the parent process (and cron doesn't run within a shell).
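As an aside, nohup is another stock way to keep a command alive after the session goes away (this is just my sketch, nothing TKLBAM-specific; the sleep stands in for a long-running command like tklbam-backup):

```shell
# Sketch: show that a nohup'd background job keeps running independently
# of the shell that started it. On a real server you'd substitute
# something like: nohup tklbam-backup > backup.log 2>&1 &
demo_nohup() {
    nohup sleep 5 >/dev/null 2>&1 &   # nohup makes the job immune to SIGHUP
    pid=$!
    if kill -0 "$pid" 2>/dev/null; then  # kill -0 only checks the pid exists
        echo "still running"
    fi
    kill "$pid" 2>/dev/null              # clean up the demo job
}
demo_nohup
```

The point is just that the process is no longer tied to your PuTTY window; screen (below) is nicer in practice because you can reattach and watch the output.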

This sort of SSH session behaviour can be avoided by using screen. Screen allows you to create a session that exists on the server rather than being dependent on the connection (as an SSH link is). To use it, simply connect as you would normally (via SSH) and then use the command 'screen' to start a session. Then you can continue on as you would. To detach from the session (ie exit but leave the screen session and any processes running) use <Ctrl><a> - <d>. To list all the current screen sessions running use 'screen -ls'. To reattach to a specific session use 'screen -r <pid>' (eg 'screen -r 3392' - where 3392 is the pid displayed with the -ls switch). To exit (end) a screen session, type exit from within the session. If you lose connection while within a screen session, it will keep running and you can reconnect and reattach the session as above (ie use -ls to list and -r to reconnect).
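To put that all in one place, a typical session would look something like this (the pid 3392 is just an example):

```
screen                  # start a new screen session on the server
tklbam-backup           # kick off the backup inside it
# press Ctrl-a then d to detach; the backup keeps running
screen -ls              # later, after reconnecting: list sessions
screen -r 3392          # reattach to the session with that pid
exit                    # from inside the session: end it for good
```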

2) TBH I'm not sure, although I suspect it wouldn't be unheard of, especially with a massive DB like that. I suspect that the bandwidth got so saturated uploading your backup that something had to give... I'm not sure what the best course of action would be, but I would probably just try again (using screen this time...).

3) As mentioned above, I suspect that the size may have been a contributing factor, but I know of others who have made bigger backups without issue (although it will take a long time, depending on bandwidth). I would expect TKLBAM to handle 1 GB of data fine, although the resources available and DB tuning (or lack thereof) will be factors in how well MySQL itself handles the DB dump (prior to backup). For a DB of that size to dump reliably and consistently, I would guess that 1.5 GB would be the minimum RAM you'd want on your server (although it's probably worth a bit of research if you want to be sure). As for the data transfer itself, TKLBAM uses Duplicity as the backend, so if you're interested in its transfer limitations it's probably worth researching that yourself too.

4) TBH I'm not sure, but I assume that only one instance of TKLBAM would run at a time. Whether in practice this means the second command is queued, or whether it would simply fail, I'm not sure (it's all purely speculation on my part - TBH I've never tried...).
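If you wanted to guard against starting a second one yourself rather than relying on TKLBAM's behaviour, something naive like this would do (just a sketch; matching on the process name is my assumption, and it assumes pgrep from procps is installed):

```shell
# Naive guard (sketch only): start a backup only if no tklbam-backup
# process already appears to be running.
check_backup_running() {
    if pgrep -x tklbam-backup >/dev/null 2>&1; then
        echo "backup already in progress; not starting another"
    else
        echo "no backup running; safe to start"
        # tklbam-backup   # uncomment on a real server
    fi
}
check_backup_running
```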

Being a test server (wise plan!) test some stuff out! Give it a thrashing and let us know how it goes! :)

Rick Kipfer

Thanks Jeremy, that 'screen' command sounds fabulous, I will definitely use that.

Here's an update: I ran another full backup last night after my post, and this morning BOTH backups had completed successfully. So that means two things:

1) TKLBAM works fine with a 1 GB database on a 613 MB instance (albeit a bit slow)

2) If the Hub reports a job still in progress, it may already be done, OR TKLBAM can somehow continue (with the upload?) after there is no longer a tklbam process running.

At any rate, 7 million records is far beyond what we will be using in production, so I sure am happy with TKLBAM overall, and TKL rocks.
