Andrew Anguiano's picture

(I originally posted this on ServerFault but thought I'd post it here as well. Here's the original: http://serverfault.com/questions/747662/tklbam-backup-stalling-causing-mysql-issues)

It appears that tklbam-backup is causing my MySQL server to become inaccessible. I manually ran a backup and noticed that as soon as it got to the DB phase of the process, my WordPress servers could no longer access MySQL. The backup process seems to be stuck at one of my DB tables. Here are the last few lines:

table: trendsandteens/wp_wfNet404s
table: trendsandteens/wp_wfReverseCache
table: trendsandteens/wp_wfScanners
table: trendsandteens/wp_wfStatus
table: trendsandteens/wp_wfThrottleLog
table: trendsandteens/wp_wfVulnScanners

It's just backing up Wordfence's tables. So I'm not really sure what the issue is... Any ideas? Here is the traceback after I interrupted the process: http://pastebin.com/QV63cBPG

There's more info in the original ServerFault post, but I didn't think it was relevant for this forum.

Jeremy Davis's picture

The first thing that I was going to suggest is that you make sure you are not running out of resources. A lack of either free disk space or free RAM can cause MySQL to crash.
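A quick way to check both while the backup is running could look something like the sketch below. It assumes a Linux host that exposes /proc/meminfo, and the /var/lib/mysql path is only an assumed default for the MySQL data directory; adjust it to match your setup.

```python
import shutil


def free_disk_mb(path="/"):
    """Return the free space, in MB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a {field: kB} dict."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_ram_mb(meminfo_path="/proc/meminfo"):
    """Return available RAM in MB (falls back to MemFree on old kernels)."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    return info.get("MemAvailable", info.get("MemFree", 0)) // 1024


if __name__ == "__main__":
    # /var/lib/mysql is an assumed location for the MySQL data directory.
    for path in ("/", "/var/lib/mysql"):
        try:
            print(f"free disk on {path}: {free_disk_mb(path)} MB")
        except FileNotFoundError:
            print(f"{path} not found, skipping")
    print(f"available RAM: {available_ram_mb()} MB")
```

Running it a few times during the DB phase of the backup should show whether either number drops close to zero.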

If you are using Amazon, are you using a micro server? If so I'd almost guarantee that's the issue. Micro servers don't have much grunt and the little extra load that TKLBAM imposes can push them over the edge.

But after reading through your ServerFault post it seems that you have already checked that. Is that the case? Is MySQL still running? Or has it died?

Looking at the TKLBAM trace it appears that it is crashing when opening a file. One of the causes of that can be lack of free RAM.

One other thing that comes to mind is that if a backup is interrupted, TKLBAM will try to continue from where it left off. If something is corrupted, then perhaps that is causing the issue, and each time it tries again the same corrupt file causes it to crash... TBH I'm only guessing, but it seems possible.

Andrew Anguiano's picture

Thanks for getting back to me, Jeremy! I apologize that it's taken me so long to respond. I don't believe the issue lies with memory usage. I have 2GB and I haven't seen any spikes in usage. I'm running on a private virtual data center.


I'll take a look at that DB. Perhaps it's corrupted or something. I'll try having it skip that table and we'll see if that solves the issue.

Thanks!

Jeremy Davis's picture

Is MySQL actually crashing? Or does it recover at some point later? Or does it report that it's still running but needs a restart?

Andrew Anguiano's picture

I was able to back up by skipping databases, but I still have the same issue if I include databases. I tried skipping over the one that was giving trouble, but it still stalled when it was serializing a different database.


Any ideas? If not, I may just have mysqldump dump the DB to a folder and then have TKLBAM back up the folder and skip the DBs.
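That workaround could be sketched roughly as follows. This is untested against this server and rests on a few assumptions: the dump folder name is hypothetical and has to be somewhere TKLBAM actually includes in the backup, mysqldump can authenticate without a password prompt (for example via ~/.my.cnf), and --single-transaction is only my choice to avoid locking InnoDB tables during the dump, not something confirmed to be related to the stall.

```python
import subprocess
from pathlib import Path

# Hypothetical dump folder; it must be a path that the TKLBAM backup covers.
DUMP_DIR = Path("/var/backups/mysql-dumps")


def dump_command(database, dump_dir=DUMP_DIR):
    """Build the mysqldump command that writes one database to a .sql file."""
    target = Path(dump_dir) / f"{database}.sql"
    return [
        "mysqldump",
        "--single-transaction",      # consistent snapshot without table locks (InnoDB)
        f"--result-file={target}",   # write the dump straight to the target file
        database,
    ]


def run_backup(databases, dump_dir=DUMP_DIR):
    """Dump each database to dump_dir, then run TKLBAM without its DB phase."""
    Path(dump_dir).mkdir(parents=True, exist_ok=True)
    for name in databases:
        subprocess.run(dump_command(name, dump_dir), check=True)
    # Let TKLBAM back up files only, including the dump folder.
    subprocess.run(["tklbam-backup", "--skip-database"], check=True)


if __name__ == "__main__":
    run_backup(["trendsandteens"])
```

With check=True the script stops at the first failing dump, so TKLBAM never uploads a backup that is missing a database.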

Jeremy Davis's picture

But IIRC TKLBAM just does a mysqldump of each database anyway...

Although I'd be interested to hear how you go.

Perhaps there is an edge-case TKLBAM bug? You are now one of two or three people who have reported this (or a very similar) issue. We haven't been able to get to the bottom of what might be the cause, though...

Andrew Anguiano's picture

I just tried a mysqldump --all and it worked fine. From the logs, it looks like at some point TKLBAM sets the MySQL listen port to port 0? I'm not sure what that means, though.

For now, I may have to move away from TurnKey since I can't have MySQL crashing throughout the day.

Jeremy Davis's picture

Can you please tell me where this is running (e.g. local VM, Amazon, somewhere else) and what resources (i.e. CPU/RAM/etc.) it has?

Also do you think that you might be able to send us a copy of your sanitised database (i.e. remove the user emails, passwords and any other sensitive info from a DB dump)?

If we can gather enough info hopefully we'll be able to reproduce this. If we can reproduce it then we should be able to work out how to fix it! I've also opened a new issue on the tracker: https://github.com/turnkeylinux/tracker/issues/560
