Dear All,

I'm having a problem restoring a redmine-12.1-squeeze-amd64 appliance from the hub to a fresh local VM install - works fine with 11.3 x86 appliances, but no luck with 12.1 thanks to the following error:

"Restoring databases
===================
SKIPPING MYSQL DATABASE RESTORE: mysql error (1): ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)"
 
mysql, which *had* been running on the local VM before the restore, has been stopped, and there is no sock file. Sounds similar to this post. mysql fails to start manually. The mysql .log and .err files are empty (there are a lot of InnoDB errors relating to ./ibdata1 in syslog, possibly unrelated).
 
So tklbam-restore appears to stop mysql before it is able to restore any databases. If I do a tklbam-restore-rollback, then mysql automatically starts up again, the sock file is created, and all is as it was before the restore.
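
The symptom described above (no sock file after the restore, sock file back after the rollback) can be checked in one step. The following is only a minimal sketch: the helper name mysql_socket_status is made up for illustration, and the default path is simply the one quoted in the error message.

```shell
#!/bin/sh
# Hypothetical helper (not part of TKLBAM): report whether the MySQL
# socket exists. The default path is the one from the error message.
mysql_socket_status() {
    sock="${1:-/var/run/mysqld/mysqld.sock}"
    if [ -S "$sock" ]; then
        # -S is true only for an existing socket file
        echo "present"
    else
        echo "missing"
    fi
}
```

Running mysql_socket_status before tklbam-restore, after it, and again after tklbam-restore-rollback should show whether the socket disappears at the same point the database restore is skipped.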
 
Any advice would be very much appreciated.
 
Regards,
Aaron.

Some extra experiments:

  • tklbam-restore from hub to fresh local virtualbox vm *or* hub-to-hub.
  • 12.1 redmine and wordpress, amd64 and x86.
  • All fail to restore databases because mysql has been stopped on the destination, resulting in the "ERROR 2002 (HY000)", mentioned above.
  • All have the same passwords set at boot (or hub launch).

This behaviour does not occur with 11.3 appliances - is this a tklbam-restore bug in 12.1 appliances? tklbam is of limited use if it cannot restore the databases.

If it is not a tklbam bug, then does anyone have any idea why this is occurring?

Aaron.

Dmytro Pishchukhin's picture

Have the same issue with 12.1-core-x64 with mysql installed there. This is a showstopper bug.

Best regards,

Dmytro

Liraz Siri's picture

Let me try to reproduce this on my side. If I manage that, a fix won't be long after.
Yosi Mor's picture

I too feel that this is a showstopper.  I am consistently seeing this with v12.1 appliances that include MySQL, such as LAMP and WordPress.

On a local guest machine, when applying a TKLBAM restore to a freshly created appliance (of the same exact type that was previously restored),

[1] the log shows:

> Restoring databases
> ===================
> SKIPPING MYSQL DATABASE RESTORE: mysql error (1): ERROR 2002 (HY000): Can't connect to local
> MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
>
> We're done. You may want to reboot now to restart all services.

[2] WordPress shows: "Error establishing a database connection"

[3] MySQL server is up, but not running properly.  Connecting to MySQL with phpMyAdmin shows
     "Connection for controluser as defined in your configuration failed."

I have verified that the above does not occur with v11.3 or v12.0 appliances.


On the other hand, when the hub is used to "restore a backup to a new cloud server", MySQL server is down and efforts to bring it up do not succeed.  As before, WordPress shows: "Error establishing a database connection"

Yosi Mor's picture

I took the liberty of opening a new issue on the TurnKey Dev Tracker on GitHub:

https://github.com/turnkeylinux/tracker/issues/40

Keith M's picture

Just wanted to note to anyone else having this issue that a workaround has been posted to the github page. The temporary fix (editing /usr/lib/tklbam/cmd_restore.py) mentioned worked fine for me, but I had to reboot my OpenVZ container (under Proxmox) for the database to restore itself as expected.

Yosi Mor's picture

Works flawlessly for me, even without a reboot (both on the hub and on my VMware guest)!

Jonathan Brewer's picture

I've tried twice this evening to restore a Wordpress backup to Amazon EC2 and it has failed with what appear to be MySQL-related issues:

Error establishing a database connection

cmd_restore.py was up-to-date as per https://github.com/turnkeylinux/tracker/issues/40

I restored with the command:

tklbam-restore 1 --time 2013-07-17T07:04:38 --limits="-/var/run/mysqld"

Restore log here (for the moment) : https://dl.dropboxusercontent.com/u/22002973/tklbam-restore

Any thoughts? Is the bug back? Is July too far back for me to restore from?

 

Liraz Siri's picture

If this is the same issue then you don't have to apply that fix. A couple of months ago we pushed a new version of TKLBAM, which fixes the bug, into the package archive:
apt-get update
apt-get install tklbam
We're also pushing out another new version today that has more diagnostic features and should hopefully make it easier to find out what is going wrong.
Jonathan Brewer's picture

Hi Liraz,

I think your new fix introduced a different bug, and now it's even worse!

Here's my whole process now, done one more time, start to finish:

1.) Launch a New Server (Wordpress, Hostname nztelco, Region Sydney, EBS-backed, 64-bit, Micro, matching passwords, keypair associated)

2.) Connect and apt-get update, apt-get install tklbam

3.) tklbam-restore 1 --time 2013-09-01T07:13:31

4.) shutdown -r now

And now on restart, the server doesn't come back. Console of the reboot says:

Setting up resolvconf...done.
Setting up networking....
Configuring network interfaces...udhcpc (v1.17.1) started
Sending discover...
Sending discover...
Sending discover...
/usr/share/udhcpc/default.script: Lease failed: 
No lease, failing
Failed to bring up eth0.
udhcpc: SIOCGIFINDEX: No such device
Failed to bring up eth1.
done.
Cleaning up temporary files....
INIT: Entering runlevel: 2
Jeremy Davis's picture

And whether it is a TKLBAM issue or one with AWS... Looking at your log, it seems that there was an AWS DHCP issue, although in my experience that shouldn't stop your server from starting. However it may make it uncontactable via normal means.

My suggestion would be to stop it and start it again (assuming that it's an EBS-backed server).

Jonathan Brewer's picture

Thanks Jeremy,

It was an EBS-backed server, and a reboot brought it right... however

Error establishing a database connection

still persists. 

Liraz Siri's picture

Sorry, I can't reproduce this. I just tested TKLBAM backup and restore of Redmine and there doesn't seem to be any problem with the new version of TKLBAM. No database connection error or any trouble rebooting.

Specifically, I installed Redmine (TurnKey 12.1) on a local VM, created a test project, and backed it up to our test Hub account. I then launched a Redmine instance on EC2 and restored the backup.

I suggest you take advantage of the new TKLBAM v1.3 features to try to isolate what it is about your backup that is causing the restore to fail:


tklbam-restore --raw-download=/tmp/mybackup

# take a peek
cd /tmp/mybackup
find
# edit/remove files you suspect may be causing the problem

# now let's try to restore the "edited" backup
tklbam-restore /tmp/mybackup

# doesn't work?
tklbam-restore-rollback

# rinse repeat
You can also exclude parts of the backup using the --skip-* and --limits flags. See the documentation for details.
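
For the "take a peek" step, a plain find on a large backup can produce a lot of output. As a rough sketch only, the helper below counts regular files under each top-level entry of a directory such as /tmp/mybackup. The name summarize_backup is made up for illustration, and nothing here assumes a particular layout inside the raw download.

```shell
#!/bin/sh
# Hypothetical helper (not part of TKLBAM): for each top-level entry in a
# directory, print the number of regular files beneath it and its name.
summarize_backup() {
    dir="$1"
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty
        [ -e "$entry" ] || continue
        count=$(find "$entry" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "$(basename "$entry")"
    done
}
```

Calling summarize_backup /tmp/mybackup before and after removing suspect files gives a quick check of what changed. Hidden top-level entries (names starting with a dot) are not matched by the glob and so are not listed.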
Jonathan Brewer's picture

Hi Liraz,

Unless you are trying a restore of a Wordpress appliance, your test is not equivalent. The fault is with Wordpress connecting to MySQL. I have picked apart what's going on and I can see that the restored instance does not have the same users or passwords for MySQL that the original instance had.

Backup is of turnkey-wordpress-12.0-squeeze-x86

Restore to cloud as wordpress-12.1-squeeze-amd64

MySQL stuff is not carried through correctly, and as a result Wordpress doesn't work.
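
One way to check the mismatch described above is to read the database credentials Wordpress is configured with and compare them against the accounts that actually exist in MySQL on the restored instance. This is a sketch under assumptions: wp_config_value is a made-up helper name, and it only handles the usual single-line define('KEY', 'value'); form found in wp-config.php.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a define('KEY', 'value'); line
# from a wp-config.php file. Prints nothing if the key is not found.
wp_config_value() {
    file="$1"
    key="$2"
    sed -n "s/^[[:space:]]*define([[:space:]]*['\"]$key['\"][[:space:]]*,[[:space:]]*['\"]\(.*\)['\"][[:space:]]*);.*\$/\1/p" "$file"
}
```

For example, wp_config_value /var/www/wordpress/wp-config.php DB_USER (the path is an assumption, adjust it to wherever wp-config.php lives) prints the user Wordpress connects as. On the restored machine, mysql -N -e "SELECT user, host FROM mysql.user" lists the accounts MySQL knows about, so a DB_USER that is absent from that list would point at the users not having been restored.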

If the backup itself has failed, there's NOTHING I CAN DO. I am completely out of luck, and incredibly unhappy with all the money I've been forking over to Turnkey for this backup service.

I am hoping that it's a problem with the restore scripts.

-JB

Liraz Siri's picture

Hi JB,

I tested the backup/restore process with Wordpress earlier today (e.g., 12.1 => 12.1 and 12.1 => 13.0) and everything seems to be working perfectly.

Anyhow, your best bet would be to investigate what is actually in your backup with the latest version of TKLBAM. The backup structures are human readable. Try these commands:

apt-get update

apt-get install tklbam

tklbam-restore --raw-download=/tmp/yourbackup your-backup-id

You can use tklbam-restore-rollback to rollback the restore and try again as many times as you like.

PS: FWIW TKLBAM is a free service. Perhaps you meant you are signed up to the Hub's cloud deployment service? In any case you're not actually "forking" any money over to TurnKey for TKLBAM. The service is provided at cost which means you only get charged the regular S3 storage fees. In fact we actually lose a little bit of money providing it to you because Amazon charge us for per-user subscription fees.

sk283's picture

I am also seeing the same issue with wordpress + mysql: "Error establishing a database connection"

Are there any ideas on how to troubleshoot this error? For me it shows up periodically on the website.

Jeremy Davis's picture

If you are having an intermittent problem with "Error establishing a database connection" on your TKL WordPress site then it is unlikely to be this issue IMO (unless, of course, it happens following a backup/restore).

Your best bet is to start a new thread and provide details of what version of TKL you are running, where your appliance is running (e.g. AWS, including size and region) and the steps required to reproduce the issue.

An intermittent issue may just be caused by insufficient RAM for the load your server is getting...?

sk283's picture

Thanks a lot - I will go ahead and do that!
