Joe Marcellais's picture

My TurnKey Linux WordPress installation [Debian 7.2 (wheezy)] keeps losing its connection to the database.

If I reboot the server it will come back online for a few hours but then eventually loses the connection to the database again. 

The error displayed at the site URL is "Error establishing a database connection".

I've already verified that all available updates for WordPress and plugins are installed

Running WordPress 3.9.2

 

Joe Marcellais's picture

Aug 12 13:43:28 wordpress mysqld: 140812 13:43:28  InnoDB: Database was not shut down normally!

Aug 12 13:43:28 wordpress mysqld: InnoDB: Starting crash recovery.

Aug 12 13:43:28 wordpress mysqld: InnoDB: Reading tablespace information from the .ibd files...

Aug 12 13:43:28 wordpress mysqld: InnoDB: Restoring possible half-written data pages from the doublewrite

Aug 12 13:43:28 wordpress mysqld: InnoDB: buffer...

Aug 12 13:43:29 wordpress mysqld: 140812 13:43:29  InnoDB: Waiting for the background threads to start

Aug 12 13:43:29 wordpress ntpd[2741]: ntpd 4.2.6p5@1.2349-o Sat May 12 09:54:55 UTC 2012 (1)

Aug 12 13:43:29 wordpress ntpd[2742]: proto: precision = 0.100 usec

Aug 12 13:43:29 wordpress ntpd[2742]: unable to bind to wildcard address 0.0.0.0 - another process may be running - EXITING

Aug 12 13:43:30 wordpress mysqld: 140812 13:43:30 InnoDB: 5.5.38 started; log sequence number 2290668

Aug 12 13:43:30 wordpress mysqld: 140812 13:43:30 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306

Aug 12 13:43:30 wordpress mysqld: 140812 13:43:30 [Note]   - '127.0.0.1' resolves to '127.0.0.1';

Aug 12 13:43:30 wordpress mysqld: 140812 13:43:30 [Note] Server socket created on IP: '127.0.0.1'.

Aug 12 13:43:30 wordpress mysqld: 140812 13:43:30 [Note] Event Scheduler: Loaded 0 events

Aug 12 13:43:30 wordpress mysqld: 140812 13:43:30 [Note] /usr/sbin/mysqld: ready for connections.

Aug 12 13:43:30 wordpress mysqld: Version: '5.5.38-0+wheezy1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Debian)

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2838]: Upgrading MySQL tables if necessary.

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2842]: /usr/bin/mysql_upgrade: the '--basedir' option is always ignored

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2842]: Looking for 'mysql' as: /usr/bin/mysql

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2842]: Looking for 'mysqlcheck' as: /usr/bin/mysqlcheck

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2842]: This installation of MySQL is already upgraded to 5.5.38, use --force if you still need to run mysql_upgrade

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2938]: Checking for insecure root accounts.

Aug 12 13:43:30 wordpress /etc/mysql/debian-start[2945]: Triggering myisam-recover for all MyISAM tables

Aug 12 13:43:31 wordpress postfix/master[2959]: daemon started -- version 2.9.6, configuration /etc/postfix

Aug 12 13:43:39 wordpress ntpdate[1780]: step time server 199.102.46.74 offset 3.776293 sec

Aug 12 13:43:40 wordpress kernel: [   20.772103] eth0: no IPv6 routers present

Aug 12 13:43:41 wordpress ntpdate[1829]: no server suitable for synchronization found

Aug 12 14:09:01 wordpress /USR/SBIN/CRON[3333]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)

Aug 12 14:17:01 wordpress /USR/SBIN/CRON[3399]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

Aug 12 14:39:01 wordpress /USR/SBIN/CRON[3696]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)

 

Liraz Siri's picture

Can't see what it might be from the logs you provided. I suggest you try and backup and restore on a fresh TurnKey wordpress installation. Verify the hash/signature of the image you download to make sure the download hasn't been corrupted.

Also, sometimes these sorts of random issues happen due to bad RAM. If you're the only one running into a particular problem with a stable component like mysql then that might be something worth considering. If you restore your backup on another computer or on a cloud instance and the behavior doesn't repeat itself there then it's probably a hardware problem.

Jeremy Davis's picture

Bottom line is that this is not normal...! From a glance at your logs, it seems that MySQL is crashing... Possibly one of the first things to check is the MySQL logs to see if there are any hints there.

Joe Marcellais's picture

The installation is running on a VMware server in a virtual environment. No other servers on the host are having any issues, so I don't think it's hardware related.

All of the MySQL logs are blank as well.

I've zipped up the /var/log folder and uploaded it to my OneDrive, if that helps. I've gone through most of the files and can't seem to find anything helpful.

 

Var/Log files
http://1drv.ms/1nPb0bV


Any help is greatly appreciated.

Thanks.

 

Joe Marcellais's picture

The server was essentially down. Even after rebooting, I could reach wp-login.php and see the login prompt, but by the time I had entered the user name and password the server had frozen again and was showing the "Error establishing a database connection" error.

 

I tried to access the server directly via the VMware console, but the limited TKL interface was completely frozen. I ended up powering off the virtual machine completely for 5 minutes and rebooting. The site appears to be back up for now, but I would really appreciate some help interpreting these log files. I've uploaded a new set of files.

Any assistance is greatly appreciated. 

 

New Log Files
http://1drv.ms/1psrFXV

Jeremy Davis's picture

And from a quick glance I would say that your TKL install is broken! Your Apache logs are full of segfaults and memory allocation failures. There is also mention of the MaxClients setting being reached (which would be strange unless your server is getting LOTS of traffic). From your Apache logs (/var/log/apache2/error.log):

[Sun Aug 10 09:47:48 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Sun Aug 10 09:48:26 2014] [error] server reached MaxClients setting, consider raising the MaxClients setting
[Sun Aug 10 09:48:29 2014] [error] [client 80.82.64.70] request failed: error reading the headers
[Sun Aug 10 09:52:41 2014] [notice] child pid 23661 exit signal Segmentation fault (11)
[Sun Aug 10 09:52:43 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Sun Aug 10 09:52:55 2014] [notice] child pid 23611 exit signal Segmentation fault (11)
[Sun Aug 10 09:52:56 2014] [notice] child pid 23628 exit signal Segmentation fault (11)
[Sun Aug 10 09:53:00 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Sun Aug 10 09:54:12 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Sun Aug 10 09:55:10 2014] [crit] mod_rewrite: could not init map cache in child
[Sun Aug 10 20:39:00 2014] [warn] RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Sun Aug 10 20:39:00 2014] [warn] RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Sun Aug 10 20:39:00 2014] [warn] RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Sun Aug 10 20:39:00 2014] [warn] RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?)
[Sun Aug 10 20:39:00 2014] [notice] Apache/2.2.22 (Debian) PHP/5.4.4-14+deb7u12 mod_ssl/2.2.22 OpenSSL/1.0.1e configured -- resuming normal operations
[Mon Aug 11 00:01:37 2014] [error] server reached MaxClients setting, consider raising the MaxClients setting
[Mon Aug 11 00:04:08 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Mon Aug 11 00:04:40 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Mon Aug 11 00:04:59 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Mon Aug 11 00:05:22 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Mon Aug 11 00:05:49 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process
[Mon Aug 11 00:11:44 2014] [notice] child pid 5564 exit signal Segmentation fault (11)
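
To get a feel for how often each failure occurs, the log can be tallied by message type. The following is a minimal sketch, not something used in the thread: the category names and the `tally_errors` helper are made up for illustration, and the patterns are simply copied from the excerpt above.

```python
import re
from collections import Counter

# Patterns copied from the error.log excerpt above.
# The category names on the left are illustrative, not Apache terminology.
PATTERNS = {
    "fork_failed": re.compile(r"Cannot allocate memory: fork"),
    "max_clients": re.compile(r"server reached MaxClients setting"),
    "segfault": re.compile(r"exit signal Segmentation fault"),
}


def tally_errors(lines):
    """Count how many log lines match each known failure pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Three lines taken verbatim from the excerpt above.
SAMPLE = [
    "[Sun Aug 10 09:47:48 2014] [error] (12)Cannot allocate memory: fork: Unable to fork new process",
    "[Sun Aug 10 09:48:26 2014] [error] server reached MaxClients setting, consider raising the MaxClients setting",
    "[Sun Aug 10 09:52:41 2014] [notice] child pid 23661 exit signal Segmentation fault (11)",
]

for name, count in sorted(tally_errors(SAMPLE).items()):
    print(f"{name}: {count}")
```

On the server itself, the same function could be fed the open log file instead of `SAMPLE`, for example `tally_errors(open("/var/log/apache2/error.log", errors="replace"))`.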

It also looks like MySQL is having trouble allocating memory, is crashing, and is ending up with corrupt tables (one specific table seems to keep getting corrupted). From your daemon log (/var/log/daemon.log):

Aug 13 17:31:43 wordpress mysqld: InnoDB: mmap(137363456 bytes) failed; errno 12
Aug 13 17:31:43 wordpress mysqld: 140813 17:31:43 InnoDB: Completed initialization of buffer pool
Aug 13 17:31:43 wordpress mysqld: 140813 17:31:43 InnoDB: Fatal error: cannot allocate memory for the buffer pool
Aug 13 17:31:43 wordpress mysqld: 140813 17:31:43 [ERROR] Plugin 'InnoDB' init function returned error.
Aug 13 17:31:43 wordpress mysqld: 140813 17:31:43 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Aug 13 17:31:43 wordpress mysqld: 140813 17:31:43 [ERROR] Unknown/unsupported storage engine: InnoDB
Aug 13 17:31:43 wordpress mysqld: 140813 17:31:43 [ERROR] Aborting

...

Aug 13 17:52:44 wordpress mysqld: 140813 17:52:44 [ERROR] /usr/sbin/mysqld: Table './nvcincco_wrd1/wp_options' is marked as crashed and should be repaired
Aug 13 17:52:44 wordpress mysqld: 140813 17:52:44 [ERROR] /usr/sbin/mysqld: Table './nvcincco_wrd1/wp_options' is marked as crashed and should be repaired

My suspicion is that your appliance filesystem is corrupted, but why that is I'm not sure... Considering that these errors seem to have been occurring as far back as the logs go (assuming that is as old as the appliance is), my first guess is that the image got corrupted on download. My next guess would be hardware (on your VM host).

I suggest you follow Liraz's advice and use TKLBAM to restore to a freshly downloaded and verified image.

And even though you aren't experiencing issues with other VMs on your host, don't rule out hardware such as RAM. I have seen multiple VMs running error free on a server with a confirmed bad stick of RAM.

Liraz Siri's picture

It looks an awful lot like it's running out of memory. Do you have a swap partition, and if so, how large is it? Maybe consider adding more RAM to the instance or decreasing the number of children that Apache can spawn.
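
The RAM and swap question can be answered from `/proc/meminfo`. Here is a small sketch of reading it, offered as an illustration only: the helper names are made up, and the sample values are invented, not taken from the poster's server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Report total RAM, total swap and free swap in MiB (0 if a field is absent)."""
    def mib(key):
        return fields.get(key, 0) // 1024
    return {
        "ram_mib": mib("MemTotal"),
        "swap_mib": mib("SwapTotal"),
        "swap_free_mib": mib("SwapFree"),
    }


# Invented sample values: a 512 MiB machine with no swap configured.
SAMPLE = """MemTotal:         524288 kB
MemFree:           20480 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

print(summarize(parse_meminfo(SAMPLE)))
```

On a live system, replace `SAMPLE` with the contents of `/proc/meminfo`. A `swap_mib` of 0 would mean there is no swap at all, so any memory spike goes straight to allocation failures or the OOM killer.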

Jeremy Davis's picture

However, I was under the impression that running out of memory alone shouldn't cause an Apache segfault (the other errors, perhaps). As I understand it, Apache segfaults are memory related, but have more to do with a process accessing memory that it cannot or should not access. And aren't Apache segfaults usually caused by PHP apps and/or modules!? Hence my suggestion of a corrupt FS.

I guess in retrospect that was possibly a big conclusion to jump to on such little info... I guess it is just as likely that a (user installed) WP module has a memory leak that is causing the system to run out of memory and causing those other issues...

Although the fact that Apache is hitting the MaxClients limit would suggest that the server is being heavily utilised... I guess in that instance you are very likely right about the memory (especially if there is also a WP module with a memory leak).

I guess there is also the possibility of malware. Another user recently reported that his TKL server (on AWS) was compromised very soon after he booted it up (my research suggests it was a brute force SSH attack).

Liraz Siri's picture

Accessing a memory region that doesn't exist or that you don't have access to is what triggers SEGFAULTS. It's a property of memory protection.

In an ideal world, running out of memory would generate a proper error message so you know what is going wrong. The thing is that running out of memory is rarely a well-tested code path. So when a bit of C code asks for a memory allocation and doesn't get it, the result is sometimes unpredictable.

The only thing you need for a segfault to happen in that case is for the code to try to access the address returned by the failed memory allocation without first checking whether the allocation succeeded.

Liraz Siri's picture

I've downloaded the log files and will take a peek to see if anything jumps out at me. That said, since you seem to be the only one reporting this problem, it's most likely not an issue with the TurnKey integration, and I'm not sure there is much we can do to help other than suggest various strategies for isolating the cause. Bughunting is tricky.

OnePressTech's picture

It's been on my "to do" list to talk with Jeremy and Liraz about configuring the core to auto-calculate and set the Apache settings based on available memory, AND to set the process priorities to deal sensibly with the OOM killer, AND to configure the core so the database restarts itself if killed (at least one retry).

http://backdrift.org/oom-killer-how-to-create-oom-exclusions-in-linux

Bottom line is, the TKLX WordPress appliance is not configured to be foolproof, and it could be. At present, when there is a heavy run on the Apache server, it sucks up all the memory and the OOM killer kills processes to free memory up. The process that usually gets killed is...you guessed it...the database.

Now while the apache server is configured to restart itself if it is killed, for some reason the database is not configured to restart itself.
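
On Linux, how attractive a process is to the OOM killer can be adjusted through `/proc/<pid>/oom_score_adj`, which accepts values from -1000 to 1000 (lower means less likely to be killed, and -1000 exempts the process entirely). The sketch below shows the mechanism only; it is not TurnKey's configuration, the function names are hypothetical, and the `proc_root` parameter exists just so the code can be tried against a scratch directory.

```python
import os

# Range accepted by the kernel for oom_score_adj.
OOM_ADJ_MIN, OOM_ADJ_MAX = -1000, 1000


def oom_adj_path(pid, proc_root="/proc"):
    """Path of the OOM adjustment file for a process."""
    return os.path.join(proc_root, str(pid), "oom_score_adj")


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM adjustment for pid.

    Lower values make the process a less likely victim. Lowering the
    value of a real process requires root privileges.
    """
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [{OOM_ADJ_MIN}, {OOM_ADJ_MAX}]")
    with open(oom_adj_path(pid, proc_root), "w") as fh:
        fh.write(str(value))


def get_oom_score_adj(pid, proc_root="/proc"):
    """Read back the current OOM adjustment for pid."""
    with open(oom_adj_path(pid, proc_root)) as fh:
        return int(fh.read().strip())
```

For example, `set_oom_score_adj(mysqld_pid, -500)` run as root would make mysqld a less likely victim than Apache's children, where `mysqld_pid` is whatever PID the database currently has. The setting belongs to the running process, so it would have to be reapplied after every database restart.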

Suggestion for 13.1 release:

1) configure the Apache settings automatically, calculated from the average Apache child size and a capped percentage of physical memory, taking the size of the database into consideration (as the database gets bigger, less memory is available to Apache for concurrent requests).

2) set the database to restart itself when it dies (at least 1 retry)

3) protect the database from the OOM killer so it is the last thing to be killed in a low memory situation (i.e. could first reduce the Apache memory allocation, and could set the OOM priorities for webmin, phpmyadmin, and webshell so that they are killed before the database...then a cron job could restart them as memory settles back to a steady state).
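
The calculation in suggestion 1 could look roughly like the sketch below. It is one possible reading of the suggestion, not an agreed design: the function name, the 80% default cap, and all the numbers in the example are assumptions, and the average child size would need to be measured on the real server.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, avg_child_mb, cap_fraction=0.8):
    """Estimate an Apache MaxClients value that fits in memory.

    total_ram_mb  -- physical memory of the machine
    reserved_mb   -- memory set aside for the database and other services
    avg_child_mb  -- measured average size of one Apache child process
    cap_fraction  -- share of physical memory that may be used at all
    """
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    budget_mb = total_ram_mb * cap_fraction - reserved_mb
    # Never return less than 1, or Apache could not serve any request.
    return max(1, int(budget_mb // avg_child_mb))


# Hypothetical figures: 512 MB of RAM, 200 MB reserved for MySQL and
# system services, and Apache children averaging 30 MB each.
print(estimate_max_clients(512, 200, 30))
```

As the database grows, `reserved_mb` grows with it and the result shrinks, which matches the point in suggestion 1 about less memory being left for concurrent requests.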

I will be getting around to this over the next few months if someone else doesn't beat me to it.

 

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Perhaps it might even be worth transferring these suggestions to the GitHub issue tracker (even if you plan to work on them yourself).

OnePressTech's picture

OK, good idea...I'll open a feature enhancement in the issue tracker. It will be more useful than the blog for tracking progress and fostering a collaborative solution.

Cheers,

Tim (Managing Director - OnePressTech)
