RChadwick's picture

I really hate to post this, and I'm not sure I'll get any constructive answers, but I'm getting to the end of my rope...

After installing Turnkey, and migrating all my websites to it, I got this problem where MySQL doesn't restart when it crashes: http://www.turnkeylinux.org/forum/support/20140812/error-establishing-da...

That was about a month ago. The problem occurred again a few days ago, and again this morning. I have no idea how to fix it, I'm not sure it would be fixed in the next version of Turnkey, and now I'm not sure when/if there will be a new version of Turnkey. Although I don't have a lot of time to migrate to another web server, I absolutely cannot have people going to my websites getting an 'Error connecting to database' error.  Frankly, the fact this serious, yet likely easy to fix issue is hanging around in what I thought was a mature product, has turned me off to Turnkey.

What would be a good substitute? Ubuntu Server with Zentyal installed? Something else? Should I move back to a Windows server? I've standardized on Wordpress for all my sites, so my needs are simply something that can run Wordpress reliably and securely (Security updates, tuned for production instead of development, etc), without spending a lot of time configuring or maintaining.

Thanks for any help you can offer!

Jeremy Davis's picture

The discussion you link to definitely describes a better config, and it would make life a bit easier. However, it essentially covers up the problem (masking the symptom, not fixing the cause!). It will only help if your MySQL is crashing because the server is running out of RAM. It's also worth noting that it's not just TurnKey that doesn't have the OOM killer configured the way Tim was discussing. Neither Debian nor Ubuntu has it set up like that OOTB.

And really, you don't want your server running out of RAM! It will cause you issues sooner or later (and it appears to be causing them right now). Have a look in your logs to be sure (/var/log/mysql and probably others), but I suggest bumping up the RAM or at least making sure you have plenty of swap.
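To confirm whether the OOM killer is involved, something along these lines could be used. This is only a minimal sketch: the function name is made up for illustration, the search pattern is an assumption (kernel message wording varies between versions), and the log path in the usage comment is the usual Debian location rather than something confirmed for this server.

```shell
#!/bin/bash
# Count lines in a log file that look like OOM-killer activity.
# The pattern is an assumption; kernel message wording varies by version.
# Note that one OOM event normally produces several matching lines.
count_oom_lines() {
    local logfile="$1"
    grep -ciE 'out of memory|oom-killer|killed process' "$logfile"
}

# Example usage (typical Debian log path; adjust for your system):
#   count_oom_lines /var/log/kern.log
```

A non-zero count suggests the kernel has been killing processes to reclaim memory; the matching lines themselves (drop the -c flag) show which process was chosen.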

Obviously it's really hard to diagnose when I know very little about your specific config, but assuming that what we both think is happening is happening, it's nothing to do with the OS itself. You will have the same issues with vanilla Debian (which TurnKey is built on) or Ubuntu (which is also built from Debian), assuming that you have the same software installed, same RAM allocated and same traffic...

To me the only logical way forward is to troubleshoot what the problem actually is. Then you will be in a position to make an informed and educated decision about what to do next...

Robert Chadwick's picture

Thanks for your response Jeremy.

My initial introduction to this issue was because I made a mistake and gave the VM a ridiculously small amount of RAM. It wasn't on purpose, and I fixed it immediately. Currently, ESXi says Turnkey is using on average around 500MB, and it has 1.5GB. Although I'm not watching 24/7, I've never seen it get close to 1.5GB usage, and if it did, wouldn't it use swap? Usage on my websites should be very low, unless someone is trying to hack or DOS me.

In a perfect world, MySQL would just never crash under any circumstances. In the real world however, I need it running. In the scheme of things, I'd rather have it restart and not address why, than use tons of resources to track down the reason, fix it, and then have something else, or even just a random one-time weirdness, bring the server down again. Uptime is king.

OnePressTech's picture

Hi Robert,

As Jeremy indicated, your problem is unrelated to TKLX. You can experience it with any Linux. Unless you want to pay someone else to deal with the issue (WPEngine, for example, will host your WordPress websites and take care of your database for you), you need to fix the problem at the source.

The likely culprit for the problem is your Apache web server. All it takes is an intense robot crawl and Apache will suck up your resources, triggering the OOM killer...there goes your MySQL. You will need to configure Apache so it does not exhaust your RAM, and the problem will likely go away. Your alternative is to adjust the OOM Killer priority for your MySQL instance...the OOM Killer will just kill something else though.

Apache settings adjustment fixed my MySQL / OOM Killer issues.

Sometime over the next few months I'll work out the formula to auto-configure Apache properly and I'll share it with the TKLX community.

Until then...beware the OOM Killer :-)

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

You know much more about this than I do, so it's great to have your informed input. I look forward to your Apache auto-config to balance its RAM usage! :)

Robert Chadwick's picture

You could be right about the RAM. I've seen peaks as high as 1GB, and when Wordpress runs backups at night, it might push it over. If so, what to do? How much RAM does TKL need? How do I configure Apache to not use so much? How do I configure MySQL to restart after it has shut down?

OnePressTech's picture

I run my Wordpress VMs with 1GB RAM and the following Apache settings in file /etc/apache2/apache2.conf.

#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 15

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule mpm_prefork_module>
    StartServers          2
    MinSpareServers       6
    MaxSpareServers      12
    MaxClients          25
    MaxRequestsPerChild   3000
</IfModule>

The formula for Apache configuration is a bit messy and takes some time and thinking. What you are really doing is constraining your Apache to a fixed number of processes assuming processes of a certain average size. If you get a high access rate, this gets sucked up fast and then requesters are slowed and some ultimately get a timeout. For your normal access loads this configuration should be fine.

The formula is complicated by the fact that there is no standard process size to use in your calculation. The size of the process is dependent on the size of the Application launched. In the case of WordPress this is dependent on the number and type of plug-ins you include. You have to look in TOP at your process sizes and then remember to re-visit this when you add more plug-ins to your Wordpress.
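As a rough illustration of the arithmetic behind this, the sketch below divides the RAM left over for Apache by the average Apache process size. It is an approximation under stated assumptions, not the automated formula discussed in this thread: the function names are made up, the 400 MB reserve for MySQL and the OS in the usage comment is a guess you would replace with your own figure, and RSS overstates per-process cost because shared memory is counted in every process.

```shell
#!/bin/bash
# Average resident memory in MB, given one RSS value in KB per line on stdin.
avg_rss_mb() {
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024; else print 0 }'
}

# Estimate MaxClients as (total RAM - reserved RAM) / average Apache process size.
# All three arguments are in MB; the reserve figure is your own estimate.
calc_max_clients() {
    local total_mb="$1" reserved_mb="$2" proc_mb="$3"
    [ "$proc_mb" -gt 0 ] || return 1   # avoid dividing by zero
    echo $(( (total_mb - reserved_mb) / proc_mb ))
}

# Example usage on a live server (apache2 is the Debian process name):
#   proc_mb=$(ps -C apache2 -o rss= | avg_rss_mb)
#   calc_max_clients 1024 400 "$proc_mb"
```

With 1024 MB total, 400 MB reserved and 25 MB per process, this gives 24, which is in the same region as the MaxClients value listed above. The measurement should be repeated whenever plug-ins are added, since the process size changes with them.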

The Timeout / KeepAliveTimeout (listed above) is a bit of a wildcard. Dropping it will increase the speed at which Apache connection memory is freed, so you would be able to handle a higher access rate with the allocated memory. The downside is that there is a race condition at the protocol level on late packets that could see an increased error rate as late packets from a previous connection are routed to the new connection that has re-used the old connection's freed resources. There is no definitive number for Timeout / KeepAliveTimeout reduction. Making it smaller increases the probability, but not the certainty, of an increased connection error rate (http://publib.boulder.ibm.com/httpserv/manual60/misc/fin_wait_2.html).

This is why I have determined that my next step will be to automate the configuration. It needs to be self-adjusting based on a process-size / connection error-rate feedback loop.

Hope this helps. Sorry there is no simple answer until I automate the process.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

In my experience flakey plugins/extensions/modules and poor PHP code in general are common culprits of Apache memory leaks.

If you are using a CMS or some sort of framework then the first thing to do would be to disable/remove any plugins/extensions/modules that you aren't using.

As a dirty workaround, you could set up a cron script to regularly check to see if MySQL is running and if it's not then start it... Perhaps restart Apache at the same time!?

OnePressTech's picture

It is my experience with WordPress that plug-in leakage is not an issue, especially if you stick to the mature, high-download plug-ins. The issue is just that the default Apache configuration is fixed and has no relationship to the constraints of the system it is installed on.

If you just put in a MySQL cron job to restart your MySQL when it is not running, you have to choose between a frequent and an infrequent poll interval. The former chews up CPU and tries to restart MySQL in the middle of an OOM Killer RAM reduction cycle (causing a restart thrash as they battle for supremacy), and the latter causes your website to present the "can't connect to database" error message to your customers until the cron job restores MySQL after N seconds / minutes. Neither option is acceptable.

There is no real choice but to configure your apache settings correctly and the process is extremely geeky.

Without an auto-configuration capability, Turnkey linux VMs won't be robustly useful to the non-technical.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

I was more talking about the trap I fell into early on when I first started playing with CMSs: "Ooooh, that looks cool I'll install that! Ooooh that looks cool too!" Like a magpie collecting all the shinies! :) Not necessarily saying that that is what Robert has done, but just making sure that that is not a part of the issue...

Perhaps I have been lucky but I have not had any issue with any of the sites and web apps that I look after...

I did consider that there might have been some thrashing as a result of OOM killer; hence why I also suggested a restart of Apache too. TBH I'm not sure if that's enough (I don't know enough about OOM killer) but I figured that it is probably a start and worth a try (as a hacky short term potential workaround).

OnePressTech's picture

The thing about the OOM Killer is that it has one job...recover memory when RAM gets critically low through the killing of large processes to protect the operating system's operational integrity.

If you are under an extended robot crawl, restarting MySQL and an Apache webserver that is not constrained to a sustainable amount of RAM will trigger a loop. The OOM killer sees low RAM and kills the biggest task subject to OOM killer priorities. Unless you override the default OOM Killer priority, MySQL is picked on only because it is usually the largest RAM-consuming process that is running. If another larger process is running, it will be killed first. If you restart Apache at the same time, it does not alter the fact that you are in the midst of an extended robot crawl. Once MySQL / Apache are back up, the inbound connection requests will saturate the Apache server again, triggering a low RAM situation, and the OOM Killer will kill your MySQL again. The restart would need to wait until the crawl was over. But how long do you wait? You either take a guess at a cron job timeout or write a script to monitor the request queue at the TCP/IP level. Unfortunately HTTP/TCP/IP has an auto-backoff mechanism, so the crawler may just be waiting for your VM to start consuming connection requests again...back into the OOM Killer loop.
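For reference, overriding the OOM killer priority of a single process can be sketched as below. This assumes a kernel that exposes /proc/PID/oom_score_adj and a server process named mysqld; the function name and the -500 value are illustrative only. The setting is lost when the process restarts, and as noted above, protecting MySQL just means something else gets killed instead.

```shell
#!/bin/bash
# Write an OOM score adjustment for one process.
# The range is -1000 (never kill) to 1000 (kill first); lowering it needs root.
# proc_root defaults to /proc and is a parameter only so the function can be
# tried against a scratch directory.
set_oom_score_adj() {
    local pid="$1" adj="$2" proc_root="${3:-/proc}"
    if [ "$adj" -lt -1000 ] || [ "$adj" -gt 1000 ]; then
        echo "adjustment must be between -1000 and 1000" >&2
        return 1
    fi
    echo "$adj" > "$proc_root/$pid/oom_score_adj"
}

# Example usage (as root; assumes the server process is named mysqld):
#   set_oom_score_adj "$(pidof mysqld)" -500
```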

The CRON job restart is an imperfect solution to process failures. It is a good short term bandaid but not a fix. You are either chewing up CPU by polling frequently to ensure process downtime is minimal (every minute) or the cron job polls more efficiently and you have a longer process downtime until restart (every hour).

Configure the Apache correctly and the need for the CRON job disappears. No real other option.

Just one man's 2 cents worth :-)

Cheers,

Tim (Managing Director - OnePressTech)

Robert Chadwick's picture

I can understand, and appreciate, the need to do things right. I also suffer from the 'perfection gene', and often get sucked into making projects extremely optimized, when 'just good enough' would have been fine. I also hate to do something half-assed, and would rather not do it at all than do it to lower standards. I'm not discouraging trying to optimize TKL, we just need to do something until it happens. Using a blunt tool, I made my config look like yours, and it seemed fine overnight (I had another 'dead MySQL' event the night before), and today I decided to sacrifice an unused VM so I could give 2.5GB RAM to TKL. Still, it would be nice to have a cron check once in a while, which is beyond my ability to figure out on my own (although not perfect, a website down for 15 minutes in the middle of the night is better than down for 6 hours until I notice it in the morning). Could you share how to make a cron, and what would be inside? Better yet, is there a way to make the cron check for a non-running MySQL AND a low memory condition?

Jeremy Davis's picture

I think that when you don't really know what you are doing and you are testing things out to see if they help, documenting exactly what you have done is important (so it can be easily undone if need be)... FWIW, by default TurnKey includes etckeeper, which uses git to keep the /etc dir under revision control. That's a great thing, but it requires that you understand the usage of git.

If you don't, and don't have a desire to, then another way is to just use the 'cp' (copy) command. Before you adjust a config file, run something like this:

cp something.conf something.conf.$(date '+%Y%m%d-%H%M')

That will give you a filename with the date (YYYYMMDD) and time (24hr) appended on the end (so the files will display in order of age when you run ls).

But I digress... Back to the question at hand...

In situations such as this, Google is your friend... Keep in mind that stuff like this (i.e. dirty bash scripts) is usually (although not always) quite generic in Linux, meaning that almost anything you find will probably run, even if it doesn't work quite how you intend; anything that doesn't run is probably missing a tool that is not installed. For specifics, TurnKey is built on Debian (as is Ubuntu), so bash scripts/instructions for either of those should probably work OOTB.

So googling 'cron job to check if service is running' returned a link (first result) that gives some good pointers and a skeleton script. Personally I would make it log actions to a file (just redirect output to a file using '>> file.log 2>&1') and also restart Apache (first).
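A skeleton along those lines might look like the following. It is a sketch under assumptions, not the script from the linked result: the function name, the log path and the script location in the cron line are all made up for illustration, and the process name mysqld and service names mysql and apache2 are the usual Debian ones, which should be checked on your own system.

```shell
#!/bin/bash
# Run a check command; if it fails, log the event and run a start command.
# Arguments: label for the log, check command, start command, log file.
watchdog() {
    local name="$1" check_cmd="$2" start_cmd="$3" log="$4"
    if bash -c "$check_cmd" > /dev/null 2>&1; then
        return 0    # still running, nothing to do
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $name not running, attempting restart" >> "$log"
    bash -c "$start_cmd" >> "$log" 2>&1
}

# Example usage (restarts Apache first, then starts MySQL):
#   watchdog mysqld "pgrep -x mysqld" \
#       "service apache2 restart; service mysql start" \
#       /var/log/mysql-watchdog.log
#
# Example cron line, e.g. in a file under /etc/cron.d/ (hypothetical script path),
# to run the check every 5 minutes as root:
#   */5 * * * * root /usr/local/bin/mysql-watchdog.sh
```

The script file would contain the function followed by the uncommented watchdog call, and would need to be made executable with chmod +x.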

If you want to also check RAM usage, then googling 'bash script to check ram' has a link (third one down) which gives a script that checks memory and emails when it's low. It uses a mail client called mutt, but TKL has Postfix preinstalled, so you just need to adjust the instructions if you want it to send emails (google if you need more detail). Hint:

echo "My message" | mail -s subject user@example.com
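A low-memory check could be sketched like this. It is not the script from the linked result; the function name and the 100 MB threshold are arbitrary choices for illustration. It reads /proc/meminfo, using MemAvailable where the kernel provides it and otherwise falling back to MemFree + Buffers + Cached as a rough estimate.

```shell
#!/bin/bash
# Print approximate available memory in MB from a meminfo-format file.
# Uses MemAvailable when present, otherwise MemFree + Buffers + Cached.
available_mb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemAvailable:" { avail = $2; have = 1 }
        $1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
        END { printf "%d\n", (have ? avail : sum) / 1024 }
    ' "$meminfo"
}

# Example usage: email a warning when below a threshold
# (100 MB is an arbitrary choice; the address is a placeholder):
#   free_mb=$(available_mb)
#   if [ "$free_mb" -lt 100 ]; then
#       echo "Only ${free_mb} MB available" | mail -s "Low memory" user@example.com
#   fi
```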

With a bit of tweaking and testing, within an hour (give or take) you would have a script that does exactly what you want. If those links don't provide enough detail, then Google will give you more details about setting up a cron job, or it's pretty straightforward to use Webmin for that if you'd rather...

Good luck! :)

OnePressTech's picture

The config settings I provided will only support, with 1GB RAM, the WordPress configuration I use as a standard. I have a number of reasonably sized plug-ins such as Event Manager, Formidable Forms, and S2Member, but these Apache settings may not sustain a WordPress with bigger plug-ins such as BuddyPress or WooCommerce without the OOM killer killing MySQL under intense robot crawl conditions. A quick load test would determine that. A quick check of top would indicate how to adjust the Apache settings.

Hopefully this settles your server for now until I, or someone else, provides an automated configuration.

Cheers,

Tim (Managing Director - OnePressTech)

Jeremy Davis's picture

Also, I just remembered that at one point you said something about the issue seeming to coincide with backups!? TBH I don't know about the RAM usage of TKLBAM, but I know that it can be quite CPU intensive. Apparently limiting bandwidth also limits CPU (to a certain extent), although I'm not sure whether that would also have an impact on RAM usage (I suspect not, but I haven't tested/monitored). No doubt someone has written a commandline tool that can limit a process's RAM usage, which might be worth a try (limit the RAM of TKLBAM if you think that is a contributing factor)... Although personally I'd consider doing some monitoring with a cron script to see what is chewing up your RAM before you implement any hacky cron scripts...

OnePressTech's picture

Here's the thing...if the backup process chews up a lot of RAM, it would likely be bigger than MySQL so the OOM Killer would kill it and not the MySQL (OOM Killer protects the O/S not the applications).

The reason why Apache is so much trouble is that it pre-forks a lot of processes and each process is typically smaller than the MySQL process. So, although Apache chews up a lot of RAM, none of the running Apache processes are larger than the MySQL process.
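One way to see this on a running server is to total memory per command name rather than per process. The sketch below is illustrative only (the function name is made up) and takes first-word command names from ps output; RSS double-counts shared memory, so the Apache total will be somewhat overstated.

```shell
#!/bin/bash
# Sum resident memory per command name from "rss_kb command" lines on stdin.
# Prints "total_mb process_count command", largest total first.
rss_by_command() {
    awk '{ kb[$2] += $1; n[$2]++ }
         END { for (c in kb) printf "%d %d %s\n", kb[c] / 1024, n[c], c }' |
        sort -rn
}

# Example usage on a live server:
#   ps -eo rss=,comm= | rss_by_command | head
```

If the apache2 line shows a larger total than mysqld while each individual Apache process is smaller, that matches the situation described above.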

That's why I would not suspect the Backup. I have not looked at TKLBAM from a running perspective but I would assume it is one process potentially threaded.

Also, I have not experienced a backup-related MySQL death on any of the WordPress sites I manage within 512MB and 1GB VMs.

Having said that, if the Wordpress site was supporting BuddyPress and WooCommerce and a lot of users, posts and comments and the MySQL process was bigger than the TKLBAM backup process and the cumulative RAM use during backup triggered the OOM Killer then MySQL would be killed so it is worth considering as a culprit. That's a lot of "AND"s though :-)

Should be easy to reproduce...just go into Webmin TKLBAM interface and trigger a full backup. If MySQL stays up then that is not the problem.

Cheers,

Tim (Managing Director - OnePressTech)

RChadwick's picture

I want to say I genuinely appreciate all the responses, but I think my brain is about to shut down. Honestly, I just want to run a web server. I expected to have to do some work, but when the work maintaining the web server outweighs the time spent maintaining the website, I just have to stop. I'm getting sucked down a rabbit hole that is going to suck up all my time, and not put food on my family's table. Being self-employed, I literally don't have a minute to spare.

I'd like to say that I have been googling like mad. Perhaps I found the answer many times over, but didn't know enough to know I had the answer. I'm also not familiar with different flavors of Linux, and don't know how well directions for one flavor/version will apply to another. Asking for help is almost always my last resort. I'm glad it was fast for you to find the answer, but the same thing doesn't come as easy to me.

Creating a script on my own? It would take me a while just to figure out how to run it. I don't even have any idea how to make cron run a script, or when. Syntax? A 'Hello World' script would take me hours. Doing something useful might take me a few days or more. I can follow directions, but wishful thinking won't make me competent in scripting.

A FAQ that covers these issues, and solutions, might be a great idea. I doubt I'm doing anything unusual with TKL, and I'm sure others have had the same questions and problems. It would be a great help, especially for those of us who are not Linux experts.

The backup was a Wordpress-only thing. Never figured out TKLBAM. I wanted to, but it's too confusing (I am NOT a big believer in the cloud).

So, if I buy more server RAM, would that fix things? How much RAM?

Thanks again!

Jeremy Davis's picture

If you don't have a desire to learn much about Linux then I suspect that a self-managed server is not for you. TurnKey lowers the entry bar, but it doesn't eliminate the need to learn a bit about it (as you are seeing).

I hate to turn people away from TurnKey specifically or Linux in general, but in your case perhaps the ~$100-$150 per year cost of a half decent shared hosting plan would be a better investment?

OnePressTech's picture

Hi Robert,

We can see you are a bit frustrated but TKLX is targeted to the DIY service manager with technical expertise not the consumer or busy business owner who just wants tech that works (my opinion...not necessarily the opinion of the TKLX founders :-).

If you are just running an in-house server then TKLX will do the job reasonably out of the box because it is a controlled environment. As soon as you make your server accessible by the general public via the Internet the list of technical issues you need to know, understand, and address for your service to scale and operate reliably in a secure and private manner is lengthy and non-trivial. If you think this particular RAM issue is tricky you are in for a shock...it's just the tip of the iceberg.

My business is managing Office Clouds for small businesses, based on TKLX and WordPress. Sometime later in 2015 I will be releasing an open source Customer Cloud appliance based on TKLX that is likely what you had thought the TKLX WordPress appliance would be.

Until then you might want to have a look at someone like WPEngine (I haven't used them so this is not a recommendation, just an observation).

I hope that is helpful. Sorry if it isn't what you had originally thought.

Now...after saying all that, if you still want to make a go of it Jeremy and the TKLX community are pretty helpful.

So...how much RAM do you need?

Honestly...if you are running a WordPress site with a small footprint (usage, users, plugins, etc.) you can run it in 512MB. If you have at least one reasonably beefier WordPress plugin (Event Manager, BuddyPress, WooCommerce) you should start with 1GB and see how it goes. I run my managed WordPress sites with 1GB RAM.

You will not be able to run TKLX WordPress with reasonable performance on Apache without a WordPress caching plug-in. Configuring a caching plug-in is non trivial. If you are going to pick one use SuperCache not W3TC. The latter is more powerful but tougher to configure. The former is simpler to configure and does a good job.

Again...if you use a managed WordPress hoster like WPEngine, they cache at a higher level in the architecture so you would not need a WordPress caching plug-in.

Forget the Cron job. See if the new Apache config sorts out your DB crash issue. Let us know how it goes and we'll help when we can.

Cheers,

Tim (Managing Director - OnePressTech)

RChadwick's picture

I suppose it's about control. I want to have as much as possible in-house. I've run a webserver successfully for about 15 years, 14.9 of those years on Windows.  I suppose going back is an option, but benchmarks showed LAMP was a lot faster than WAMP for me. Still, I'd rather have it slow, than down.

OnePressTech's picture

If it is control you are looking for then Linux gives you a lot more options than Windows. But, with greater flexibility and control comes more complexity. That's always the way :-)

Cheers,

Tim (Managing Director - OnePressTech)

OnePressTech's picture

Ah...reading between the lines it appears you have not enabled TKLBAM and are using a WordPress backup. Is that correct? If so, which backup plug-in are you using?

I see now why Jeremy previously highlighted the backup as a potential issue. I missed that.

TKLBAM won't cause your RAM / OOM-Killer issues, but a WordPress backup plug-in could suck up a big chunk of RAM and trigger the OOM killer (though I would expect it to kill the backup, not MySQL, so I still think your Apache might be the culprit...easy to test...do a full backup and see if your MySQL crashes).

WordPress plug-in issues are not TKLX issues.

If you have a MySQL crash related to a WordPress backup plugin, then how much RAM you need is only something you would be able to determine yourself through analysis, Google browsing or trial & error.

But it isn't a TKLX issue. As Jeremy previously pointed out, you would have this issue on any server.

Cheers,

Tim (Managing Director - OnePressTech)
