Anthony Biasi's picture

I'm having strange issues with the WordPress virtual appliance. I'm using the hosted TurnKey Hub portal to host a web server on the WordPress VA. Every few weeks, the entire Linux instance slows to a near halt. Apache2 seems to be the culprit, because as soon as I stop the service, the instance returns to normal.

Oddly enough, restoring a recent backup to a new instance clears up the issue, and everything returns to normal once I update my external DNS zone to point at the new instance.

I'm not very versed in UNIX, but the error logs for MySQL and Apache don't seem to show anything useful.

Any suggestions about where to start?

Jeremy Davis's picture

I'm guessing that this is a micro server, is that right? If so, I bet it's because micro instances have CPU throttling. My guess is that you are hitting the resource limit and CPU throttling is kicking in. IMO micro instances do not provide enough resources for a production server (they're really only useful for testing and very low traffic personal websites). Unless you tune your instance to use fewer resources, you'll need a larger instance size.

If you want to check that this is indeed the cause (or you already have a medium instance) then I recommend that you keep an eye on resource usage over the next few weeks. When your server next does this, check the CPU and RAM usage to see what is going on...
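If you don't have monitoring set up, here is a minimal sketch of the kind of logging I mean, assuming a Linux box with /proc mounted. The function names and the `resource-usage.log` file name are made up for illustration; this is not something that ships with the appliance.

```python
import time


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their integer values."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def sample_line(loadavg_text, meminfo_text, now=None):
    """Format one log line: timestamp, load averages, available memory in MB."""
    load1, load5, load15 = parse_loadavg(loadavg_text)
    mem = parse_meminfo(meminfo_text)
    # MemAvailable is missing on older kernels, so fall back to MemFree.
    avail_kb = mem.get("MemAvailable", mem.get("MemFree", 0))
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return (f"{stamp} load={load1:.2f}/{load5:.2f}/{load15:.2f} "
            f"mem_avail_mb={avail_kb // 1024}")


def log_once(log_path="resource-usage.log"):
    """Append one sample to log_path (a hypothetical file name)."""
    with open("/proc/loadavg") as f:
        loadavg = f.read()
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    with open(log_path, "a") as log:
        log.write(sample_line(loadavg, meminfo) + "\n")
```

Calling `log_once()` from a cron job every minute or so would give you a history to look back over the next time the server bogs down. On a single vCPU instance, a load average that sits well above 1 for long stretches suggests the CPU is saturated.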

Jeremy Davis's picture

TBH I haven't heard of a Small server getting throttled, although the behaviour you are reporting sounds very much like what happens when it does. All t2 type instances (nano, micro, small, medium and large) can be throttled by Amazon. They have what AWS calls "CPU credits": a balance that builds up while your server is mostly idle and gets spent when it uses lots of CPU. If that balance runs out, your server is throttled back to its baseline CPU performance. Have a read up about that and how you can double check that that's not the problem (from the AWS console): http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html#t2-...
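To give a feel for how quickly a runaway process can drain the credits, here is a rough simulation. It is a simplified model I'm assuming for illustration, not AWS's exact accounting: one credit is taken to be one vCPU at 100% for one minute, credits accrue evenly through the hour, and the simulation does not cap CPU use once the balance hits zero. The figures in the usage note (12 credits earned per hour and a maximum balance of 288 for a t2.small) are as I understand them from the AWS docs, so please check them against the current documentation.

```python
def simulate_credits(cpu_percent_per_minute, earn_per_hour,
                     start_balance, max_balance):
    """Return (balance after each minute, first minute at zero balance or None).

    Simplified model (an assumption, not AWS's exact accounting):
    one credit = one vCPU at 100% for one minute.
    """
    balance = float(start_balance)
    history = []
    throttled_at = None
    for minute, cpu in enumerate(cpu_percent_per_minute):
        balance += earn_per_hour / 60.0   # credits earned this minute
        balance -= cpu / 100.0            # credits spent this minute
        balance = max(0.0, min(balance, float(max_balance)))
        if balance == 0.0 and throttled_at is None:
            throttled_at = minute
        history.append(balance)
    return history, throttled_at
```

For example, `simulate_credits([100] * 60, earn_per_hour=12, start_balance=30, max_balance=288)` models a single vCPU pinned at 100% for an hour with 30 credits in hand. The balance drops by about 0.8 credits per minute, so it runs dry well before the hour is up.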

You mention that stopping Apache makes your server return to "normal". As Apache is a big CPU drain (especially if it has lots of forked processes) that suggests the possibility of CPU throttling. If you just restart Apache does that help at all? And/or have you tried rebooting? What (if any) effect does that have?

You mention that "When this does occur, apache2 spawns many instances and the CPU % starts to max out". So are you sure that Apache2 creating lots of forked processes and maxing out the CPU is not the cause of the sluggishness? TBH I suspect that Apache2 (for some reason - more on that next) is running away, loading up the vCPU(s). That is in turn causing the CPU to be throttled, hence the sluggishness. That would possibly explain why the same happens even on larger T2 instances (any instance size with a maxed out CPU will get throttled eventually).

Obviously I'm only guessing but I still suspect that throttling is the immediate cause of the sluggishness. But I have little doubt that Apache is the cause of the CPU getting maxed out. That's where things get tricky as you'll need to discover the cause of Apache's bad behaviour.

It could be a bug in anything from Apache itself to PHP, WordPress, or an installed WP theme or plugin. I'm not a gambling man, but if I were, my bet would be on a WP plugin. Have you installed additional plugins and/or themes since the initial install? In my experience, poorly coded plugins are the main cause of WordPress problems...

There is also a chance that something else is going on. Perhaps your server is getting hacked? Are you using keys to log in to your server? Or a password? If it's a password, is it a really good one?
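One quick way to see whether someone is hammering on SSH is to count failed password attempts per source address. This is only a sketch: it assumes the usual OpenSSH "Failed password for ... from ..." log lines and that the log lives at /var/log/auth.log, as it normally does on Debian-based systems. The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches OpenSSH lines such as:
#   sshd[123]: Failed password for root from 203.0.113.5 port 22 ssh2
#   sshd[124]: Failed password for invalid user admin from 203.0.113.9 port 22 ssh2
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def failed_logins(lines):
    """Count failed SSH password attempts per source address."""
    by_address = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            by_address[match.group(2)] += 1
    return by_address
```

You could feed it the log with something like `failed_logins(open("/var/log/auth.log"))` and then look at `.most_common(10)`. A handful of failures is normal background noise on any internet-facing server; thousands from a few addresses would be worth a closer look.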

Bottom line is, to work out what your issue is, you'll actually need to do some in depth testing and/or monitoring. Until then, both of us are purely guessing...

Here are a few suggestions: Start monitoring CPU usage and check things out as soon as there appears to be trouble. Once the issues occur, check what is running and the resources that it's using so you can pin down what is actually happening. Try doing a backup when it's broken (if you can), restore that to a new server and see what happens. Another idea could be to launch a local VM, restore a backup to that, and just leave it running to see what happens there (if you can reproduce the issue you can debug it locally; if you can't then it's obviously something to do with AWS and/or having direct internet access).
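For the "check what is running" part, something along these lines would group processes by name, so you can see at a glance how many apache2 workers there are and what they add up to. It's a sketch that assumes the standard `ps` with the `pcpu`, `rss` and `comm` output columns; the function names are made up for illustration.

```python
import subprocess
from collections import defaultdict


def summarise_ps(text):
    """Group `ps -eo pcpu,rss,comm` output by command name.

    Returns (name, process_count, total_cpu_percent, total_rss_mb) tuples,
    sorted with the biggest CPU consumer first.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            cpu, rss_kb = float(parts[0]), int(parts[1])
        except ValueError:
            continue  # skips the header line
        entry = totals[parts[2].strip()]
        entry[0] += 1
        entry[1] += cpu
        entry[2] += rss_kb
    rows = [(name, count, cpu, rss_kb // 1024)
            for name, (count, cpu, rss_kb) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def snapshot():
    """Run ps and return the grouped summary for the current moment."""
    result = subprocess.run(["ps", "-eo", "pcpu,rss,comm"],
                            capture_output=True, text=True, check=True)
    return summarise_ps(result.stdout)
```

Keep in mind that the CPU figure `ps` reports is averaged over each process's lifetime rather than being an instantaneous reading, so treat it as a rough guide and cross-check with `top` while the problem is happening.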
