TurnKey Linux Virtual Appliance Library

CPU usage spikes to 100%; website down

Hey guys,

About an hour ago my website went down while I was working on it. It's a network-enabled (multisite) WordPress site. All of a sudden, it stopped loading.

I waited an hour, came back to find it still down, and started looking around. I cannot get shell access, but I can see that the "CPU usage" chart spiked around the time the website stopped loading.

Any suggestions on what to do? Really bummed; I'm averaging like 300 visitors a day on this site. I would re-boot, but I've done a ton of work on the site today that I couldn't afford to lose. No recent backups. 

Please help! :)

Thanks,

Tom

Update!

OK, well the website is back up and CPU utilization is back to normal levels, I guess...8%, or so. That was weird. Any ideas what could have happened?

Thanks!

T

Where is your website hosted?

Is your website hosted with TKLHub or have you downloaded a WP appliance and uploaded it to another hoster?

If you're on an AWS micro instance, beware: a micro instance will give you a nice little performance boost for a short traffic spike, but if the spike goes on for too long AWS will shut off access to your site until your burst quota has recharged (AWS euphemistically calls this throttling). It only takes traffic pushing CPU utilisation above about 15% for a few seconds to trigger the burst-and-shutdown behavior.

See https://forums.aws.amazon.com/message.jspa?messageID=274303 .

What can cause a burst of traffic that shuts down your site? Possibly search engine robots / web crawlers. They scan all your pages faster than a human would, which generates a spike.

The solution, if you are on an AWS micro instance: move to a small instance or install a process rate limiter, e.g. http://cpulimit.sourceforge.net/ .

Hope that helps.

Cheers,

Tim (Managing Director - OnePressTech)

It's happening again...sigh.

Well, it's happening again. I started working on the site again and it just started crashing. I work on the WordPress site while it's live instead of on my local machine; I just find it easier this way.

The problem is that the website won't stay up. It's as if my many page refreshes are just "too much" for the server to handle, or something.

I was working on the site for the last 4 hours or so, but for the last two hours the CPU has been spiking to 100% and not coming down.

To answer your question above, I am hosting on AWS EC2. I DON'T want to move to a small instance for two reasons: I don't want to spend the extra money, and I have another WP website installed on a micro instance that has yet to have this problem.

So now I'm beginning to think it has something specifically to do with this website/install. I'm running a multisite instance, although there's no real reason for me to do so.

Totally bummed out for now. If I stay off the site for like 20 minutes, it goes back to normal; as soon as I request the site to continue working on it, it stops dead again.

This sucks :(

You're in the WordPress Admin death trap...

Sorry to hear you're having a bad time. If it's any consolation I just spent the last week wrestling with an Apache 2.0 issue and incompatibilities between VirtualBox & VMware regarding manifest files. A techie's life is a pain sometimes!

Regarding the problems you're having, you're in the WordPress admin death trap. WordPress admin takes more sustained horsepower than ordinary site traffic does. You can't admin a WordPress site for long on a micro instance before you get throttled.

I'm afraid you don't have many choices:

1) Move to a small instance

2) Edit your site on your local PC within a VM or WAMP and upload to your site

3) Temporarily upgrade your instance while editing

It sounds like option 3 might help you out. As long as your admin work can be done at a time when your clients won't notice a momentary outage while you restart the server, you can upgrade to a small / medium instance while doing your admin work, then downgrade back to a micro instance afterwards. The downside is two restarts (one to upgrade and one to downgrade).

See http://alestic.com/2011/02/ec2-change-type

4) You can also tune up your instance a bit

http://imperialwicket.com/tuning-apache-for-a-low-memory-server-like-aws...

Hope you work things out. There are other options but they are more complicated. A combination of 3 & 4 should solve your problems.
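
For option 3, here is a rough sketch of the stop / change type / start sequence. It assumes an EBS-backed instance and the AWS command line tool (`aws`), which is not the tool used in the article linked above. The instance ID and the `resize_commands` helper are made up for illustration. The function only builds the commands so you can review them before running anything.

```python
import json


def resize_commands(instance_id, new_type):
    """Return the AWS CLI calls, in order, that stop an instance,
    change its instance type, and start it again.

    Assumption: the instance is EBS-backed, since the type can only be
    changed while the instance is stopped.
    """
    # modify-instance-attribute takes the new type as a small JSON document.
    type_arg = json.dumps({"Value": new_type})
    return [
        ["aws", "ec2", "stop-instances", "--instance-ids", instance_id],
        ["aws", "ec2", "wait", "instance-stopped", "--instance-ids", instance_id],
        ["aws", "ec2", "modify-instance-attribute",
         "--instance-id", instance_id, "--instance-type", type_arg],
        ["aws", "ec2", "start-instances", "--instance-ids", instance_id],
    ]


# Hypothetical instance ID, for illustration only.
for cmd in resize_commands("i-0123456789abcdef0", "m1.small"):
    print(" ".join(cmd))
```

Run the same sequence with `t1.micro` as the type to switch back. Depending on your setup, you may need to re-associate your elastic IP after the instance starts again.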

Good luck with your site.

Cheers,

Tim (Managing Director - OnePressTech)

Ugh

Ugh! Thanks for the heads up. It sounds to me like updating my WP files on my machine and then uploading later makes more sense, no?

Also, I just came back after a few hours to check my site, and it's still throttled. How long does it take before it resets back to normal? I'm hoping it will reset by tomorrow morning...

You may be caught in the throttle time death spiral...

Have a look at the following two links:

1) Performance comparison between Micro and Small AWS instances

http://www.youtube.com/watch?v=EQOmqi_n_ZY

2) Performance profiling for AWS Micro

http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/concepts_micro...

As per the following quote from article #2 above, if your website user profile does not match a micro profile your throttling time will get longer and longer:

"When the Instance Uses Its Allotted Resources

We expect your application to consume only a certain amount of CPU resources in a period of time. If the application consumes more than your instance's allotted CPU resources, we temporarily limit the instance so it operates at a low CPU level. If your instance continues to use all of its allotted resources, its performance will degrade. We will increase the time we limit its CPU level, thus increasing the time before the instance is allowed to burst again."

As I said previously, there are solutions, but they're more complicated, as per this AWS quote from article #2 above:

"If you enable Amazon CloudWatch monitoring for your micro instance, you can use the "Avg CPU Utilization" graph in the AWS Management Console to determine whether your instance is regularly using all its allotted CPU resources. We recommend that you look at the maximum value reached during each given period. If the maximum value is 100%, we recommend that you use Auto Scaling to scale out (with additional micro instances and a load balancer), or move to a larger instance type. For more information about Auto Scaling, go to the Auto Scaling Developer Guide."

Hope you work it out...you know when the price of anything is too good to be true...it is.
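
To make the CloudWatch advice quoted above a bit more concrete, here is a minimal sketch of the check. It assumes you have already copied the per-period maximum CPU values out of the console graph into a list. AWS does not define "regularly", so the `min_fraction` cut-off is my own assumption, and `regularly_maxed` is just a name I picked.

```python
def regularly_maxed(max_cpu_per_period, threshold=100.0, min_fraction=0.5):
    """Return True if the instance hit `threshold` percent CPU in at
    least `min_fraction` of the sampled periods.

    max_cpu_per_period: list of per-period maximum CPU percentages.
    min_fraction: assumed cut-off for "regularly"; tune to taste.
    """
    if not max_cpu_per_period:
        return False
    pegged = sum(1 for value in max_cpu_per_period if value >= threshold)
    return pegged / len(max_cpu_per_period) >= min_fraction


# Example with made-up readings: three of four periods hit 100%.
print(regularly_maxed([100, 100, 8, 100]))
```

If this comes back True for your normal usage, the quote above suggests the workload does not fit a micro instance and you should scale out or move up.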

Cheers,

Tim (Managing Director - OnePressTech)

So disappointed

OK. I think I understand. Thanks for all your time, Tim.

This isn't a huge site and it doesn't get a lot of traffic. I want to use EC2 because it's so fast and easy with TKL. But if the site locks up every time I try to edit it, it's useless.

And I don't think it's worth $60/month at this point for a small instance, either. This is incredibly disappointing. What on earth is a micro instance good for if I can't even build a simple WP site on it?!

Thanks again for your time, man.

Hey don't give up...I just gave you the ground rules :-)

Micro-instances are designed to be used for very, very low traffic websites. They are also used to complement the other instance sizes to cost-effectively deal with spikes. If you have a small instance, for example, you can set things up so that 1+ micro-instances are automatically spawned to handle a traffic spike, then turned off so they don't incur ongoing charges when not needed. Because a micro instance is designed for burst traffic, it's the ideal combo.

I wasn't suggesting giving up on the Micro AWS or TKL. You just have to check your performance profile and see if it fits. If it doesn't fit, then check out AWS Auto Scaling and launch a few more micro instances at peak load, or install a process rate limiter to keep from breaching the AWS throttle limit, e.g. http://cpulimit.sourceforge.net/ .

Good luck. Sorry it isn't simpler.

Cheers,

Tim (Managing Director - OnePressTech)

How's this for a plan?

I think I'll launch a new micro server using a backup of the site as it is now (to get out of the admin death trap - as I write this, the site is still locked up 100% CPU!). Then I'll launch a new small instance to work on the site.

When I'm not working on the site, I can shut down the small instance. Then I'll just download the files and DB from the small instance and upload them to the micro instance - changes go live.

How's that for a plan? Anything I'm missing?

That would work. You may want to try a minor variation though...

That should work fine. Remember to swap your elastic IP to your new micro instance before shutting down your old micro instance.

Alternate approach: When editing, you could just shut the micro instance down and restart it as a small instance, and when you're done editing, shut it down and restart it as a micro instance. That would save you from having to sync your databases, which you would need to do if you edit a small instance copy in parallel to the running micro instance. Your clients would only see this as a minor outage since the switch only takes a minute.

Cheers,

Tim (Managing Director - OnePressTech)

Hmm..

Hey Tim, looked for a post about how to re-boot an instance as a small/micro and vice versa before reaching out to you, but couldn't find it. My TKL hub allows me to start and stop an instance, but doesn't prompt me for any options - it just starts and stops the micro. How do I do the switch on re-boot? 

You might be best to enter that as a new question

Hey Tom. Sorry, I didn't realize that option may or may not be supported by the TKLX Hub. I'm an appliance user, TKLBAM user and community member, but I manage my servers via 3 service providers, only one of which is AWS, so I don't use the TKLX Hub for server management.

I'm certainly not the best person to help address the question. It's worthwhile posting that as a new question so you grab the attention of the site managers or other community hub users who can better help you.

One approach that appears to me to be supported by the TKLX Hub would be to take a snapshot of the micro instance, clone it as a small instance, switch your IP address across to the small instance then shut down and delete the original micro instance. Then do the reverse to switch back from the small to the micro instance.

Sorry I couldn't be more help on the original question.

Cheers,

Tim (Managing Director - OnePressTech)

Now my small instance is spiking! What!?

I mirrored a small instance this morning and continued working on the site, and then...CRASH.

So, my micro instance and my small instance - neither works. What on Earth am I supposed to do now?!

Now that's a whole other kettle of fish...

Sorry to hear you're still having troubles Tom.

Regarding spiking, my understanding is that AWS does not throttle a small instance to oblivion. As per the link I provided previously, a micro instance may have up to 97% of its CPU stolen, whereas a small instance will only have up to 63% of its CPU stolen. It will slow down, but it shouldn't crash because of CPU throttling.
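
If you can get a shell, you can measure the stolen CPU yourself. This is a minimal sketch, assuming a Linux guest whose /proc/stat aggregate "cpu" line includes the steal field (the 8th number). The function names are my own. It compares two snapshots of that line and reports the share of CPU time that went to steal in between.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into the first
    eight counters: user, nice, system, idle, iowait, irq, softirq, steal.
    The trailing guest counters are skipped because they are already
    counted inside user and nice."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(x) for x in fields[1:9]]
    if len(values) < 8:
        raise ValueError("this kernel does not report a steal field")
    return values


def steal_percent(before, after):
    """Percentage of CPU time stolen between two 'cpu' line snapshots."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval=5.0):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    def read_cpu_line():
        with open("/proc/stat") as stat_file:
            return stat_file.readline()
    first = read_cpu_line()
    time.sleep(interval)
    return steal_percent(first, read_cpu_line())
```

Calling `sample_steal()` on the instance while the site is struggling should show whether the slowdown is throttling (high steal) or your own processes eating the CPU (low steal, high user/system).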

Note: For the amount of time you edit your site the extra cost would be minimal to splurge on a medium instance while editing.

If you're experiencing a crash then it's likely WordPress related. You said previously that your other site is a single-site WordPress instance and is working fine, but this instance is a multisite WordPress instance and is giving you trouble.

Personally I would not run a multisite WordPress on a micro AWS instance. There are so many ways that can come back to haunt you on a micro instance: double site hits from address redirections from the MU Mapping Plugin (if used), longer scan times by the search bots as the number of pages increases, and an increased probability of reinforcing spikes from the multiple sites. You would need a more complex feedback loop model using a service like scalr.net to be able to pull it off.

If you're determined to stick with the multisite and a micro instance then you'll need to roll up your sleeves and debug your WordPress instance in the traditional way...turn off all your plugins, switch to the default theme and see if it works. If it does, then you know where your problem lies.
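
The methodical version of that test can be sketched like this. It is only an outline of the procedure, not something that talks to WordPress: `site_is_healthy` is a hypothetical callback that you would implement yourself (for example, activate exactly the given plugins, load a few admin pages and watch the CPU), and the plugin names in the example are made up.

```python
def find_conflicting_plugins(plugins, site_is_healthy):
    """Test each plugin on its own against a clean baseline.

    plugins: list of plugin names to check.
    site_is_healthy: hypothetical callback taking the list of active
        plugins and returning True if the site behaves normally.

    Returns None if the site is unhealthy even with no plugins active
    (look at the theme or the install itself), otherwise the list of
    plugins that break the site when activated alone.
    """
    if not site_is_healthy([]):
        return None
    culprits = []
    for plugin in plugins:
        if not site_is_healthy([plugin]):
            culprits.append(plugin)
    return culprits


# Example with a fake health check where one made-up plugin misbehaves.
bad_plugins = {"heavy-stats"}
fake_check = lambda active: not (set(active) & bad_plugins)
print(find_conflicting_plugins(["seo", "heavy-stats", "cache"], fake_check))
```

Note that this only catches plugins that cause trouble on their own. A conflict that needs two plugins active together would slip through, so if everything passes individually, re-enable them one by one on top of each other instead.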

Hope it works out for you Tom.

Cheers,

Tim (Managing Director - OnePressTech)

Def. an issue with install, I'm thinking

Yup...was already ahead of you. I cloned a medium instance and sure enough the throttling continued. The site is down right now with 80% CPU usage, in fact.

So there's something going on with the multi-site. Bummer. I suppose I could dump the MU config and stick with a small instance, and my issues will probably go away. I can't ever imagine going back to a traditional hosting provider again, so I suppose that's my only option!

Before you ditch the cart & the horse...

I would suggest you switch to the default theme and disable your plugins and give it a test. You may not have any problems with your plan to run a ms-wp on a micro instance. It sounds like you have a classic wp conflict. Fix that and everything else will line up as it was meant to...maybe micro, maybe small, maybe a micro with a second micro launched on an as-needed basis. Lots of options.

If your plan is to run multiple sites for other people then you're in the right place. AWS / TKLX is one of the leading combos in cloud in my opinion. It's got some wrinkles the TKLX guys are working out, but the whole cloud industry has wrinkles they're working out. Cloud is no longer bleeding edge but it's barely moved out of that stage in my opinion. As they say in the movies...You're off the edge of the map, mate. Here there be monsters (just kidding :-)

It sounds to me like some methodical testing should sort you out and get you back going.

Let me know how it works out. Sounds like you're back in the saddle again.

Cheers,

Tim (Managing Director - OnePressTech)

Cool! I didn't realize you

Cool! I didn't realize you could stop a micro and start as a small, and then vice-versa. Let me take a closer look at that option. Right now I'm editing using WAMPserver and then plan to mirror the changes in the /www files and database.

Some Experience

I have a site that was averaging 200 visits per day and during mail outs it would spike to over 1000. I too had the 100% CPU issue but was fortunate enough to have backed up everything first.

My web servers back up hourly and occasionally I also take a backup with a Joomla component called Akeeba Backup. If I have a wayward site I can try restoring from a backup with TKLBAM, but if this fails, reloading from the Akeeba backup works well because it's really just the contents of /var/www and the DB dump.
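
For anyone without a backup component, the file half of that idea can be sketched in a few lines. This is not how TKLBAM or Akeeba work internally, just a plain timestamped tarball of a directory. The `archive_directory` name and the paths in the usage note are assumptions. The database dump would be a separate step (e.g. with mysqldump) and is not shown.

```python
import os
import tarfile
import time


def archive_directory(src_dir, dest_dir, prefix="www"):
    """Write a timestamped .tar.gz of src_dir into dest_dir and
    return the path of the archive that was created."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, "%s-%s.tar.gz" % (prefix, stamp))
    # Store files under the directory's own name rather than its full path.
    top_level = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top_level)
    return archive_path
```

Something like `archive_directory("/var/www", "/root/backups")` before a long editing session would at least mean a reboot doesn't cost you the day's work on the files.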

I also moved to a small appliance after this happened several times and all I can say is night and day difference!

Chris Musty

Director

Specialised Technologies
