Eduard Grebe's picture

I deployed a Turnkey LAMP appliance to VPS.net's cloud (which uses Xen virtualisation). From time to time the server (which is fully up to date) reports "soft lockups" on the console, as shown below. I find this baffling. VPS.net support say this is a kernel bug in 32-bit Ubuntu. Is this correct? Is there anything I can do to solve the problem?

The machine becomes unresponsive and an endless stream of these messages is spewed on the console:

[52846.563059] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:4744]
[52873.811124] BUG: soft lockup - CPU#1 stuck for 11s! [kswapd0:106]
[52858.304554] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:4744]
[52885.550344] BUG: soft lockup - CPU#1 stuck for 11s! [kswapd0:106]
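
When there are many of these lines, it can help to tally them per CPU and task to see which processes are implicated. A minimal Python sketch, assuming the console messages keep exactly the format quoted above (the function name `summarize_lockups` and the `SAMPLE` text are illustrative, not part of any existing tool):

```python
import re
from collections import Counter

# Pattern assumes the exact console format quoted in this post.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<task>[^:\]]+):(?P<pid>\d+)\]"
)

def summarize_lockups(lines):
    """Count soft-lockup reports per (cpu, task name, pid)."""
    counts = Counter()
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            counts[(int(m["cpu"]), m["task"], int(m["pid"]))] += 1
    return counts

# The four lines from the console, copied from above.
SAMPLE = """\
[52846.563059] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:4744]
[52873.811124] BUG: soft lockup - CPU#1 stuck for 11s! [kswapd0:106]
[52858.304554] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:4744]
[52885.550344] BUG: soft lockup - CPU#1 stuck for 11s! [kswapd0:106]
"""

if __name__ == "__main__":
    summary = summarize_lockups(SAMPLE.splitlines())
    for (cpu, task, pid), n in sorted(summary.items()):
        print(f"CPU#{cpu} {task}[{pid}]: {n} report(s)")
```

In this sample the reports split evenly between apache2 and kswapd0, the kernel's swap daemon.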

Here is what VPS.net support said to me:

"This error seems to be from a buggy kernel version that you have. This is a 32 bit Ubuntu - Xen issue rather than a problem with vps.net itself, so you may try updating your kernel and check if it helps. Please let us know if we could be of any assistance."

Liraz Siri's picture

We're aware of the problem and we're very frustrated about it. It's been reported before. Unfortunately, we haven't really managed to make progress with VPS.NET on this. It seems to be a bug that's triggered by an interaction between that version of the Ubuntu kernel and VPS.NET's Xen implementation. When Ubuntu 10.04 LTS comes out in a few months we'll release new versions for all our appliances which will include a new kernel and will hopefully solve this issue.

In the meantime, other VPS providers such as GigaTux and GPLHost also support TurnKey, and none of their users have reported this issue, so you may want to check them out...

Eduard Grebe's picture

Dear Liraz. Thanks for the feedback, it is good to know it was not a configuration problem.

I actually find the state of affairs a bit shocking. VPS.net is listed as a partner on your page. And they have turnkey appliance "templates" on their system. If there is an incompatibility they should not support Turnkey -- or fix the issue. Anyway, I realise I am not a paid subscriber to Turnkey, but at least from VPS.net I would expect better as a paying customer.

In the end I went with the bog-standard Ubuntu 8.04 x64 installation, but I had to do a lot of work to get the configuration right -- stuff that Turnkey supplies out of the box. So this has been an unpleasant experience for me, but I realise you are probably doing your best to get it sorted.

I hope to try Turnkey again after the Lucid release. Good luck with the work.

Liraz Siri's picture

We weren't aware of the issue when we partnered up with VPS.NET. Ever since it was discovered, we have been hoping it would be identified and resolved quickly, but that didn't happen.

Unfortunately, we're out of the loop here. We don't have access to VPS.NET's infrastructure, so we can't work on this directly. Since other Xen-based VPS providers have not reported this issue, perhaps more could have been done to at least find a workaround, but that's just speculation on our part. Their engineering team doesn't report to us, so we don't know what they've tried. Heck, we don't even know if the technical suggestions we've made have been attempted, such as forcing a different kernel to be used (e.g., Red Hat kernels or the Linode kernel) or using a 64-bit kernel like GigaTux does.

FWIW, if it was our decision we would remove the images entirely until a fix or workaround could be found. VPS.NET aren't to blame for a bad interaction between an Ubuntu kernel and Xen, but leaving a pitfall like that around is not (in my opinion) the right thing to do. That's up to them though.

For now, the best we can do is add a warning about this issue to the original partnership announcement.

I'm very sorry for the inconvenience. Do you think there is anything more we can do?

Eduard Grebe's picture

A warning would be good. I do hope VPS.net addresses this issue, since your appliances are a really compelling tool. Good luck!

Marc Warne's picture

Just so you know as well, we have recently done some testing using a CentOS 5 32-bit kernel along with Turnkey Linux and this works without these major issues. There are some minor ones (such as warnings about missing features in the old CentOS kernel, 2.6.18), but we have not seen anything remotely along the lines of soft lockups.

These are 32-bit kernel and userspace domUs running under a 64-bit dom0. I very much doubt there will be any issue with other quality 32-bit kernels here either.

Neil Aggarwal's picture

We use 64-bit CentOS 5.4 to host TKL, and it works as well.

Neil Aggarwal's picture

We are also supporting TKL appliances now. We are offering a free week for people to try it out and see how well it works. We are still setting up our web site, but you can contact me if you would like to try our service.

Neil Aggarwal's picture

The VPS.NET problem still exists.

If you would like to try an appliance on my company's servers for free, please check out http://UnmeteredVPS.net/tkl
 

Liraz Siri's picture

I just exchanged emails with Carlos Rego from VPS.NET who says:

"It also only happens when the user's RAM is nearly fully used. In the many tests we did ourselves, we were unable to reproduce the error on most instances."

To solve this, I urged VPS.NET to try a CentOS kernel, as GigaTux have reported success with that.

Marc Warne's picture

By default, Linux caches as much as it possibly can, so surely RAM usage is usually full? Maybe it has more to do with some bug in the swapping subsystem than with RAM usage directly?

We've tested this with CentOS (32-bit and 64-bit), Debian Lenny and Debian Etch kernels (64-bit only).

I'd be interested in seeing whether this works though.
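
The point above about the page cache making RAM look full can be checked from `/proc/meminfo`. A rough Python sketch, assuming the usual `MemTotal`, `MemFree`, `Buffers`, `Cached`, `SwapTotal` and `SwapFree` fields are present; the helper names are made up for illustration, and subtracting buffers and cache is only an approximation of real memory pressure:

```python
import os

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def used_excluding_cache_kb(info):
    """Memory in use once buffers and page cache are left out (approximate)."""
    return (info["MemTotal"] - info["MemFree"]
            - info.get("Buffers", 0) - info.get("Cached", 0))

def swap_used_kb(info):
    """Swap currently in use."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)

if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only; skipped quietly elsewhere
    if os.path.exists(path):
        with open(path) as fh:
            info = parse_meminfo(fh.read())
        print("used excluding buffers/cache (kB):", used_excluding_cache_kb(info))
        print("swap used (kB):", swap_used_kb(info))
```

If the first figure stays well below `MemTotal` while swap use climbs around the time of a lockup, that would point more towards swapping than towards RAM simply being "full".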
