TurnKey Linux Virtual Appliance Library

TurnKey Bugzilla 11.1 x86 segfaults when run on a 64-bit ESX host

Hi,

We've been using TurnKey Linux Bugzilla 3.0.4.1-2+lenny2 for a while now and so far we are very happy with it. Because we need a feature that exists only in Bugzilla 3.2 and up, I deployed the latest TurnKey Bugzilla 11.1 x86 on a separate VM. I noticed that whenever it is powered on under a 64-bit ESX host, the TKL Bugzilla OS segfaults and the boot sequence drops to an initramfs shell. If I migrate it to a 32-bit host, it runs just fine.

Next week we are planning to perform the transition to the new Bugzilla; however, our production ESX is a 64-bit host. I guess it is worth mentioning that our 64-bit hosts run other 32-bit OSes just fine. It seems that only TKL 11.1 x86 is affected.

Any ideas?

Thanks in advance

Jeremy

Possibly a known issue?

Have a look at these threads here and here and see if that's the problem. If so, then AFAIK there still hasn't been confirmation of exactly what the problem is, nor a reliable workaround (beyond disabling acceleration during boot), so perhaps you could help with some troubleshooting?

Some troubleshooting...

Thank you, Jeremy; those threads were good reading. I'm now fairly sure I have the same problem.

To troubleshoot the issue further, I tried to deploy TKL Bugzilla 11.1 x86 OVF on several hosts I have in the lab. The following table summarizes the results:

Vendor / Model        CPUs / Memory / Architecture    Host version (build)  Accel. disabled  TKL Bugzilla x86 OVF boot
IBM x336              2 x 3.2 GHz / 8 GB / 32-bit     ESX 4.1 (260247)      no               Passed
IBM x336              2 x 3.2 GHz / 8 GB / 32-bit     ESXi 4.1 (260247)     no               Passed
IBM x3550             8 x 1.861 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   no               Failed: "initramfs error"
IBM x3550             8 x 1.861 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   yes              Passed
Dell OptiPlex 780     2 x 2.99 GHz / 16 GB / 64-bit   ESXi 4.0 (171294)     no               Passed
Dell OptiPlex 780     2 x 2.99 GHz / 16 GB / 64-bit   ESX 4.1 (260247)      no               Passed
Dell OptiPlex 780     2 x 2.99 GHz / 16 GB / 64-bit   ESXi 4.1 (260247)     no               Passed
Dell OptiPlex 780     2 x 2.99 GHz / 16 GB / 64-bit   ESXi 4.1u1 (348481)   no               Passed
HP ProLiant DL360 G5  8 x 1.866 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   no               Failed: "Kernel panic - not syncing"
HP ProLiant DL360 G5  8 x 1.866 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   yes              Passed

My tentative conclusion is that the root cause lies either in 4.1 Update 1 or in hosts with many CPUs (both failing hosts are 8-CPU machines). I might reinstall one of the HP or IBM hosts with ESXi 4.1 260247 (to isolate Update 1 issues) or ESX 4.1u1 348481 (to isolate ESXi issues) and see whether the results differ.


Any thoughts?


Jeremy

Excellent work!

This info may well help narrow down the cause and, hopefully, lead to a more reasonable workaround. From a brief look at your results, it seems the problem may be related to a combination of the updated 4.1 and some specific hardware.

If you get a chance, it may be useful to test TKL alongside an Ubuntu 10.04.1 Server install and see how they compare. If Ubuntu runs OK on the servers where TKL crashes, then there may be something TKL can do to help resolve the issue.

So thanks again for being such a team player :)

Not a TKL issue, apparently

Okay,

I downloaded Ubuntu Server 10.04.2 and deployed a basic install without any additional packages. The results are interesting:

Vendor / Model        CPUs / Memory / Architecture    Host version (build)  Accel. disabled  Ubuntu 10.04.2 i386 boot
IBM x336              2 x 3.2 GHz / 8 GB / 32-bit     ESXi 4.1 (260247)     no               Passed
IBM x3550             8 x 1.861 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   no               Failed: kernel panic / segfault
IBM x3550             8 x 1.861 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   yes              Passed
Dell OptiPlex 780     2 x 2.99 GHz / 16 GB / 64-bit   ESXi 4.1u1 (348481)   no               Passed
HP ProLiant DL360 G5  8 x 1.866 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   no               Failed: segfault
HP ProLiant DL360 G5  8 x 1.866 GHz / 16 GB / 64-bit  ESXi 4.1u1 (348481)   yes              Passed

It seems that the TKL boot problems are actually related to the underlying Ubuntu OS rather than to TKL Bugzilla itself. Also, the type of error is not consistent across reboots or hardware: I got both types of errors on the x3550 machine.

Any ideas on how we proceed from here? Is Ubuntu aware of the problem? Does TKL plan to ship a 10.10-based Bugzilla in the near future?

Thanks.


A possible workaround.

Hi,

I did some digging around, and it appears to be a problem with Ubuntu's generic kernel. The problem was fixed for me by installing the PAE kernel.

This is what I did:

* Enabled VMI paravirtualisation (disabling acceleration was too slow).

* apt-get install linux-image-generic-pae

* Disabled VMI paravirtualisation 
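A small shell sketch for checking the result after rebooting (this is not part of the original steps): it reports which kernel flavour is running and whether the virtual CPU advertises the PAE flag that a -pae kernel needs. The helper names `kernel_flavour` and `cpu_has_pae` are hypothetical, and the flavour suffixes are assumed from Ubuntu's usual kernel naming (e.g. `2.6.32-28-generic-pae`).

```shell
#!/bin/sh
# Sketch: report the running kernel flavour and whether the CPU exposes PAE.
# The flavour suffixes below are assumed from Ubuntu's usual kernel naming,
# e.g. "2.6.32-28-generic-pae"; adjust them if your release differs.

# Map a kernel release string (as printed by `uname -r`) to its flavour.
# The -generic-pae pattern must come before -generic so it matches first.
kernel_flavour() {
  case "$1" in
    *-generic-pae) echo "generic-pae" ;;
    *-generic)     echo "generic" ;;
    *-virtual)     echo "virtual" ;;
    *)             echo "unknown" ;;
  esac
}

# Succeed if /proc/cpuinfo lists "pae" as a whole word among the CPU flags.
cpu_has_pae() {
  grep -qw pae /proc/cpuinfo 2>/dev/null
}

echo "running kernel flavour: $(kernel_flavour "$(uname -r)")"
if cpu_has_pae; then
  echo "CPU advertises PAE"
else
  echo "CPU does not advertise PAE (or /proc/cpuinfo is unavailable)"
fi
```

If it reports anything other than `generic-pae` after the reboot, the newly installed kernel is probably not the one that booted.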

All good :)

...deon

Thanks for the workaround.

Very nice; it helped me today.

But the problem is with Ubuntu's "virtual edition" kernel, not the generic one.
