TurnKey Linux Virtual Appliance Library

Problem booting Revision Control Appliance 11.0RC when hosted in VMWare ESXi and Server

Hello folks,

I really like the features offered by the Turnkey solutions!  Thanks folks.  However I've run into a problem.

I installed the Revision Control Appliance 11.0RC to a VM using VMWare Player (running on Windows 7) and made a few tweaks [1] and tested.  Works great!  Then I used VMWare Converter to convert and copy the VM to our VMWare ESXi 4.1 [2] server.  Alas, the VM doesn't boot under ESXi - it either kernel panics or drops me to a busy box terminal [3].

I also tried converting to VMWare Server 1.0.9.  This also failed, though it always failed straight after grub finished its countdown [4]

Finally, the Turnkey install iso also fails to boot under ESXi - it gets to the install menu but shortly after I select "install to hard disk" it fails with a "General error mounting filesystem" [5].  Has anyone successfully installed this ISO to a VM hosted in ESXi?

It's worth noting that the regular Ubuntu 10.04 install iso installs fine under ESXi.

There seems to be something unusual that Turnkey is doing that causes grief with ESXi.  But I've been investigating the problem for a long time now - is anyone else having these sorts of problems?

BTW, the 2009.10-2 VM runs fine but I'm interested in using Mercurial and the version there is ancient.

Any help would be really appreciated!

 

Thanks,

Matt

 

[1] Specifically, I added another virtual hard drive and mounted it at /srv/repos to store the repositories in a separate vmdk.

[2] Technically, "VMware vSphere Hypervisor"

[3] Grub does the usual countdown, the kernel is unpacked and begins booting but it looks like some issue with the hard drive causes problems.  Can't give text output from ESXi's console but see attached images: turnkey_vm_busy_box.png and turnkey_vm_kernel_panic.png

[4] I'm not particularly surprised at this as the last Ubuntu version supported under Server 1.x was the 6.x release.  It's likely a grub2 issue(?).

[5] See turnkey_iso_boot_fail.png.

Jeremy's picture

I can't help you with your issue

But I've just seen in another thread that Alon (one of the core devs) has stated that Part1 of the 11.0 release (not rc) will be happening very soon. My understanding is that the VM (and other) images will go up at the same time or soon after.

That may resolve your issue hopefully, although it'd still be good to know why it's not working as is (because ideally you should be able to install from ISO or VM image).

If you think it may be something to do with grub2, try removing it and installing grub legacy instead in your VMware Player machine and try reimporting. At least that may help work out what the issue is.

I can promise to try it

I won't have access to an ESXi box and vSphere until after the first, but I can report back whether I can accomplish an install or conversion or both sometime soon after.


Thanks folks

@Jed Thanks, I couldn't find the thread you mentioned, do you have a link?  Hopefully the latest release will fix the issue. 

I'll look into replacing grub2 with grub, but I suspect that's more likely to be the issue under VMWare Server, which is less important for me (I'd really prefer having this on ESXi).  Grub2 appears to be starting fine under ESXi; the fault manifests itself later in the boot process.

@Rik Thanks, I'd really appreciate it!


Jeremy's picture

Hey Matt

Sorry mate I can't find it again either :(. From memory it wasn't very specific, just saying that it'd be out soon. And TBH I don't know how many major changes will occur between rc and final release.

works ok for me

I just downloaded the iso and was able to successfully install it on ESXi 4.1 as well as ESX 4.0.

Still doesn't work for me...

Heya Dan, thanks for trying it out.

I just re-downloaded the RC11 iso, validated the sig and tried installing again on my ESXi 4.1 install (build 260247) using a custom VM (everything default except Ubuntu/32 bit). Same problem - stops with a kernel panic shortly after selecting "install to hard disk".

Frustrating.  I wonder what's different about our systems?

Thanks again guys but I'm starting to run out of options...


Which SCSI controller type

Which SCSI controller type are you using for the hard disk?  Perhaps you could post a screenshot of the settings you are using.  You could also try downloading an original Ubuntu Server iso and see if you have the same issues with that.

ESXi VM settings

I'm using the default, LSI Logic Parallel.  I'll try others.

I have tried the standard Ubuntu 10.04 installer; no problems at all.  Though that was the Desktop install, I'll also try the server version and see how it goes.

Below are listed some of the notable settings (didn't bother w the screenshot as they're spread out over many forms - can post the vmx if it will help):

  • Mem: 512MB
  • CPUs:1
  • Video card
    • 4MB RAM (Tried making it bigger; no help)
  • VMCI: Restricted
  • SCSI controller 0: LSI Logic Parallel
  • Hard disk 1: Virtual Disk
    • 8GB
  • CD/DVD Drive 1: [250GB_Datastore]iso/turnkey-revision-control-11.0-lucid-x86.iso
    • Connect at power on
  • Network adapter 1: VM Network

Seems very vanilla...anything jumping out?


Disable acceleration

OK folks, I've stumbled on something interesting.  If I boot the system with "Disable acceleration" set (Edit Settings->Options->Advanced->General), the system starts fine, albeit slowly.

After the boot period, I can turn acceleration on again and everything is stable.  System runs great.

But whenever I try to boot the VM with acceleration on it fails with a kernel panic.  There seems to be some issue with acceleration and some part of the boot process.

I'm still investigating but has anyone seen anything similar?  I am suspicious of the HW/BIOS/ESXi install but it's weird that it only affects this Turnkey install...


Datastore?

What type of datastore are you using?  iSCSI/FC/NFS/Local ?  If you are using a NAS or SAN, try using the local datastore.  I have not seen this problem before, but most of my work is with Windows VM's.

Have you verified your setup is on VMware's hardware compatibility list?

Datastore - local, may not be compatible.

I'm using a local datastore. 

I'm running ESXi on commodity hardware (it's our testbed before rolling it out for more permanent use).  It's a Dell Optiplex 745 (Core 2 Duo, 8GB RAM, a couple of SATA HD's not RAID'd) and no, it's not on the compatibility list.  However, ESXi 4.0 is on the community list for that HW.

So while it's definitely a possibility that there's an issue with ESXi and the HW it'd be surprising since it's such common HW...but who knows?  Maybe it is simply a problem with the HW & ESXi?


Same situation...

I am running into the same problem.  I ran 11.0RC on an Intel-based ESXi 4.0 host without problems, and now I'm getting back to this project and tried running 11.0 release on an AMD ESXi 4.1 host.  Everything works fine if I disable acceleration, as Matt suggested.  I tried the local datastore in addition to the iSCSI storage, no difference there.

I am running plenty of other VMs on the ESXi 4.1 environment without problems, including other Debian and Ubuntu VMs.  No clue why this is such a pain.

Sorry to hear that...

Sorry to hear that you've got the same problems...though I'm pretty happy to hear I'm not the only one!  :)

Incidentally, I've been running the VM for the better part of a week now and it has been rock-solid.  Probably been running it enough to confirm that it does seem to only be a problem during the boot process.


Same here - additional info

Not sure this is HW as our server is HP DL380 and plays nice with ESXi - on all the compat lists and very high spec

But we are having the same kernel panics and most of the above problems too, local datastore, vanilla setups plus some custom tries also

tried all scsi drv versions as well with no difference

Interestingly v11 did install nicely on our dev server HPG3 running ESXi 3.5 (multiple test installs all good)

Does this point anyone in the right direction

I'll clone the successful 3.5 VM, upgrade the config and move it to ESXi 4 and post whether any success

Same here - 2009.10 fine, 11.0RC dumps

My config is ESX 4.1 on Dell 2850, which is a dual xeon 32 bit.  I also tried updating the 2009.10 appliance and ran into the same problem with Grub.

During the install of the ISO, things go fine.  After booting, there is no way to get around the kernel panic.  I tried all kinds of configuration settings, booting from an iSCSI target and from local RAID.  It fails in every case.  The machine remains stuck at the console during the boot process.  I looked through the different mount points but could not detect any error.  (But I do not know a lot about the linux boot process.)

Tried launching the VM image on another machine with VMWare Fusion, and it runs fine.  Some of the VM machines were running fine on ESXi 4.0 on the same hardware.

As far as I am concerned, the grub2 is the offending component.

Fails to install on my Dell

Fails to install on my Dell 2900 gen III with dual quad core xeons x64 running ESXi 4.1. If I disable the acceleration as stated in the above posts everything works fine. I just re-enable it after boot. So far so good; it's just annoying to remember to disable it again if I restart the vm.

Any Solution Yet?

My 3.5 clone failed so loaded up 11.1 to ESXi 4 on HP DL380's

Seems to run OK without acceleration, but doesn't boot with :(

Has anyone had success with this?

 

Jeremy's picture

Have you guys tested with the VM builds at all?

Not sure whether it would make a difference or not but it is probably worth a shot. The VM builds use a different kernel so perhaps that may resolve it?

linux-virtual kernel in the VM builds should work well with ESX

The 11 RC versions were in ISO form. TurnKey only made the VM optimized builds available after the official 11 release. So these guys were all running the ISOs and those have the generic kernel while the VM builds use the linux-virtual kernel which was optimized by the Ubuntu team for virtualization.

I haven't tested it with ESX myself (just KVM), but it would be mighty strange if the virtualization-optimized kernel on Ubuntu 10.04 would not work well with VMWare, one of the leading virtualization platforms. Ubuntu 10.04 is a long term release so I imagine a great deal of testing was done before the release. Moreover, TurnKey isn't even using 10.04, it's using 10.04.1 which includes all the updates and bugfixes for issues discovered after that version of Ubuntu was released and got widely tested in production.

Jeremy's picture

Have a read about the differences between ISO & VM image

Here in the TKL docs.

If you wish to test you could just try installing the kernel into your current setup (sorry I'm lazy so can't give you any links or clear instructions but you should find something online easily).
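
For anyone wanting to try this, a minimal sketch: check which kernel flavor is running and, if it isn't the virtual one, print the install command. It assumes the linux-virtual metapackage is available from the Ubuntu 10.04 repositories configured on the appliance; the helper name kernel_flavor is made up for illustration and is not something TurnKey ships.

```shell
#!/bin/sh
# Classify an Ubuntu kernel release string by its flavor suffix,
# e.g. "2.6.32-24-generic" or "2.6.32-24-virtual".
kernel_flavor() {
    case "$1" in
        *-virtual) echo "virtual" ;;
        *-generic-pae) echo "generic-pae" ;;
        *-generic) echo "generic" ;;
        *) echo "unknown" ;;
    esac
}

flavor=$(kernel_flavor "$(uname -r)")
echo "Running kernel flavor: $flavor"

if [ "$flavor" != "virtual" ]; then
    # Assumption: linux-virtual is installable from the configured repos.
    # Run the printed command as root, then reboot into the new kernel.
    echo "To try the virtualization-optimized kernel:"
    echo "  apt-get update && apt-get install linux-virtual"
fi
```

After the reboot, running `uname -r` again shows whether the -virtual kernel was actually the one that booted.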

Good thinking

Hi Guys

Good thinking, thanks

Will try both kernel replace and new install and post results

Must be me

VM builds also seem to replicate the same errors

turned off acceleration and works fine, turn back on and stalls at console

It must be me but I can't seem to nail it!

I can't really disable accel on boot as clients reboot their own tkl's at any time

Jeremy's picture

Have you tried Ubuntu Server 10.04?

I know the OP said that Ubuntu 10.04 was working ok but for the sake of completeness it would be useful for you to also try that out (if you haven't already).

In search of a workaround I'd also try a test install of TKL without LVM. I would imagine that if Ubuntu 10.04 is fine then you can rule out FS (ext4) and GRUB2 as the potential problem.

Beyond that I'm not sure where to go next?

Thanks JM - will post results...

Already running Ubuntu 10.04 as a comparison test and yes, it plays nice....

(but I now can't live without tklbam - so this is not an option for us!)

 

Good thought on LVM - Thanks - I will test and post..........

*sigh*

And after update and upgrade even without accel I am back to K Panic

Will try alternate LVM option later this week - I wonder though whether this will compromise the tklbam a bit?

Back to the drawing board  :)
If I revert back to TKL9 as standard platform am I missing much between the core versions in terms of functionality and performance do you think?

Liraz Siri's picture

Help us figure it out!

Even if you temporarily revert to using the previous version, you can always use TKLBAM to migrate between them fairly easily once we figure out what the problem is.

If you say stock Ubuntu 10.04 works for you then we can't be too far off from isolating what it is about TKL 11 (also based on Ubuntu 10.04) that is giving you problems.

Liraz Siri's picture

Investigating, need as much information as possible

I'm currently looking into this issue. Some users are reporting a smooth experience with ESXi. This leads me to suspect the host's hardware configuration or some subtle aspect of ESX's configuration may be playing a part here. It would be helpful to have as much information as possible from anyone encountering issues, including host hardware configuration (e.g., CPU type) and screenshots. You can't attach screenshots to comments so if you have this problem, and you see something different from the screenshots already attached above, please start a new topic on the forums.

Same issue.

Using ESXi 4.1 same issue.. Kernel Panic at install. Using turnkey-core-11.1-lucid-x86.iso  Did anybody ever figure this out?

L. Arnold's picture

will try a test on ESXI today

I have been exploring Subversion and was about to install anyway.  I will try to log the install and see where it goes.

L. Arnold's picture

Installed Rev Ctrl 11.1 ISO on ESXI 4.0 w/ DHCP without incident

Well,  I have not yet "done anything" other than get to the Web Root.

Installed Revision Control ISO 11.1 to VMWare ESXI 4.0 with my normal protocol:

(After setting up the VM and pointing to the ISO for CD/DVD), set up to a Thin - 16gb Drive.

Install to HD

Yes (what is the first question?)

Yes (what is the second question?)

default 90% (then Tab, OK)

Write Changes >  Yes

Grub > Yes

Restart

Assign Root Password (only password asked)

I did not install TKLBAM or Auto Updates just now.

I could do this again if we need more detail - as I was instinctively installing at the beginning.

Anything else that should be tested?

L. Arnold's picture

yesterday, using VMWare converter did have some trouble on copy

I was trying to copy a Magento TKL VM yesterday, but when I went to boot the new VM in ESXI I was getting an ETH2 Error.  I could not work this problem out, though I have done this process many times before, even with Magento TKL machines.

At that point I went to give the TKLBAM restore a try to a New TKL Magento build (11.1 instead of 11.0) and this did take.  I needed to copy all my passwords and ip's over, then go into www/magento/var and eliminate everything in /sessions and /cache  (don't delete the Cache folder itself, as otherwise you have to reset ownership and settings, which I had to do)...  Should be a script for that.
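
Since a script was asked for, here is a minimal sketch of one. It empties the session and cache directories under a Magento var directory but keeps the directories themselves, so ownership and permissions stay as they were. The function name clear_magento_var is made up for illustration, and the subdirectory names are assumptions: the post calls them "sessions" and "cache", and "session" is included as well in case that is what exists on disk. It relies on GNU find.

```shell
#!/bin/sh
# Empty Magento's session and cache directories without removing the
# directories themselves, so their ownership and modes are preserved.
# Usage: clear_magento_var /path/to/magento/var
clear_magento_var() {
    var_dir=$1
    # Refuse to run without an argument, otherwise we would target /cache etc.
    [ -n "$var_dir" ] || { echo "usage: clear_magento_var VAR_DIR" >&2; return 1; }
    # Subdirectory names are assumptions taken from the post above.
    for sub in session sessions cache; do
        target="$var_dir/$sub"
        [ -d "$target" ] || continue
        # -mindepth 1 keeps the directory itself; -delete removes its contents.
        find "$target" -mindepth 1 -delete
    done
}
```

It could be called as `clear_magento_var /var/www/magento/var`, where that path is only a guess at where "www/magento/var" lives on the appliance.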

Anyway, VMWare converter did not properly move the ESXI Magento Image, and I could not find the problem before I threw the build away.

L. Arnold's picture

ETH2 Error on another machine as well

hmm.  I just opened another VM that had been sitting (OpenErp).  Not sure if this is a fresh copy or had been working...  Guessing that I need to "restart" or go to the ESXI console and see what's up with ETH2.  I have only been using ETH1 in my recollection.  Anyway, irritations do happen.

L. Arnold's picture

settings: Guided LVM, Write Changes Yes

First 2 settings in creation were

Guided/LVM

then, Write Changes.  (can't edit my post so adding as a reply)

Disable Acceleration

I encountered the same problem, and I was able to solve it after I turned off Intel Virtualization in the BIOS and set Disable Acceleration (Edit Settings->Options->Advanced->General).

Disable Acceleration

I can also confirm that disabling acceleration solves the issue and lets me install the LAMP stack.

Jeremy's picture

Many have complained that this slows the VM too much

So if you re-enable acceleration (after boot) and update to the newer (backported) kernel that's available in the repos then you can leave acceleration on and it should boot fine (others have confirmed this). I can't recall the exact command OTTOMH but if you have a search here in the forums you should find it.

Quick solution kernel panic ubuntu 10.04 LTS under esxi

As previously suggested, disable the acceleration; then enable the latest kernel ppa repository and install the 2.6.38.xx kernel (I've successfully tested it); reboot with the new kernel and the acceleration enabled, and the VM works like a charm!

For example: (you need build-essential and linux-headers if you want to install vmware tools)

apt-get install python-software-properties && add-apt-repository ppa:kernel-ppa/ppa && apt-get update && apt-get install linux-image-2.6.38-12-generic linux-headers-2.6.38-12-generic && reboot
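
After the reboot it is worth confirming which kernel actually came up before turning acceleration back on for good. A minimal sketch, assuming GNU sort with the -V (version sort) option is available; the helper name kernel_at_least is made up for illustration, and the 2.6.38 threshold is simply the version tested in this post, not a verified minimum.

```shell
#!/bin/sh
# Succeed if kernel release $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option.
kernel_at_least() {
    have=${1%%-*}    # strip a suffix such as "-12-generic"
    want=$2
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

if kernel_at_least "$(uname -r)" 2.6.38; then
    echo "Kernel $(uname -r) is 2.6.38 or newer"
else
    echo "Kernel $(uname -r) is older than 2.6.38; the new kernel did not boot"
fi
```

If the older kernel is still the one running, the grub menu entry order is the first thing to check.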

Regards

Michele Masè

Similar Issue?

My TKL revision control appliance recently started hanging during boot. Attempted disabling acceleration and virtualization extensions in VMWare ESXi with no luck. The image was working flawlessly until a recent reboot.

This is where it is hanging: 

 

Any ideas?

 

Thanks!

Jeremy's picture

My guess is data corruption

The 'fsck /boot terminated' message suggests to me that there may be issues with the filesystem on /boot (which I'm guessing is /dev/sda1). It looks like it wasn't shut down cleanly, so I'd run a manual fsck: boot the VM from a Linux LiveCD/ISO and run 'fsck -f -y /dev/sda1' and, for good measure, 'fsck -f -y /dev/mapper/turnkey-root'. The -f switch forces the check to run and the -y switch means it will run without interaction (i.e. it won't wait for you to type yes to repair any faults it finds).
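
A minimal sketch of those checks as a small script. By default it only prints the commands; setting RUN=1 executes them. The device names are the guesses from this post and may differ on your VM, and the function name fsck_plan is made up for illustration. Run it as root from the live CD with the filesystems unmounted.

```shell
#!/bin/sh
# Print (default) or run a forced, non-interactive fsck on each device.
# Set RUN=1 to execute; otherwise the commands are only echoed.
fsck_plan() {
    for dev in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            fsck -f -y "$dev"
        else
            echo "fsck -f -y $dev"
        fi
    done
}

# Device names are assumptions taken from the post above.
fsck_plan /dev/sda1 /dev/mapper/turnkey-root
```

On a live CD the LVM volume usually has to be activated first with `vgchange -ay` (from the lvm2 package) before /dev/mapper/turnkey-root shows up.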

Tried that... no dice... same

Tried that... no dice... same error. Argh.

I was able to mount and browse around the filesystems successfully as well with SystemRescueCD.

Hrm...

Jeremy's picture

Hmmm....

After a fairly extensive google, I unfortunately have nothing further to add. Probably the quickest and easiest fix would be to create a new VM and restore from a backup (or copy the data across from the old VM).

If you want to persevere, then I'd look at mounting your FS as a chroot (using your LiveISO) and running 'dpkg --configure -a' (to make sure there are no broken packages) and then I'd probably consider uninstalling grub (using the purge switch) and reinstalling it. This is all just stabbing in the dark though really...
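
A rough sketch of what that chroot session could look like, under several assumptions: root is on /dev/mapper/turnkey-root, /boot is on /dev/sda1, the boot disk is /dev/sda, and the bootloader package is grub-pc. It uses `dpkg --configure -a`, the dpkg option that finishes configuring any unpacked packages, and a plain `--reinstall` of grub as a gentler stand-in for purging and reinstalling. The function name chroot_repair_plan is made up; it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print the commands for a chroot-based repair from a live CD.
# Review the output, then pipe it to "sh" as root to execute it.
chroot_repair_plan() {
    root_dev=$1
    boot_dev=$2
    mnt=${3:-/mnt/tkl}    # mount point is an arbitrary choice
    cat <<EOF
mkdir -p $mnt
mount $root_dev $mnt
mount $boot_dev $mnt/boot
mount --bind /dev $mnt/dev
mount --bind /proc $mnt/proc
mount --bind /sys $mnt/sys
chroot $mnt dpkg --configure -a
chroot $mnt apt-get install --reinstall grub-pc
chroot $mnt grub-install /dev/sda
chroot $mnt update-grub
EOF
}

# Device names are assumptions; adjust them to your VM's layout.
chroot_repair_plan /dev/mapper/turnkey-root /dev/sda1
```

The apt-get step needs working networking and DNS inside the chroot, so it may have to be skipped or adapted.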

As a longer term (permanent) solution, personally what I'd be looking at is ditching ESXi (if possible) because I hear so many people having issues with it with Linux VMs. My personal preference is ProxmoxVE (free open source Enterprise level Debian based headless hypervisor). It has a WebUI so there is a little learning curve, but there is a very active community and quite good documentation. All the TKL appliances are available for download from the WebUI (as OVZ templates, which means they run much sweeter and use much less resources) but PVE also contains KVM so you can still have Windows guests (or Linux if you wish to install from ISO). For the last few years I have had a couple of PVE installations running Linux and Windows and have had a great experience with it. I have read a few emails on the mailing list where people have transitioned from ESXi to PVE and brought all their existing VMs with them (apparently KVM can handle vmdk images, although I can't speak from experience). Just a thought...

Similar Issue

I had a clean running installation of prestashop on an esxi 5 server and a local VMWare fusion. After a reboot today I am running into the same issues as Steve. The thing is, if I install a clean prestashop vm and restore the latest backup from the hub, something is causing this issue and the reboot fails. The boot process always reports orphaned nodes being deleted during the mount of sda1.

During the backup restore the local vm runs most of the services successfully; however, right after the reboot the machine hangs at "sda assuming drive cache". Running apt-get upgrade, even for grub, doesn't help either.

I remember having a successful restore to the local VMWare Fusion machine a month ago. I assume that some data is corrupted or a security update created this issue.

Any ideas would be great.

I am trying to go backwards week by week to identify a successful backup.

Jeremy's picture

Or updates from VMware?

Is there any correlation with issues and updates from VMware?

If you suspect a TKL/Ubuntu update, first thing I'd try is a clean install of the TKL appliance and run the security updates on first boot. Then reboot and see if the error occurs in a vanilla TKL appliance with all security updates applied. If it is a security update it will cause the problem straight away (and we can try to hunt it down from there).

Otherwise perhaps try selectively applying your backup. If it's not a security update (as tested above) and you haven't specifically installed some extra packages (or run apt-get upgrade), then I can't see how it could be caused by restoring your backup. But for completeness of testing, you could try excluding packages from your restore. Use the '--skip-packages' switch. See TKLBAM documentation for further info (specifically the tklbam-restore man section).
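
A small sketch of building such a restore command. It assumes tklbam-restore takes its options before the Hub backup id and that --skip-files and --skip-database exist alongside --skip-packages; check the tklbam-restore man page before relying on this. The helper name restore_cmd and the backup id 1 are placeholders. The function only prints the command, it does not run it.

```shell
#!/bin/sh
# Build a tklbam-restore command line that skips the named components.
# Usage: restore_cmd BACKUP_ID [packages] [files] [database]
restore_cmd() {
    backup_id=$1
    shift
    cmd="tklbam-restore"
    for part in "$@"; do
        case "$part" in
            packages|files|database) cmd="$cmd --skip-$part" ;;
            *) echo "unknown component: $part" >&2; return 1 ;;
        esac
    done
    echo "$cmd $backup_id"
}

# Backup id 1 is a placeholder; use the id of your own Hub backup record.
restore_cmd 1 packages
```

Restoring once with packages skipped and once without should show whether the package state in the backup is what breaks the boot.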

My Issue resolved

Probably/possibly - just through trying stuff I found that I can defeat the kernel panic by changing the default VM settings from Automatic to "Use Intel VT for instruction set and software for MMU"

I have no idea what this actually means, how it differs from automatic or what implications it has - other than it works for me

I have only done this on one host (I have 2 that demonstrate and replicate exactly the fault) but am hopeful it's a good enough workaround until I replace hardware
