Matt Trentini's picture

Hello folks,

I really like the features offered by the Turnkey solutions!  Thanks folks.  However I've run into a problem.

I installed the Revision Control Appliance 11.0RC to a VM using VMware Player (running on Windows 7), made a few tweaks [1] and tested.  Works great!  Then I used VMware Converter to convert and copy the VM to our VMware ESXi 4.1 [2] server.  Alas, the VM doesn't boot under ESXi - it either kernel panics or drops me to a BusyBox shell [3].

I also tried converting to VMware Server 1.0.9.  This also failed, though always straight after GRUB finished its countdown [4].

Finally, the Turnkey install ISO also fails to boot under ESXi - it gets to the install menu but shortly after I select "install to hard disk" it fails with a "General error mounting filesystem" [5].  Has anyone successfully installed this ISO to a VM hosted in ESXi?

It's worth noting that the regular Ubuntu 10.04 install iso installs fine under ESXi.

There seems to be something unusual in what Turnkey is doing that causes grief with ESXi.  I've been investigating the problem for a long time now - is anyone else having these sorts of problems?

BTW, the 2009.10-2 VM runs fine but I'm interested in using Mercurial and the version there is ancient.

Any help would be really appreciated!

Thanks,

Matt

[1] Specifically, I added another virtual hard drive and mounted it at /srv/repos to store the repositories in a separate vmdk.
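
For anyone curious, this boiled down to something like the following (the device name and filesystem here are from memory, so treat it as a rough sketch rather than the exact commands I ran):

    # assumes the new vmdk shows up as /dev/sdb and has a single partition
    mkfs.ext3 /dev/sdb1
    mkdir -p /srv/repos
    echo "/dev/sdb1  /srv/repos  ext3  defaults  0  2" >> /etc/fstab
    mount /srv/repos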

[2] Technically, "VMware vSphere Hypervisor"

[3] GRUB does the usual countdown and the kernel is unpacked and begins booting, but it looks like some issue with the hard drive causes problems.  I can't copy text output from ESXi's console, but see the attached images: turnkey_vm_busy_box.png and turnkey_vm_kernel_panic.png.

[4] I'm not particularly surprised at this as the last Ubuntu version supported under Server 1.x was the 6.x release.  It's likely a grub2 issue(?).

[5] See turnkey_iso_boot_fail.png.

Jeremy Davis's picture

But I've just seen in another thread that Alon (one of the core devs) has stated that Part 1 of the 11.0 release (not the RC) will be happening very soon. My understanding is that the VM (and other) images will go up at the same time or soon after.

Hopefully that will resolve your issue, although it'd still be good to know why it's not working as is (ideally you should be able to install from either the ISO or the VM image).

If you think it may be something to do with grub2, try removing it and installing GRUB legacy instead in your VMware Player machine, then try reimporting. At least that may help narrow down what the issue is.
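
Something along these lines inside the Player VM should do the swap - I haven't tested it myself, so it's only a rough sketch using the standard Lucid package names:

    apt-get purge grub-pc        # remove GRUB 2
    apt-get install grub         # install GRUB legacy
    update-grub                  # generate /boot/grub/menu.lst
    grub-install /dev/sda        # put GRUB legacy on the MBR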

I won't have access to an ESXi box and vSphere until after the 1st, but I can report back soon after on whether I can get an install, a conversion, or both to work.

Matt Trentini's picture

@Jed Thanks, I couldn't find the thread you mentioned - do you have a link?  Hopefully the latest release will fix the issue.

I'll look into replacing grub2 with GRUB legacy, but I'm more suspicious of that being an issue under VMware Server, which is less important to me (I'd much prefer having this on ESXi).  Grub2 appears to start fine under ESXi; the fault manifests itself later in the boot process.

@Rik Thanks, I'd really appreciate it!

Jeremy Davis's picture

Sorry mate, I can't find it again either :(. From memory it wasn't very specific, just saying that it'd be out soon. And TBH I don't know how many major changes will occur between the RC and the final release.

Dan Robertson's picture

I just downloaded the iso and was able to successfully install it on ESXi 4.1 as well as ESX 4.0.

Matt Trentini's picture

Heya Dan, thanks for trying it out.

I just re-downloaded the 11.0 RC ISO, validated the sig and tried installing again on my ESXi 4.1 install (build 260247) using a custom VM (everything default except Ubuntu/32-bit). Same problem - it stops with a kernel panic shortly after selecting "install to hard disk".
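
For the record, validating the sig was just the usual detached-signature check - the filenames below are from memory and the RC image may be named slightly differently, so treat this as a sketch (it assumes TurnKey's release key is already imported into gpg):

    gpg --verify turnkey-revision-control-11.0-lucid-x86.iso.sig \
                 turnkey-revision-control-11.0-lucid-x86.iso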

Frustrating.  I wonder what's different about our systems?

Thanks again guys but I'm starting to run out of options...

Dan Robertson's picture

Which SCSI controller type are you using for the hard disk?  Perhaps you could post a screenshot of the settings you are using.  You could also try downloading an original Ubuntu Server iso and see if you have the same issues with that.

Matt Trentini's picture

I'm using the default, LSI Logic Parallel.  I'll try others.

I have tried the standard Ubuntu 10.04 installer; no problems at all.  Though that was the Desktop install - I'll also try the Server version and see how it goes.

Below are some of the notable settings (I didn't bother with a screenshot as they're spread out over many forms - I can post the full vmx if it will help; a rough sketch of the equivalent vmx lines follows the list):

  • Mem: 512MB
  • CPUs: 1
  • Video card
    • 4MB RAM (Tried making it bigger; no help)
  • VMCI: Restricted
  • SCSI controller 0: LSI Logic Parallel
  • Hard disk 1: Virtual Disk
    • 8GB
  • CD/DVD Drive 1: [250GB_Datastore]iso/turnkey-revision-control-11.0-lucid-x86.iso
    • Connect at power on
  • Network adapter 1: VM Network

Seems very vanilla...anything jumping out?

Matt Trentini's picture

OK folks, I've stumbled on something interesting.  If I boot the system with "Disable acceleration" set (Edit Settings->Options->Advanced->General), the system starts fine, albeit slowly.

After the boot period, I can turn acceleration on again and everything is stable.  System runs great.

But whenever I try to boot the VM with acceleration on, it fails with a kernel panic.  There seems to be some issue between acceleration and some part of the boot process.

I'm still investigating but has anyone seen anything similar?  I am suspicious of the HW/BIOS/ESXi install but it's weird that it only affects this Turnkey install...

Dan Robertson's picture

What type of datastore are you using?  iSCSI/FC/NFS/local?  If you are using a NAS or SAN, try using the local datastore.  I have not seen this problem before, but most of my work is with Windows VMs.

Have you verified your setup is on VMware's hardware compatibility list?

Matt Trentini's picture

I'm using a local datastore. 

I'm running ESXi on commodity hardware (it's our testbed before rolling it out for more permanent use).  It's a Dell Optiplex 745 (Core 2 Duo, 8GB RAM, a couple of SATA HDs, not RAIDed) and no, it's not on the compatibility list.  However, ESXi 4.0 is on the community list for that hardware.

So while it's definitely possible that there's an issue between ESXi and the HW, it'd be surprising since it's such common hardware... but who knows?  Maybe it is simply a problem with the HW and ESXi?

Steve's picture

Not sure this is HW, as our server is an HP DL380 that plays nice with ESXi - it's on all the compatibility lists and is very high spec.

But we are having the same kernel panics and most of the above problems too: local datastore, vanilla setups, plus some custom attempts as well.

We've tried all the SCSI controller types as well, with no difference.

Interestingly, v11 did install nicely on our dev server (an HP G3) running ESXi 3.5 - multiple test installs, all good.

Does this point anyone in the right direction?

I'll clone the successful 3.5 VM, upgrade the config, move it to ESXi 4 and post back on whether I have any success.

Yves's picture

My config is ESX 4.1 on a Dell 2850, which is a dual-Xeon 32-bit machine.  I tried updating the 2009.10 appliance, only to run into the same problem with GRUB.

During the install of the ISO, things go fine.  After booting, there's no way to get around the kernel panic.  I tried all kinds of configuration settings, booting from an iSCSI target and from local RAID.  It fails in every case.  The machine remains stuck at the console during the boot process.  I went looking through the different mount points but could not detect any error.  (But I do not know a lot about the Linux boot process.)

Tried launching the VM image on another machine with VMware Fusion, and it runs fine.  Some of the VM images were running fine on ESXi 4.0 on the same hardware.

As far as I am concerned, grub2 is the offending component.

Steve's picture

My 3.5 clone failed, so I loaded up 11.1 on ESXi 4 on the HP DL380s.

It seems to run OK without acceleration, but doesn't boot with it on :(

Has anyone had success with this?

Jeremy Davis's picture

Not sure whether it would make a difference or not, but it's probably worth a shot. The VM builds use a different kernel, so perhaps that may resolve it?

Steve's picture

Hi Guys

Good thinking, thanks

Will try both kernel replace and new install and post results

Steve's picture

The VM builds also seem to replicate the same errors.

Turned off acceleration and it works fine; turn it back on and it stalls at the console.

It must be me but I can't seem to nail it!

I can't really disable acceleration at boot, as clients reboot their own TKL appliances at any time.

Jeremy Davis's picture

I know the OP said that Ubuntu 10.04 was working ok but for the sake of completeness it would be useful for you to also try that out (if you haven't already).

In search of a workaround I'd also try a test install of TKL without LVM. I would imagine that if Ubuntu 10.04 is fine then you can rule out the filesystem (ext4) and GRUB2 as the potential problem.

Beyond that, I'm not sure where to go next.

Steve's picture

Already running Ubuntu 10.04 as a comparison test and yes, it plays nice...

(but I now can't live without tklbam - so this is not an option for us!)

Good thought on LVM - thanks - I will test and post...

Steve's picture

And after an update and upgrade, even without acceleration I am back to a kernel panic.

Will try the alternate (non-LVM) option later this week - I wonder though whether this will compromise TKLBAM a bit?

Back to the drawing board :)
If I revert back to TKL9 as the standard platform, am I missing much between the core versions in terms of functionality and performance, do you think?

Liraz Siri's picture

Even if you temporarily revert to using the previous version, you can always use TKLBAM to migrate between them fairly easily once we figure out what the problem is.

If you say stock Ubuntu 10.04 works for you then we can't be too far off from isolating what it is about TKL 11 (also based on Ubuntu 10.04) that is giving you problems.

Liraz Siri's picture

I'm currently looking into this issue. Some users are reporting a smooth experience with ESXi. This leads me to suspect the host's hardware configuration or some subtle aspect of ESX's configuration may be playing a part here. It would be helpful to have as much information as possible from anyone encountering issues, including host hardware configuration (e.g., CPU type) and screenshots. You can't attach screenshots to comments, so if you have this problem and you see something different from the screenshots already attached above, please start a new topic on the forums.

L. Arnold's picture

I have been exploring Subversion and was about to install anyway.  I will try to log the install and see where it goes.

L. Arnold's picture

Well, I have not yet "done anything" other than get to the web root.

Installed the Revision Control ISO 11.1 to VMware ESXi 4.0 with my normal protocol:

(After setting up the VM and pointing the CD/DVD at the ISO), set up a thin-provisioned 16GB drive.

Install to HD

Yes (what is the first question?)

Yes (what is the second question?)

default 90% (then Tab, OK)

Write Changes >  Yes

Grub > Yes

Restart

Assign Root Password (only password asked)

I did not install TKLBAM or Auto Updates just now.

I could do this again if we need more detail, as I was installing on instinct at the beginning.

Anything else that should be tested?

L. Arnold's picture

I was trying to copy a Magento TKL VM yesterday, but when I went to boot the new VM in ESXi I was getting an ETH2 error.  I could not work this problem out, even though I have done this process many times before with Magento TKL machines.

At that point I went to give a TKLBAM restore a try on a new TKL Magento build (11.1 instead of 11.0) and this did take.  I needed to copy all my passwords and IPs over, then go into www/magento/var and eliminate everything in /sessions and /cache (don't delete the cache folder itself, as otherwise you have to reset ownership and settings - which I had to do)...  There should be a script for that (a rough sketch follows).
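
Something like this would do it (the webroot path is an assumption on my part - adjust it to match your install; directory names are as above):

    # empty the session and cache folders without deleting the folders
    # themselves, so ownership and permissions stay intact
    rm -rf /var/www/magento/var/sessions/*
    rm -rf /var/www/magento/var/cache/*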

Anyway, VMware Converter did not properly move the ESXi Magento image, and I could not find the problem before I threw the build away.

L. Arnold's picture

Hmm.  I just opened another VM that had been sitting around (OpenERP).  Not sure if this is a fresh copy or one that had been working...  Guessing that I need to restart, or go to the ESXi console and see what's up with ETH2.  I have only been using ETH1, in my recollection.  Anyway, irritations do happen.

L. Arnold's picture

The first two settings during creation were:

Guided/LVM

then Write Changes.  (Can't edit my post, so adding this as a reply.)

Jeremy Davis's picture

So if you re-enable acceleration (after boot) and update to the newer (backported) kernel that's available in the repos, then you can leave acceleration on and it should boot fine (others have confirmed this). I can't recall the exact command OTTOMH, but if you have a search here in the forums you should find it.
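
From memory it's something along these lines, run inside the VM - the package name below is a guess on my part, so double-check it against the forum thread before installing:

    apt-get update
    apt-get install linux-image-generic-lts-backport-maverick   # exact package name may differ
    reboot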

Jeremy Davis's picture

The fact that 'fsck /boot terminated' suggests to me that there may be issues with the filesystem on /boot (which I'm guessing is /dev/sda1). It looks like it wasn't shut down cleanly. So I'd run a manual fsck from a LiveCD/ISO: boot the VM from a Linux live ISO and run 'fsck -f -y /dev/sda1', and for good measure 'fsck -f -y /dev/mapper/turnkey-root'. The -f switch forces it to run and the -y switch means it will run without interaction (i.e. it won't wait for you to type yes to repair any faults it finds).
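
In other words, from the live environment it boils down to something like this (assuming /boot really is /dev/sda1 and root is on the default TKL LVM volume):

    fsck -f -y /dev/sda1                   # check and repair /boot
    fsck -f -y /dev/mapper/turnkey-root    # check and repair the root LV for good measure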

Jeremy Davis's picture

After a fairly extensive google, I unfortunately have nothing further to add. Probably the quickest and easiest fix would be to create a new VM and restore from a backup (or copy the data across from the old VM).

If you want to persevere, then I'd look at mounting your FS as a chroot (using your LiveISO), running 'dpkg --configure -a' (to make sure there are no broken or half-configured packages), and then I'd probably consider uninstalling grub (using the purge switch) and reinstalling it. This is all just stabbing in the dark though, really...
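
Roughly, from the LiveISO that would look something like the following - device names again assume the default TKL LVM layout and grub-pc as the installed GRUB package, and I haven't tested this exact sequence, so treat it as a sketch:

    mount /dev/mapper/turnkey-root /mnt
    mount /dev/sda1 /mnt/boot
    mount --bind /dev  /mnt/dev
    mount --bind /proc /mnt/proc
    mount --bind /sys  /mnt/sys
    chroot /mnt /bin/bash
    dpkg --configure -a        # finish configuring any half-installed packages
    apt-get purge grub-pc      # remove GRUB 2 completely
    apt-get install grub-pc    # reinstall it
    grub-install /dev/sda      # rewrite the MBR
    update-grub                # regenerate the GRUB config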

As a longer-term (permanent) solution, personally what I'd be looking at is ditching ESXi (if possible), because I hear of so many people having issues with it with Linux VMs. My personal preference is ProxmoxVE (a free, open source, enterprise-level, Debian-based headless hypervisor). It has a WebUI, so there's a little learning curve, but there is a very active community and quite good documentation. All the TKL appliances are available for download from the WebUI (as OVZ templates, which means they run much sweeter and use far fewer resources), but PVE also includes KVM so you can still have Windows guests (or Linux, if you wish to install from ISO). For the last few years I have had a couple of PVE installations running Linux and Windows and have had a great experience with them. I have read a few emails on the mailing list where people have transitioned from ESXi to PVE and brought all their existing VMs with them (apparently KVM can handle VMware disk images - although I can't speak from experience). Just a thought...

Jeremy Davis's picture

Is there any correlation between the issues and updates from VMware?

If you suspect a TKL/Ubuntu update, the first thing I'd try is a clean install of the TKL appliance, running the security updates on first boot. Then reboot and see if the error occurs in a vanilla TKL appliance with all security updates applied. If it is a security update, it will cause the problem straight away (and we can try to hunt it down from there).

Otherwise, perhaps try selectively applying your backup. If it's not a security update (as tested above) and you haven't specifically installed extra packages (or run apt-get upgrade), then I can't see how it could be caused by restoring your backup. But for completeness of testing, you could try excluding packages from your restore using the '--skip-packages' switch. See the TKLBAM documentation for further info (specifically the tklbam-restore man section).
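
For example (the backup ID here is just a placeholder - use whatever 'tklbam-list' shows for your backup):

    tklbam-list                          # find your backup's ID
    tklbam-restore 1 --skip-packages     # restore backup 1, but leave packages alone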

Steve's picture

Probably/possibly - just through trying stuff I found that I can defeat the kernel panic by changing the default VM settings from Automatic to 'Use Intel VT for instruction set and software for MMU'.

I have no idea what this actually means, how it differs from Automatic, or what implications it has - other than that it works for me.

I have only done this on one host (I have two that demonstrate and replicate the fault exactly) but am hopeful it's a good enough workaround until I replace the hardware.
