ZZRabbit's picture

Hi

Fresh install of LAMP 17.0: install security updates, reboot and kernel panic. It boots the old kernel but not the updated one.

ZZRabbit's picture

BUMP

I can't be the only one who has had this problem. I have tried on 3 machines: Proxmox VE, VirtualBox and bare metal. Every time I get the same kernel panic on a fresh install with security updates.

Jeremy Davis's picture

What platform(s) are you guys running on? I see you note Proxmox, Ian - I assume v7? [update: re-reading, I see these questions are already answered]. I've tested on Proxmox v6 (pve-qemu-kvm=5.2.0-6) and can't reproduce it.

FWIW the recent kernel update was a Debian security update, so not something specific to TurnKey. It was released nearly a week ago and covered a number of CVEs.
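
If you want to see exactly what that update covered, viewing the package changelog should list the CVEs. This is just standard Debian tooling, nothing TurnKey specific, and the exact package name assumes the 5.10.0-14 kernel discussed further down this thread:

apt-get changelog linux-image-5.10.0-14-amd64    # shows the Debian changelog for the kernel package, including the CVE list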

I guess that there is a chance that some default TurnKey config (that isn't default Debian config) is a compounding factor. But I can't think of anything off the top of my head, and we're certainly not doing anything radical.

I just did a clean install of a v17.0 TKLDev from ISO to KVM (Proxmox v6.x). I installed all security updates on firstboot and rebooted. Everything went as expected and I definitely have the new kernel installed and loaded:

root@tkldev ~# apt policy linux-image-amd64
linux-image-amd64:
  Installed: 5.10.113-1
  Candidate: 5.10.113-1
  Version table:
 *** 5.10.113-1 500
        500 http://security.debian.org bullseye-security/main amd64 Packages
        100 /var/lib/dpkg/status
     5.10.106-1 500
        500 http://deb.debian.org/debian bullseye/main amd64 Packages
root@tkldev ~# uname -a
Linux tkldev 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux

I have googled around and can't find any other specific reports related to this kernel. My suspicion is that it may be a regression in a specific kernel driver that you guys are using but I'm not. It still doesn't explain why there aren't more reports of this though.

Jeremy Davis's picture

I have done a heap more digging and have so far been unable to find any other reports of kernel panics related to this kernel update?! There are no bugs on the Debian bugtracker specifically related to this kernel. I've also checked the bugs assigned to the kernel meta-package and trawled through all the bugs assigned to the kernel source package and can't find any others specifically related to this kernel update.

Other than official Debian sources, the only thing I could find online specific to this kernel update is news about it e.g. here.

Jeremy Davis's picture

Apologies on posting stream of consciousness...

Ok, so I've re-read through the posts here and realised there were some details I have overlooked. So to recap:

  • ZZRabbit reproduced a kernel panic with the latest Debian kernel on VirtualBox, Proxmox and bare metal!
  • Ian reproduced a kernel panic with the latest Debian kernel on 2 different versions of Proxmox v7.x.
  • I can't produce a kernel panic on Proxmox v6.x.

I assume from your messages Ian, that you experienced the exact same message on both PVE platforms?

ZZRabbit, I have a few questions for you:

  • Are you getting the same message as Ian?
    Kernel panic : not syncing: VFS: Unable to mount root fs on unknown block(0,)
  • Were all your tests (PVE, VBox & bare metal) on the same underlying hardware? If so, I wonder if it's something specific to the actual hardware that you're running on (that is being passed through to VirtualBox and Proxmox)? If they were all on different hardware, then I'm somewhat stumped.

    Although on face value it sounds like you may have hit the same issue, we should keep in mind that it is possible that they're different issues (that both stem from the updated kernel).

    After more searching online, common causes of the particular kernel panic noted by Ian are:

    • lack of memory - seems unlikely as I double checked by installing a new v17.0 VM with only 256MB RAM. That updated and rebooted fine on the new kernel.
    • lack of free disk space - seems highly unlikely for a new install?!
    • unable to load the required storage driver - seems the most likely of these 3. But if you are both experiencing the same issue, it seems unlikely that you would both experience it across different platforms - with no other broader reports from other Debian users?! (Some quick checks for these causes are sketched below.)

    There may be other causes of that kernel panic message, because I'm certainly no kernel expert. But this certainly seems like a strange one...
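
    If it helps anyone debugging this, here are some rough quick checks for those three causes (run them from a boot on the old, working kernel, and adjust device/kernel names to suit your system):

    free -m                                      # rule out lack of memory
    df -h /                                      # rule out a full root filesystem
    lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT    # see which disks/volumes are in play for the root filesystem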

Jeremy Davis's picture

Ok I can reproduce the issue on VirtualBox.

I'll file a bug report with Debian. For now I can only suggest that you keep running the old kernel.

[update] I have filed a bug report upstream with Debian. Hopefully someone might come through with some ideas?!

TBH, as I've already noted, I'm not sure what more we can do for now other than keep running the old kernel.

FWIW I also have Debian running on my laptop (where VirtualBox is installed; used to recreate the issue) and that is running this latest kernel too. So it must be some edge-case issue.

deutrino's picture

Just chiming in to say I experienced this with the Syncthing 17.0 image a couple of days ago. I tried 3x to install a VM on Proxmox 7.x, with both the i440fx and q35 "chipset", but not with EFI yet.

The panic also happens if you install security updates manually with e.g. 'apt dist-upgrade' or your favorite command. That would make sense if it's being caused by the update: it wouldn't matter whether you installed the updates as part of the TKL initial setup scripts, or deferred them until later like I experimented with doing.
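
For reference, the 'manual' route I mean is just the stock Debian commands, roughly:

apt update          # refresh package lists, including bullseye-security
apt dist-upgrade    # pulls in the new kernel and anything else pending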

Jeremy Davis's picture

After digging a bit deeper it appears that the issue occurs when /boot is inside the LVM. I don't know, but I wonder if this is a kernel regression, in that it can no longer find LVM volumes in early boot?
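
If anyone wants to check whether their own /boot lives inside the LVM, something like this should make the layout obvious:

lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT    # 'lvm' in the TYPE column shows which mounts live on LVM volumes
findmnt /boot                           # no output means /boot is not a separate mount (i.e. it's part of /)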


UPDATE: Scratch all that. I re-ran through these steps again and still get a kernel panic! :( I don't understand why it worked the first time, but not the second. There's obviously something I don't understand going on, perhaps something additional I did that I forgot to note?


I have tested moving /boot into a separate ext4 partition and it seems to boot ok now with the new kernel.

This is a non-trivial operation so I don't recommend trying this on a production server. I have got it to work, but it's not for the faint-hearted. It's also clearly not a great fix so we'll keep looking for something better. In the meantime, follow along if you want to try this yourself:

First add a new drive to your VM. It doesn't have to be huge, but unless you clean up old kernels it will fill up over time. So I suggest 200MB minimum, but 500MB or larger is probably better if free space isn't an issue. You may need to stop or reboot your VM to see (or perhaps even add) the new disk (still booting the old kernel for now).

I'll assume that the new disk is /dev/sdb but it may be something different for you.

fdisk /dev/sdb

Press 'n' to make a new partition and accept all the defaults (it will make a single partition that spans all the free space). Then press 'a' to make partition '1' bootable. Finally press 'w' to write and exit.
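
If you'd prefer not to drive fdisk interactively, the same result can be achieved non-interactively with parted - this is just an alternative sketch, so adjust the device name to suit:

parted -s /dev/sdb mklabel msdos                   # create a new MBR partition table
parted -s /dev/sdb mkpart primary ext4 1MiB 100%   # one partition spanning the whole disk
parted -s /dev/sdb set 1 boot on                   # mark partition 1 as bootable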

Now create the filesystem:

mkfs.ext4 /dev/sdb1

Make a temporary mount point and mount it. Then copy across the contents of /boot:

mkdir /tmp/mnt
mount /dev/sdb1 /tmp/mnt
cp -R /boot/* /tmp/mnt/
umount /tmp/mnt

Now get the UUID of /boot and write it to the fstab:

SD=/dev/sdb1
UUID=$(blkid | sed -En "\|^$SD| s|.* (UUID=\"[0-9a-f-]*\").*|\1|p")
echo "$UUID  /boot ext4 errors=remount-ro 0 1" >> /etc/fstab

Now test your fstab:

mount -a

Check to see /boot is mounted:

mount | grep boot

You should get something like:

/dev/sdb1 on /boot type ext4 (rw,relatime,errors=remount-ro)

Now reconfigure the new kernel (just to be on the safe side), install grub to the new disk and also run 'update-grub' for good measure (update-grub probably isn't required):

dpkg-reconfigure linux-image-5.10.0-14-amd64
grub-install /dev/sdb # note the disk - not the partition
update-grub

Now power off your VM and ensure that it boots from the new drive (in VirtualBox I had to delete both disks and then re-add them, the new little disk first, then the original disk - on Proxmox you can likely just rearrange the boot order). Reboot and, fingers crossed, it will now boot from the new disk with the new kernel.

ZZRabbit's picture

The update is not creating the initramfs. Even if you try to create it manually, it still won't do it.

It keeps deferring the update.

Jeremy Davis's picture

Wow I hadn't noticed that, but you are right! All the servers that work have '/boot/initrd.img-5.10.0-14-amd64'. The ones that don't, only have '/boot/initrd.img-5.10.0-13-amd64', with '/boot/initrd.img-5.10.0-14-amd64' missing.
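
An easy way to check which initrds a given server actually has:

ls -1 /boot/initrd.img-*    # on the affected servers, the -14 initrd is the one that's missing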

I'm not sure what is wrong with update-initramfs, but after reinstalling initramfs-tools it would create the initrd:

apt install --reinstall initramfs-tools

Then I manually generated the initrd:

root@lamp ~# update-initramfs -c -k 5.10.0-14-amd64 -b /boot
update-initramfs: Generating /boot/initrd.img-5.10.0-14-amd64
I: The initramfs will attempt to resume from /dev/dm-1
I: (/dev/mapper/turnkey-swap_1)
I: Set the RESUME variable to override this.

So now I do have the initrd, but still - the same kernel panic persists... :(
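
A check I'd suggest at this point: confirm the freshly generated initrd actually contains the device-mapper/LVM bits needed to find the TurnKey root volume (the grep pattern is just my guess at the relevant module names):

lsinitramfs /boot/initrd.img-5.10.0-14-amd64 | grep -Ei 'dm-mod|lvm'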

ZZRabbit's picture

I noticed that after reinstalling initramfs-tools, manually generating the initrd worked, but then I checked grub.cfg and noticed:

linux    /boot/vmlinuz-5.10.0-14-amd64 root=/dev/sda1 ro net.ifnames=0  consoleblank=0

linux    /boot/vmlinuz-5.10.0-13-amd64 root=UUID=aa513ef5-9aa8-4e8c-ab69-c8ff5e6a802e ro net.ifnames=0  consoleblank=0

I ran update-grub and it worked:

linux    /boot/vmlinuz-5.10.0-14-amd64 root=UUID=aa513ef5-9aa8-4e8c-ab69-c8ff5e6a802e ro net.ifnames=0  consoleblank=0

linux    /boot/vmlinuz-5.10.0-13-amd64 root=UUID=aa513ef5-9aa8-4e8c-ab69-c8ff5e6a802e ro net.ifnames=0  consoleblank=0
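
For anyone else wanting to check theirs, this pulls out just the kernel lines (and their root= values) from the generated config:

grep 'vmlinuz' /boot/grub/grub.cfg    # shows each menu entry's kernel line, including its root= parameter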

deutrino's picture

Hey there, just FYI, the kernel panic was happening on my VMs without LVM set up.

In my case I was creating two drives in Proxmox, both with 'manual' partitioning in the installer, no LVM.

The first drive had a swap partition and /; both were 'primary' partitions.

The second drive had /home as a 'primary' partition.

Jeremy Davis's picture

Thanks everyone for all the info that has been shared. The bug reports and feedback have been invaluable. I am now confident that I have a handle on what the issue is and a vague idea where to start looking to fix the bug (FYI it's in our build process).

Existing v17.0 servers that are affected can work around the issue like this:

apt install --reinstall -y initramfs-tools
dpkg-reconfigure linux-image-5.10.0-14-amd64

As noted by others, there may be other issues, but in all the tests I did (with a clean install of LAMP) the above was all that was required. So that is the minimum required fix.

FYI the dpkg-reconfigure line will both create the initrd and update grub.
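
A quick way to confirm the fix took on a given server (the version string assumes the 5.10.0-14 kernel discussed in this thread):

ls -l /boot/initrd.img-5.10.0-14-amd64          # the initrd should now exist
grep 'vmlinuz-5.10.0-14' /boot/grub/grub.cfg    # grub should reference the new kernel, with root=UUID=...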

I'll share more when I get a chance. Until then, if you haven't already, please share exactly which appliance(s) you have experienced this bug with and/or what other specific issues you had with which appliance(s). Also post back if you continue to have issues.

Thanks again to all of you!

ZZRabbit's picture

Hi All

Thank You

I have been using Turnkey Lamp for about 10 years & this is the first problem that I have ever had.

Great team work really shows in the Turnkey Community.

Thank You All
