pjdurham

I stood up a TKL LAMP stack box (Dell Precision 670) last fall and have had no significant problems.

During this week the UPS this box was connected to began whining intermittently. I attributed it to the recent power outage and the UPS switch seemingly getting stuck, but now I know the battery is dead.

Today I shut down all 3 boxes that were connected to that UPS.

I reconfigured my power, then rebooted all 3 boxes (the other 2 are a Dell Precision 650 running TKL and an ancient Gateway 500 running pfSense).

The GW/pfSense box came up fine. The DP650/TKL came up fine. The DP670/TKL did not come up fine. On reboot, it began the "First Boot Configuration" sequence.

So I thought maybe I had some media somewhere it wasn't supposed to be. Nope. Three reboots have all yielded the same result.

I've rebooted this machine occasionally in the past. And as I stated at the top, the issue that prompted this reboot was the UPS. (The box had gone offline once or twice in the last week because of it, but had always rebooted smoothly.)

I've spent ~30min searching for similar problems but haven't found any.

Can someone point me in the direction of what might be going on?

Thanks

/perry

Jeremy Davis

Power outages (i.e. an unclean shutdown) can cause disk corruption (especially if the system was writing to the HDD when the power was lost), so the first thing I would do is run fsck on your filesystem. Probably the best way to do that is from a Linux LiveCD, so the filesystem isn't mounted while it's being checked.
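If you want to script the check from the LiveCD, here's a minimal Python sketch. It assumes a hypothetical, unmounted device name (something like /dev/sdXN; substitute your own) and passes -f (force a check even if the filesystem is marked clean) and -y (auto-answer "yes" to repairs) through to the filesystem-specific checker. The useful part is decoding the exit status, which per the fsck(8) man page is a bitwise OR of the listed values.

```python
import subprocess

# Exit-status bits as documented in the fsck(8) man page.
FSCK_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_status(code: int) -> list:
    """Decode an fsck exit status into human-readable messages."""
    if code == 0:
        return ["no errors"]
    # The status is a bitwise OR, so several bits can be set at once.
    return [msg for bit, msg in FSCK_BITS.items() if code & bit]


def check_unmounted_device(device: str) -> list:
    """Run fsck on an UNMOUNTED device (e.g. from a LiveCD session).

    Assumption: you are happy for fsck to repair automatically (-y).
    Never run this against a mounted filesystem.
    """
    result = subprocess.run(["fsck", "-f", "-y", device])
    return describe_fsck_status(result.returncode)
```

For example, a status of 1 means errors were found and fixed, while 4 means some were left uncorrected and you'll want to look closer before going any further.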

After you've run fsck (still running from the LiveCD), mount your filesystem and check the firstboot config flag. /etc/default/inithooks should include "RUN_FIRSTBOOT=false"; if it is 'true', change it to 'false' and reboot with your fingers crossed.
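When working from a LiveCD, remember the path is relative to wherever you mounted the server's filesystem, not the LiveCD's own /etc. A rough Python sketch of the check follows; the mount point (e.g. a hypothetical /mnt/tkl) is an assumption, and it treats the file as simple shell-style KEY=value lines, where the last assignment wins.

```python
import shlex
from pathlib import Path
from typing import Optional


def read_run_firstboot(root: str = "/") -> Optional[str]:
    """Return the RUN_FIRSTBOOT value from <root>/etc/default/inithooks.

    'root' is where the server's filesystem is mounted (hypothetical
    example: /mnt/tkl). Returns None if the variable isn't set at all.
    """
    path = Path(root) / "etc" / "default" / "inithooks"
    value = None
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip comments and lines that aren't assignments.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == "RUN_FIRSTBOOT":
            # shlex strips shell-style quotes, e.g. RUN_FIRSTBOOT="false"
            parts = shlex.split(raw, comments=True)
            value = parts[0] if parts else ""
    return value
```

If this returns "true" for your mounted filesystem, that would explain the firstboot sequence reappearing.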

TBH I have no idea why your server would have re-enabled the firstboot scripts, but hopefully this will fix it and you haven't lost any data.

Jeremy Davis

But I guess better late than never huh!?!

FWIW it sounds like either a corrupted ISO or some other strange thing going on... The other option in Proxmox is to use the OpenVZ templates that are available to download from within the Proxmox WebUI.

Jeremy Davis

It's weird that they want to keep rerunning. If you complete them, then it shouldn't want/need to rerun them, so it sounds like for some reason they weren't fully finalising. However, you can manually stop them from re-running. The file to edit is /etc/default/inithooks. Change the line
RUN_FIRSTBOOT=true
to
RUN_FIRSTBOOT=false
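If you'd rather not hand-edit it, here's a rough Python sketch of that change (run as root). It is an illustration only, not part of inithooks itself: it rewrites just the RUN_FIRSTBOOT line, keeps a .bak copy of the original alongside it, and appends the setting if it's missing entirely (an assumption that appending is acceptable on your system).

```python
import re
import shutil
from pathlib import Path


def disable_firstboot(path: str = "/etc/default/inithooks") -> bool:
    """Set RUN_FIRSTBOOT=false, leaving every other line untouched.

    Returns True if the file was changed, False if it was already false.
    """
    p = Path(path)
    text = p.read_text()
    # Replace the value on any uncommented RUN_FIRSTBOOT= line.
    new_text, count = re.subn(
        r"^([ \t]*RUN_FIRSTBOOT=).*$", r"\1false", text, flags=re.MULTILINE
    )
    if count == 0:
        # Variable not present: append it (assumption: that's acceptable).
        sep = "" if not text or text.endswith("\n") else "\n"
        new_text = text + sep + "RUN_FIRSTBOOT=false\n"
    if new_text == text:
        return False
    # Keep a backup next to the original before writing.
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(new_text)
    return True
```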
Jeremy Davis

Sorry to hear that you had a similar painful experience of this issue. But thanks for dropping in and sharing your thoughts.

I agree that when it's easy enough, starting again can often be the best option. IMO the main rationale is that if something isn't working as it should, there may be other non-obvious issues under the surface.

OTOH, we're human too, so bugs do sometimes creep in. So it's fantastic when people share info, issues or feedback with us. Even if it's not actually a bug on our end, it can give us ideas for how we might do things better, or perhaps just document stuff better.

Actually, new info has come to light since most of the posts in this thread. I must have been in a rush when I posted back in 2016, as I'm pretty sure that was after we'd discovered this "issue". I think I know what may have caused it. On previous container builds (e.g. LXC containers on Proxmox), you actually had to complete the firstboot scripts AND quit out of the inithooks to finalise the initialisation; otherwise the initialisation was considered incomplete (so it would rerun next time).

I only noticed that non-intuitive behaviour by accident within the last year or 2. In a relatively recent update to inithooks (v14.1 OOTOMH) it was improved; now you just need to complete all the inithooks for the initialisation to be considered "completed".

Having said all that, it's great to hear that just starting again resolved it for you. Great work! :)
