Chad Huntley's picture

I have a working LAMP virtual machine. I can use it for as long as I want with no problems. However, as soon as I reboot, the entire file system goes read-only and I am unable to do anything.

I have a snapshot I keep reverting to so I can continue using the machine, but no matter what, the next time I shut down or reboot, everything goes read-only.

The only changes worth noting since the last snapshot (from a long time ago):

-www-data was added to the dialout group

-I configured a serial port with COM4

Any ideas?

Jeremy Davis's picture

Assuming that you are using an ext4 FS and you are shutting the machine down cleanly (ie using the 'halt' or 'reboot' commands), I have no idea why it would be doing that. It may be worth checking the logs for any clue as to what the problem is. The logs are found in /var/log, and the kernel log is probably the first one to look at. You can read the last bit of it using the tail command: tail /var/log/dmesg, or simply run dmesg to read the whole thing. You could also double check whether it's the FS itself that is being mounted read-only using the mount command.
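
For the mount check, here is a minimal sketch that reads /proc/mounts directly instead of relying on the output of mount, on the assumption that /etc/mtab may be stale once the root filesystem has gone read-only. The helper names mount_opts and is_read_only are made up for illustration:

```shell
# Print the mount options the kernel reports for a mount point.
# $1 = mount point, $2 = optional file in /proc/mounts format (defaults to /proc/mounts).
mount_opts() {
    # If several entries share the mount point, the last one listed wins.
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "${2:-/proc/mounts}"
}

# Succeed (exit 0) if the mount point is currently mounted read-only.
is_read_only() {
    case ",$(mount_opts "$1" "$2")," in
        *,ro,*) return 0 ;;   # matches the bare "ro" option, not "errors=remount-ro"
        *)      return 1 ;;
    esac
}

# Usage: is_read_only / && echo "root is mounted read-only"
```

Matching on a comma-delimited "ro" matters here because a read/write root normally carries errors=remount-ro in its options, which a plain substring match would mistake for read-only.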

Chad Huntley's picture

It was just working for a couple of reboots, then it happened again. It is very random, but once it happens I have to go back to the snapshot. So it is nothing to do with moving that user to the group of the serial port.

Here is a paste of a part of the kern.log file that mentions read-only: http://pastebin.com/kymgNn6U

It seems to happen most frequently when I reboot the host computer. I have this virtual machine set up to run at boot with a .bat file with the following contents:

 

@echo off
cd "C:\Program Files\Oracle\VirtualBox"
VBoxManage startvm "Sellsius"
Jeremy Davis's picture

Lines 1 & 2 are normal and don't indicate any issues. Lines 35 & 37 suggest that the filesystem is being mounted read/write. I'd double check by running

dmesg|grep read-only

If you only get line 2 showing up then the filesystem is being mounted read/write and your problem is somewhere else.
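
To see what led up to a remount, it can help to print a few lines of context before each match, since the kernel typically logs the error that triggered it just before the "Remounting filesystem read-only" message. A minimal sketch; the function name remount_events is hypothetical, and it assumes the message text matches what appears in your log:

```shell
# Read kernel log text on stdin and print each remount-read-only event,
# preceded by some context lines.
# $1 = number of context lines to show before each match (default 5).
remount_events() {
    grep -B "${1:-5}" 'Remounting filesystem read-only'
}

# Usage: dmesg | remount_events
#        remount_events 10 < /var/log/kern.log
```

The harmless "Write protecting the kernel read-only data" line does not match this pattern, so it only shows up if it happens to fall within the context window.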

To double check that, you can have a look at the output of mount; you should have something like this ('rw' = read/write, 'ro' = read-only):

# mount
/dev/mapper/turnkey-root on / type ext4 (rw,errors=remount-ro)
...
/dev/sda1 on /boot type ext2 (rw)

OTOH if you get a line in dmesg that reads "Remounting filesystem read-only" (and /dev/mapper/turnkey-root as 'ro') then obviously it is mounting read only and I would suggest you force your VM to run a fsck. You can do that like this:

touch /forcefsck
reboot

TBH if it is mounting read-only, and seeing as it seems to occur randomly, my first guess would be something wrong with the underlying filesystem (ie your physical hard drive - it could be on the way out??). I'd run a full chkdsk (assuming it's a Windows host) on the drive where the VM HDD images reside. You may also need to run a fsck on the guest FS (after running chkdsk on the host).

Another possibility could be that your com driver is flakey which is upsetting the kernel. I know nothing about that sort of thing so you would need to do further research. Probably the Ubuntu forums (search first, post if you find nothing relevant) would be your best bet IMO. FYI TKL v11.x is based on Ubuntu 10.04/Lucid.

Chad Huntley's picture

After running dmesg|grep read-only I get:

[    1.982920] Write protecting the kernel read-only data: 1884k
[  147.819185] EXT4-fs (dm-0): Remounting filesystem read-only

Typing in mount shows this:

/dev/mapper/turnkey-root on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /dev type devtmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /dev/shm type tmpfs (rw,nosuid,nodev)
none on /var/run type tmpfs (rw,nosuid,mode=0755)
none on /var/lock type tmpfs (rw,noexec,nosuid,nodev)
none on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
/dev/sda1 on /boot type ext2 (rw)

mount: warning: /etc/mtab is not writable (e.g. read-only filesystem).
       It's possible that information reported by mount(8) is not
       up to date. For actual information about system mount points
       check the /proc/mounts file.

So I went ahead and forced a fsck, but I got this:

touch: cannot touch `/forcefsck': Read-only file system

I then went back a snapshot, turned off the COM port, and turned on the VM. Same thing happened again. There is the possibility that my hard drive is going out on the host machine, however this scares the hell out of me knowing this could happen to the client's machine I'm about to deploy this on.

Jeremy Davis's picture

Doh! Obviously it won't allow you to 'touch' a file on a read-only FS (touch creates an empty file). Sorry about that useless sidetrack...

Anyway we can safely assume that there are errors in your virtual HDD! (hence why it is mounted read-only). Unless the VM has just been powered off numerous times (ie not shut down cleanly), in my experience the most likely cause is corruption of the underlying FS (ie the physical HDD). Did you run a chkdsk on the host to confirm my suspicions? If not, I suggest you do so before you do anything else. Assuming you are running a Win OS, you may need to consult the Event Viewer to confirm that there were issues.

Once you've done that, try booting your broken VM with a Linux liveCD ISO (pretty much any Linux CD that can run live - TKL is as good as any, although GParted is a pretty good one too IMO, I use it a bit to repair computers). Double check that the FS isn't mounted (most live CDs shouldn't automount, although some might) and run

fsck -y /dev/mapper/turnkey-root

That should check the main filesystem (the '-y' switch answers yes to repairing anything that it finds broken). YMMV though; if that device node isn't there, the same logical volume should also be reachable via its LVM path, eg

fsck -y /dev/turnkey/root
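
The "double check that the FS isn't mounted" step can be sketched as a small wrapper that refuses to run fsck when the device shows up in /proc/mounts. This is only an illustration: safe_fsck and the FSCK override variable are hypothetical names, and the check compares device paths literally, so it will not notice the same volume mounted under a different name (for example /dev/turnkey/root versus /dev/mapper/turnkey-root). On a live CD the volume group may also need activating first (for example with vgchange -ay) before either device node exists.

```shell
# Run fsck -y on a device only if it is not listed as mounted.
# $1 = device path, $2 = optional file in /proc/mounts format (defaults to /proc/mounts).
# FSCK can be set to another command (e.g. echo) for a dry run; this is a convention
# invented for this sketch, not something fsck itself knows about.
safe_fsck() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # awk exits 0 when the device appears in the first column of the mounts file.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "refusing to check $dev: it is currently mounted" >&2
        return 1
    fi
    ${FSCK:-fsck} -y "$dev"
}

# Usage (dry run):  FSCK=echo safe_fsck /dev/mapper/turnkey-root
# Usage (for real): safe_fsck /dev/mapper/turnkey-root
```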

Having said all that though... Assuming you have a backup (that you have tested - TKLBAM is a winner IMO) then it may be just as easy to create a new VM and restore your data there (after you have fixed the errors on the physical HDD - assuming there were some).

As for your concerns, I can assure you that this is not a normal situation... I have personally not had this same issue (although others have), and I have some TKL VMs that have been running for months between reboots and don't miss a beat. I have had different issues (with TKL and other OSs, including Win, both as VMs and on bare metal) when running on a failing/faulty physical hard drive, though. Any OS will fail in some way, shape or form if the FS is corrupted and/or failing. Having said that, obviously it doesn't look good and I can't guarantee that you won't encounter the same issues when your client is running it (although I'd put it down to coincidence).

Chad Huntley's picture

I believe you are on the right track:

I'm on Windows 7, and ran chkdsk. I looked at my event viewer and found:

Event ID 11
The driver detected a controller error on \Device\Harddisk1\DR4.

There were also many other instances of that error from the past year; they seem to happen several times a month. I've noticed for quite a while that my hard drive's performance has been slow. The error can mean a lot of things, but it is quite the coincidence that it happened right along with chkdsk.

I've also been turning on and off the VM and have yet to hit a problem with it.

If you agree it is my host hard drive, I believe right now is a great time to throw away this 4-year-old hard drive and get an SSD!

Jeremy Davis's picture

But do you have only one HDD? What about a USB stick plugged in? AFAIK Win identifies the first HDD as Harddisk0!? Did the chkdsk find any errors?

TBH that may still be it, especially if you have noticed slow performance, but I wouldn't be too hasty. Like you say, that error can be caused by lots of things: failing drive, failing or loose cable, even driver conflicts. I'd try a little more diagnosis first. Probably worth checking your cables and the drive's SMART info, as well as doing a disk scan with the manufacturer's tools (Seagate & WD both have an ISO you can burn to CD and boot from - possibly they have a Win tool too?)

If that all looks clean then I'd be digging a little deeper in Win and looking at drivers, perhaps updating the relevant motherboard drivers (if they're not already up to date - if they are, try rolling them back to previous versions). Initially I'd try the latest drivers from the chip manufacturer themselves, although the (usually older) drivers available from the motherboard manufacturer may also be worth a try.

Out of interest are you using VMware? I did a quick bit of googling and came across one person who claimed that they resolved their "Event 11" issue by uninstalling VMware! TBH I doubt it was actually VMware in and of itself, but perhaps a conflict between VMware and some other driver?

Ahhh the joy of computers...! :)

Chad Huntley's picture

Found the chkdsk log:

Checking file system on C:
The type of the file system is NTFS.


A disk check has been scheduled.
Windows will now check the disk.                         

CHKDSK is verifying files (stage 1 of 3)...
  1216000 file records processed.
File verification completed.
  1320 large file records processed.
  0 bad file records processed.
  2 EA records processed.
  51 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
  1511680 index entries processed.
Index verification completed.
  0 unindexed files scanned.
  0 unindexed files recovered.
CHKDSK is verifying security descriptors (stage 3 of 3)...
  1216000 file SDs/SIDs processed.
Cleaning up 64 unused index entries from index $SII of file 0x9.
Cleaning up 64 unused index entries from index $SDH of file 0x9.
Cleaning up 64 unused security descriptors.
Security descriptor verification completed.
  147841 data files processed.
CHKDSK is verifying Usn Journal...
  35937592 USN bytes processed.
Usn Journal verification completed.
Windows has checked the file system and found no problems.

 312472575 KB total disk space.
 203859536 KB in 834822 files.
    383384 KB in 147842 indexes.
         0 KB in bad sectors.
   1345459 KB in use by the system.
     65536 KB occupied by the log file.
 106884196 KB available on disk.

      4096 bytes in each allocation unit.
  78118143 total allocation units on disk.
  26721049 allocation units available on disk.

Internal Info:
00 8e 12 00 90 fe 0e 00 f1 f2 18 00 00 00 00 00  ................
22 13 00 00 33 00 00 00 00 00 00 00 00 00 00 00  "...3...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Windows has finished checking your disk.
Please wait while your computer restarts.

So while it does seem like there were no bad sectors or errors found, running this did seem to fix my problems with the virtual machine.

I'm running a Sony AR-770 laptop, and there are actually 2 drives connected to a RAID. It doesn't seem like the RAID can easily pop out, and while I'm comfortable taking apart a desktop, I am not comfortable doing that with a laptop.

The drives are Seagate, and I'm going to try to run their utility. Unfortunately my CD/DVD drive bit the dust over a year ago, so I will have to do it from a USB drive.

I'm using VirtualBox

Jeremy Davis's picture

It does seem strange that it now works even though chkdsk didn't find any errors. I guess just keep an eye out for more of those Event ID 11 errors. I'd also perhaps check for an updated RAID driver.

Other than that I'm out of ideas.
