Dan Dennedy - Sat, 2011/11/26 - 03:53
I am having a new problem that seems to have appeared sometime in the last few weeks, as I do not recall running into it the last time I was working on my project in late October. When restoring from a tklbam backup, symbolic links are being converted into regular files. That is not the worst problem. If the target of the link does not yet exist, then the link/file is not restored at all! This appears in /var/log/tklbam-restore as something like:
OVERLAY ERROR @ /usr/local/lib/libmlt.so: [Errno 2] No such file or directory: '/tmp/tklbam-wqBv6Z/usr/local/lib/libmlt.so'
and breaks library dependencies of my custom software making it unable to execute! Anybody else seen this? It is not limited to my files but also things like:
OVERLAY ERROR @ /etc/initramfs-tools/modules: [Errno 2] No such file or directory: '/tmp/tklbam-wqBv6Z/etc/initramfs-tools/modules'
I looked for problem reports and at the command-line options for duplicity but did not come up with anything. Moreover, duplicity is supposed to restore symbolic links as symbolic links, yet within tklbam they are not restored that way!
I'm not sure what is going on
I'm not sure what is going on there but I don't believe your hypothesis regarding symbolic links is it.
Yes, of course TKLBAM restores symbolic links as symbolic links. Anything else would be ridiculous on a Linux system. If you doubt that, I propose you launch a Core instance, create a symbolic link in your home directory, back it up, and see if it's restored on another test instance.
Actually that's a good approach to problem solving in general. Try to reproduce the problem in its simplest, most isolated occurrence.
Restore to Symbolic link
Sorry Liraz, I missed the later post where there is an admission of a problem with TKLBAM symbolic links. Please see the second half of my post, though, regarding the further problem with ownership and permissions. That problem happens even when the correct volume is mounted.
Can you help isolate the problem with ownership/permissions?
Tim, if you could help isolate the problem you're experiencing with ownership and permissions so I can reproduce it independently from Core, that would be very helpful! I'm just a few days from getting back to TKLBAM (currently pushing out the 11.3 maintenance release for tomorrow).
Steps to
Steps to reproduce:
notice "bar" does not exist
Facepalm
Sigh. Thanks for taking the time to isolate this. I just confirmed that you are absolutely right and we do in fact have a problem. Mea culpa.
TKLBAM is supposed to back up symbolic links as symbolic links. This is a regression. I'm not currently sure when it was introduced and what is causing it, but I'll get to the bottom of it as soon as I get back to TKLBAM development. That was supposed to happen a couple of weeks ago, but various distractions have intervened. Real soon now.
Wrong permissions after restore
I'm afraid I can't shed any light on why this is happening; there are no errors in the backup or restore logs. I've just completed a test exercise and the screen dumps below show before and after. I used TKLBAM via Webmin to do this. Sequence of events: did a FULL backup then restored it, and permissions were wrong as below. Did 'chown -R www-data:www-data' to correct them, ran the backup again (incremental), then restored again. Files again had their ownership changed to root. This is happening only in this directory tree, which has several thousand files in it.
The behaviour I've seen is not consistent either, as I've also experienced it changing the directory permissions and the read/write/execute attributes. Am I going mad or is something very weird going on?
Before
After
Ok, well, it's not actually showing my screen dumps, so I've pasted the text too.
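For anyone trying to isolate the ownership/permission change described above, one option is to record owner, group and mode for every file in the affected tree before the backup and again after the restore, then compare. The following is only a rough sketch: the function names `snapshot_ownership` and `diff_snapshots` are made up for illustration and are not part of TKLBAM.

```python
import os
import stat


def snapshot_ownership(root):
    """Map each path under root (relative to root) to (uid, gid, mode, is_symlink)."""
    snap = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat describes a symlink itself, not its target
            rel = os.path.relpath(full, root)
            snap[rel] = (
                st.st_uid,
                st.st_gid,
                stat.S_IMODE(st.st_mode),   # permission bits only
                stat.S_ISLNK(st.st_mode),   # True if the entry is a symlink
            )
    return snap


def diff_snapshots(before, after):
    """Return {path: (before_value, after_value)} for entries that changed or vanished."""
    changed = {}
    for path, old in before.items():
        new = after.get(path)  # None if the path no longer exists
        if new != old:
            changed[path] = (old, new)
    return changed
```

Taking one snapshot on the source machine and one on the restored machine (for the same directory tree) and passing both to `diff_snapshots` should list exactly the entries whose owner, group, mode or link status differ, which is easier to post here than screen dumps.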
I fixed the ownership issue as well
http://bugs.python.org/issue9993
http://bugs.python.org/issue1355826
Once I figured out what was going on the fix was straightforward - a wrapper around shutil.move that fixed symlinks and ownership (there don't seem to be any other issues):
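The wrapper itself did not survive in this post, so here is a minimal sketch of what such a wrapper could look like. It is a reconstruction under assumptions, not the code that was committed to TKLBAM: the name `move_preserving` is hypothetical, it is written for Python 3, and it assumes `dst` is the full destination path rather than an existing directory.

```python
import os
import shutil


def move_preserving(src, dst):
    """Move src to the full path dst, keeping symlinks and ownership intact."""
    st = os.lstat(src)  # metadata of src itself, even if it is a symlink
    if os.path.islink(src):
        # Recreate the link instead of copying whatever it points at.
        # This also works when the link target does not exist yet.
        target = os.readlink(src)
        if os.path.lexists(dst):
            os.remove(dst)
        os.symlink(target, dst)
        os.unlink(src)
    else:
        shutil.move(src, dst)
    try:
        # lchown acts on the link itself; changing the owner normally needs root.
        os.lchown(dst, st.st_uid, st.st_gid)
    except PermissionError:
        pass  # best effort when not running as root
    return dst
```

Note that for a directory moved across filesystems this sketch only restores ownership of the top-level directory, not of everything inside it.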
One strange thing I noticed
One strange thing I noticed is that in my master system /var/lib/tomcat6/conf is also a symlink (to /etc/tomcat6). But when I start up a new system from a backup, the work and logs folders are missing (as I said above), yet there seems to be a real /var/lib/tomcat6/conf folder with a copy of the contents from /etc/tomcat6.
Any progress?
Any progress on this (https://bugs.launchpad.net/turnkeylinux/+bug/910515)?
The priority of the bug is "uncertain". It should be a blocker.
Don't worry about the bug
Don't worry about the bug tracker status. It's definitely a blocker and will be fixed in the upcoming version of TKLBAM.
Revision to my post
The WordPress sites no longer work after a reboot - same goes for the Vanilla Forums LAMP appliance. So it's across the board.
Hope this gets fixed soon - can't wait to use the Hub with my current VMs.
Nailed it! An automatic update will be issued tomorrow...
Anyhow, I traced this issue back to an obscure Python bug that affects how shutil.move works when moving symlinks between filesystems:
http://bugs.python.org/issue9993
This only affected tklbam-restore's behavior on small and medium sized EC2 instances since 27/10/2011, when we started to mount --bind /tmp to /mnt. Restores to non-EC2 TurnKey installations worked fine. Also, there was never a problem with the backups themselves, just the restore process.
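To illustrate why a move across filesystems can produce both symptoms reported in this thread: a rename is not possible between filesystems, so the move has to fall back to copy-then-delete, and a copy that follows symlinks turns a link into a regular file (or fails outright if the link target is missing). The sketch below is my own illustration of that fallback, with a hypothetical function name; it is not the actual code from Python or TKLBAM.

```python
import os
import shutil


def naive_cross_fs_move(src, dst):
    """Copy-then-delete, the way a move across filesystems has to work.

    shutil.copy2 follows symlinks by default, so a link is replaced by a
    copy of its target, and a dangling link cannot be copied at all.
    """
    shutil.copy2(src, dst)
    os.unlink(src)
```

Called on a symlink whose target exists, this leaves a plain file at `dst`. Called on a symlink whose target does not exist yet, `shutil.copy2` raises `FileNotFoundError` (errno 2), which matches the "[Errno 2] No such file or directory" lines in /var/log/tklbam-restore quoted at the top of the thread.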
Bottom line: I committed a fix, and we'll push out an automatic update tomorrow so that everyone gets a new version of tklbam automatically. After that restores should work just fine.
Hooray and well done!
I shall be over the moon if you've cracked this one! As you know, the original /tmp problem and the restore symlinks problem have both caused me a lot of headaches. Since you say it only affects the restore, would I now be able to fix all my links by going back to an original full backup and restoring all the incrementals? Because I still have many issues with broken links.
Were you able to shed any light on my posts showing problems with permissions being incorrectly restored? This appeared to happen when what should have been a symlink was replaced during restore with the file structure it pointed to.
Occurred on a micro instance
Great to see this is being worked on ... however, the problem I reported here occurred on a micro instance, not a small or medium.
Bug on Micro? Maybe. Ownership/Permissions should be solved too
Hmmm.... I guess Micro instances might have been affected too. I was mainly inferring from the fact that it doesn't have temporary storage on /tmp. But I think we may still be mount --bind'ing /tmp on Micro instances to /mnt/tmp in which case it may still have triggered the Python shutil.move moving-a-symlink-across-filesystems bug.
Anyhow, I'm pretty confident I cracked the symlink problem for good and that any related permission/ownership issues will go away as well. As soon as the archive is updated Alon will probably post a comment on this forum post.
Package archive updated
The package archive has been updated with the latest version of TKLBAM which fixes the symlink bug, as well as some other issues found during testing. You can install the latest version manually as follows (alternatively it will also be installed automatically with the daily security updates):
...but how do I get my original symlinks back?
Now that you've fixed the bug, I still have to solve the problems caused by all the missing symlinks and all the errors I get left, right and center when I install stuff. I know you said the backups are actually OK and it only affected the restore. But I have this historic scenario now, in event order...
Create and install appliance
Did original backup + some incrementals
Did a Restore, lost the symlinks - they were replaced with files of the same name instead of link
Have done many more full and incrementals since resulting in the backups containing files instead of symlinks.
With me so far? If I now restore the original backup, I guess the symlinks will return because the bug is fixed, but surely when I then apply later backups they will overwrite the symlinks with the files caused by the earlier restore problem.
So is there a way to tell the restore not to overwrite a symlink with a file of the same name?
I.e. I want to find a way to fully restore my appliance up to date with the original symlinks in place.
Yes good point.
And unfortunately I don't have a quick fix, but I'm sure it can be done. A couple of ideas that spring to mind are:
Or
find / -type l -exec ls -l {} \; > /root/symlinks
(Don't run 'find /' on a system that has mounted network filesystems, e.g. NFS, otherwise it may cause issues and/or take forever...) You could then compare that against the profile and just tar/rsync the ones that are in backup locations.
I know neither of those solutions is pretty, and I'm sorry I can't provide you with a nice little script that does what you want, but it should be possible.
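The inventory idea above could also be sketched in Python, which makes it easier to stay off other filesystems such as NFS mounts and to get the result as data rather than `ls -l` text. This is only a rough sketch; `list_symlinks` is a made-up name and nothing here is part of TKLBAM.

```python
import os


def list_symlinks(root):
    """Return {link_path: link_target} for every symlink under root.

    Directories on a different device than root (for example NFS mounts)
    are skipped.
    """
    root_dev = os.lstat(root).st_dev
    links = {}
    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                # Symlinks to directories show up in dirnames.
                links[full] = os.readlink(full)
            elif os.lstat(full).st_dev == root_dev:
                keep.append(name)
        dirnames[:] = keep  # prune: do not descend into other filesystems
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                links[full] = os.readlink(full)
    return links
```

Run as root against `/` on a known-good system, the returned mapping plays the same role as the /root/symlinks file from the find command.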
Good idea
Thanks Jeremy, good idea. I agree pretty it's not, but definitely helpful. I'm still a little nervous about the integrity of my appliance, though, when doing a manual fix like this. I think I'll restore the original and the incrementals made prior to the restore that caused the symlink problems to a new appliance. From there I'll use the find command you've created, then see if I can use that info to fix them by hand on the live appliance. Hope there's not loads of them!
Thanks.
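For the by-hand comparison described above, a small report of which known-good symlinks are missing or have become regular files on the live appliance might save some time. A minimal sketch, assuming you already have a mapping of link path to link target from the known-good system; the function name `lost_symlinks` and the `live_root` parameter are hypothetical, and the function only reports, it changes nothing.

```python
import os


def lost_symlinks(inventory, live_root="/"):
    """Compare a {link_path: target} inventory against the live system.

    Returns a list of (path, expected_target, problem) tuples for entries
    that are missing, are no longer symlinks, or point somewhere else.
    """
    problems = []
    for path, target in sorted(inventory.items()):
        # Re-root the absolute inventory path under live_root.
        live = os.path.join(live_root, path.lstrip(os.sep))
        if not os.path.lexists(live):
            problems.append((path, target, "missing"))
        elif not os.path.islink(live):
            problems.append((path, target, "not a symlink"))
        elif os.readlink(live) != target:
            problems.append((path, target, "points elsewhere"))
    return problems
```

Each reported entry could then be checked and repaired manually, for example by removing the stray file and recreating the link to the expected target.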
Same issue exists
Has this issue been solved? I just updated everything on my VMs, did a fresh backup, tried to restore to a new Amazon instance, and got the same thing.
Webmin still has no permissions and the restores don't work 100%. My WordPress appliances work, but Webmin doesn't. Project Pier restores also do not work - it looks like a fresh Project Pier install. Webmin doesn't work on that one either.
I also have a LAMP appliance running OpenCart, which just gives internal server errors, while the VM I have in my data center works flawlessly.
Ideas? Is there something I can run to get this going? I was totally excited to run on Amazon but this just isn't working.
Should be resolved!?!
So the machine you backed up was all working correctly? In other words, you're sure that it wasn't a backup that already had the symlink/permission problems?
You could also ensure that the new version of TKLBAM is installed on both origin and target (it should be installed via security updates but just to be sure you could 'apt-get update && apt-get install tklbam')
I'll try to test myself sometime soon to see if I can reproduce the issue(s).
Nope!
Yeah, I double checked it all. I updated all VMs in my environment with the latest updates for EVERYTHING, including TKLBAM. Once the updates were done, I rebooted each VM. Once rebooted and everything was in working order, I started a fresh set of backups.
Once the backups were done, I did a restore from that backup to a new VM and then, wham. Same thing. Some of the VMs in Amazon work, like WordPress and LAMP running Vanilla; however, I still get the access issue when getting into Webmin. This is consistent across the board for all of them.
The other thing is that the LAMP appliances running Project Pier, OpenCart and Atmail don't work at all after I restore. Project Pier thinks it's a fresh install. OpenCart just gives internal server errors - can't even get to Webmin. And Atmail - well, Atmail we can disregard. That doesn't work, period, on Amazon no matter what I do. Fresh install or not.
Hope this helps.
works for me
As the original poster, I just want to report this fix is working fine for my use case. I am just using core and running custom, unpackaged software with uninstalled libs. Now the links in libs/ are fine. Also, while I do not use webmin much, it seems to be working, and I no longer see the errors in /var/log/tklbam-restore.
Thank you