TurnKey Linux Virtual Appliance Library

tklbam not restoring symbolic links as links?

Dan Dennedy's picture

I am having a new problem that seems to have appeared sometime in the last few weeks, as I do not recall running into it the last time I was working on my project in late October. When restoring from a tklbam backup, symbolic links are being converted into regular files instead of links. That is not the worst problem. If the target of the link does not yet exist, then the link/file is not restored at all! This appears in /var/log/tklbam-restore as something like:

OVERLAY ERROR @ /usr/local/lib/libmlt.so: [Errno 2] No such file or directory: '/tmp/tklbam-wqBv6Z/usr/local/lib/libmlt.so'

and breaks library dependencies of my custom software, making it unable to execute! Has anybody else seen this? It is not limited to my files but also things like:
OVERLAY ERROR @ /etc/initramfs-tools/modules: [Errno 2] No such file or directory: '/tmp/tklbam-wqBv6Z/etc/initramfs-tools/modules'

I looked for problem reports and the cmd line options for duplicity but did not come up with anything. Moreover, duplicity is supposed to restore symbolic links as symbolic links, yet within tklbam it does not!

Liraz Siri's picture

I'm not sure what is going on there but I don't believe your hypothesis regarding symbolic links is it.

Yes, of course TKLBAM restores symbolic links as symbolic links. Anything else would be ridiculous on a Linux system. If you doubt that, I propose you launch a Core instance, create a symbolic link in your home directory, back it up, and see if it's restored on another test instance.

Actually that's a good approach to problem solving in general. Try to reproduce the problem in its simplest, most isolated occurrence.

Restore to Symbolic link

I disagree with your hypothesis that TKLBAM restores symbolic links as symbolic links. I have now twice proved it doesn't when restoring my appliance in the last week.

I have a separate EBS volume mounted as /fs and a symbolic link to this directory from within my web application (/var/www/.../filestore). When you restore without the EBS volume mounted, the restore creates the files and directory structure twice: once in the web directory where the symbolic link should be, and once again in a new /fs directory at the root. Weird.

On top of that, the permissions are going screwy. I've included an override in TKLBAM to back up the external volume mounted as /fs (although I'm not sure I should, as there's a symlink to it anyway). When I restore the backup, though, the files are restored with incorrect ownership. They were owned by www-data but are restored owned by root. Files are also set read-only when they were write-enabled. What is going wrong?

Tim Collins's picture

Restore to Symbolic link

Sorry Liraz, I missed the later post where you admit there is a problem with TKLBAM symbolic links. Please see the second half of my post, though, regarding the further problem with ownership and permissions. That problem happens even when the correct volume is mounted.

Liraz Siri's picture

Can you help isolate the problem with ownership/permissions?

Tim, if you could help isolate the problem you're experiencing with ownership and permissions so I can reproduce it independently from Core, that would be very helpful! I'm just a few days from getting back to TKLBAM (currently pushing out the 11.3 maintenance release for tomorrow).

Dan Dennedy's picture

Steps to reproduce:

  1. launch a new server on Hub with Core
  2. ssh new-server
  3. touch foo
  4. ln -s foo bar
  5. list the directory to confirm the link exists:
    root@core ~# ls -l
    total 0
    lrwxrwxrwx 1 root root 3 Nov 27 19:49 bar -> foo
    -rw-r--r-- 1 root root 0 Nov 27 19:49 foo
  6. tklbam-backup
  7. wait for backup to complete and note backup ID
  8. go to hub/backups
  9. locate backup by ID and choose "Restore this backup to a new cloud server"
  10. wait for new server to finish booting and restoring
  11. ssh restored server
  12. list the directory again:
    root@core ~# ls -l
    total 0
    -rw-r--r-- 1 root root 0 Nov 27 19:49 foo

    notice "bar" does not exist

  13. less /var/log/tklbam-restore:
...
Restoring filesystem
====================

MERGING USERS AND GROUPS


OVERLAY:

/root/foo
/root/.penv
OVERLAY ERROR @ /root/bar: [Errno 2] No such file or directory: '/tmp/tklbam-b22UHX/root/bar'
...
/etc/ssl/certs/cert.pem
OVERLAY ERROR @ /etc/ssl/certs/50e684e0.0: [Errno 2] No such file or directory: '/tmp/tklbam-b22UHX/etc/ssl/certs/50e684e0.0'
/etc/ssl/certs/ca-certificates.crt
OVERLAY ERROR @ /etc/ssl/certs/6f5d9899.0: [Errno 2] No such file or directory: '/tmp/tklbam-b22UHX/etc/ssl/certs/6f5d9899.0'
...
OVERLAY ERROR @ /etc/rc2.d/K01confconsole: [Errno 2] No such file or directory: '/tmp/tklbam-b22UHX/etc/rc2.d/K01confconsole'
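
To see at a glance which paths failed, the OVERLAY ERROR lines can be pulled out of the log with a few lines of Python. This is only a sketch: the pattern is inferred from the excerpts quoted in this thread (other tklbam versions may format the line differently), and `failed_overlay_paths` is a hypothetical helper name, not part of tklbam.

```python
import re

# Pattern inferred from the log excerpts in this thread (an assumption,
# not a documented tklbam format).
OVERLAY_ERROR = re.compile(r"^OVERLAY ERROR @ (?P<path>.+?): \[Errno \d+\]")


def failed_overlay_paths(log_text):
    """Return the paths named in OVERLAY ERROR lines, in log order."""
    paths = []
    for line in log_text.splitlines():
        match = OVERLAY_ERROR.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


# Sample lines copied from the log excerpt above.
SAMPLE_LOG = """\
/root/foo
/root/.penv
OVERLAY ERROR @ /root/bar: [Errno 2] No such file or directory: '/tmp/tklbam-b22UHX/root/bar'
/etc/ssl/certs/cert.pem
"""

for path in failed_overlay_paths(SAMPLE_LOG):
    print(path)
```

On a real system the text would come from reading /var/log/tklbam-restore instead of the sample string.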
Liraz Siri's picture

Facepalm

Sigh. Thanks for taking the time to isolate this. I just confirmed that you are absolutely right and we do in fact have a problem. Mea culpa.

TKLBAM is supposed to back up symbolic links as symbolic links. This is a regression. I'm not currently sure when it was introduced and what is causing it, but I'll get to the bottom of it as soon as I get back to TKLBAM development. That was supposed to happen a couple of weeks ago, but various distractions have intervened. Real soon now.

Hi there!

Just wanted to check and see if any progress had been made on this issue... and if it has, how do I upgrade to the latest version? apt-get upgrade?

Thanks, and keep up the great work!

--Branson

Tim Collins's picture

Wrong permissions after restore

I'm afraid I can't shed any light on why this is happening; there are no errors in the backup or restore logs. I've just completed a test exercise, and the screen dumps below show before and after. I used TKLBAM via Webmin to do this. Sequence of events: did a FULL backup then restored it, and permissions were wrong as below. Did 'chown -R www-data:www-data' to correct and ran backup again (incremental), then restored again. Files again had their ownership changed to root. This is happening only in this directory tree, which has several thousand files in it.

The behaviour I've seen is not consistent either, as I've also experienced it changing the directory permissions and the read/write/x attributes. Am I going mad or is something very weird going on?

Before

-rwxrwxrwx  1 www-data     www-data       3144 Nov 30 11:15 99_5d668bdb4b54de5.icc
-rwxrwxrwx  1 www-data     www-data     192138 Nov 30 11:15 99_5d668bdb4b54de5.jpg
-rwxrwxrwx  1 www-data     www-data       2857 Nov 30 11:15 99col_b196093d59b5da2.jpg
-rwxrwxrwx  1 www-data     www-data      24503 Nov 30 11:15 99pre_f206f68bca78a65.jpg
-rwxrwxrwx  1 www-data     www-data      92464 Nov 30 11:15 99scr_8d542e809620471.jpg
-rwxrwxrwx  1 www-data     www-data       7247 Nov 30 11:15 99thm_e4749084031f557.jpg
-rwxrwxrwx  1 www-data     www-data       2270 Nov 30 12:20 metadump.xml

After

-rwxrwxrwx  1 root     root       3144 Nov 30 11:15 99_5d668bdb4b54de5.icc
-rwxrwxrwx  1 root     root     192138 Nov 30 11:15 99_5d668bdb4b54de5.jpg
-rwxrwxrwx  1 root     root       2857 Nov 30 11:15 99col_b196093d59b5da2.jpg
-rwxrwxrwx  1 root     root      24503 Nov 30 11:15 99pre_f206f68bca78a65.jpg
-rwxrwxrwx  1 root     root      92464 Nov 30 11:15 99scr_8d542e809620471.jpg
-rwxrwxrwx  1 root     root       7247 Nov 30 11:15 99thm_e4749084031f557.jpg
-rwxrwxrwx  1 root     root       2270 Nov 30 12:20 metadump.xml

Ok, well, it's not actually showing my screen dumps, so I've pasted the text too.
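
A before/after comparison like the one above can be automated rather than eyeballed from `ls -l` output. The following is a minimal sketch, assuming the same directory tree can be scanned before the backup and again after the restore; `ownership_snapshot` and `changed_entries` are hypothetical helper names.

```python
import os
import stat


def ownership_snapshot(root):
    """Map each path under root (relative to root) to (uid, gid, mode bits).

    os.lstat is used so that symlinks are recorded themselves
    instead of the files they point to.
    """
    snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            snapshot[os.path.relpath(path, root)] = (
                st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))
    return snapshot


def changed_entries(before, after):
    """Return paths whose owner, group or mode differ, or that are missing."""
    return sorted(path for path, meta in before.items()
                  if after.get(path) != meta)


# Usage idea: take one snapshot on the original server, another on the
# restored server, and pass both dictionaries to changed_entries().
```

Because the snapshots are plain dictionaries, they can be saved with the json module on one machine and loaded on the other.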

Liraz Siri's picture

I fixed the ownership issue as well

Note, both issues were related to bugs in the Python standard library when dealing with file moves across file systems. My development environment is a local installation of TKLBAM where /tmp was on the same filesystem as everything else. That's why the problem slipped through in the first place and why it was difficult to reproduce. Finally I admitted I must be missing something and just debugged TKLBAM in a sort of ad-hoc fashion on EC2. Sometimes it's the tiniest differences that getcha! For those interested, here are links to the Python bug tracking system:

http://bugs.python.org/issue9993

http://bugs.python.org/issue1355826

Once I figured out what was going on the fix was straightforward - a wrapper around shutil.move that fixed symlinks and ownership (there don't seem to be any other issues):


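# workaround for shutil.move across-filesystem bugs
import os
import shutil
import stat

def move(src, dst):
    # lstat (not stat) so a symlink is inspected itself, not its target
    st = os.lstat(src)

    is_symlink = stat.S_ISLNK(st.st_mode)

    # moving into an existing directory: keep the source's basename
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(os.path.abspath(src)))

    if is_symlink:
        # recreate the link at the destination instead of copying its target
        linkto = os.readlink(src)
        os.symlink(linkto, dst)
        os.unlink(src)
    else:
        shutil.move(src, dst)
        # put back the original owner and group
        os.lchown(dst, st.st_uid, st.st_gid)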
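
A quick way to check whether a given move function keeps symlinks intact is sketched below. `moved_link_survives` is a hypothetical helper, not part of TKLBAM. It accepts any function with the `move(src, dst)` signature, so the wrapper above or plain `shutil.move` can be passed in. By default both directories are created under the system temp directory, which usually means the same filesystem; to exercise the cross-filesystem case, `src_parent` and `dst_parent` would have to point at directories on different filesystems.

```python
import os
import shutil
import tempfile


def moved_link_survives(move_func, src_parent=None, dst_parent=None):
    """Move a symlink with move_func; report whether it is still a symlink
    with the same target afterwards."""
    src_dir = tempfile.mkdtemp(dir=src_parent)
    dst_dir = tempfile.mkdtemp(dir=dst_parent)
    try:
        # Same layout as the reproduction steps: a file "foo" and a link "bar".
        open(os.path.join(src_dir, "foo"), "w").close()
        link = os.path.join(src_dir, "bar")
        os.symlink("foo", link)

        moved = os.path.join(dst_dir, "bar")
        move_func(link, moved)
        return os.path.islink(moved) and os.readlink(moved) == "foo"
    finally:
        shutil.rmtree(src_dir, ignore_errors=True)
        shutil.rmtree(dst_dir, ignore_errors=True)


print(moved_link_survives(shutil.move))
```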


Symlink problem in Apache Tomcat

I'm having the problem of symlinks not being backed up/restored in the Apache Tomcat appliance.

/var/lib/tomcat has symlinks for logs and work, and those two folders are not getting restored.

Any ETA on this issue?

One strange thing I noticed

One strange thing I noticed is that in my master system /var/lib/tomcat6/conf is also a symlink (to /etc/tomcat6). But when I start up a new system from a backup, the work and logs folders are missing (as I said above), but there seems to be a real /var/lib/tomcat6/conf folder with a copy of the contents from /etc/tomcat6.

Any progress?

Any progress on this  (https://bugs.launchpad.net/turnkeylinux/+bug/910515)?

The priority of the bug is "uncertain". It should be a blocker.

Liraz Siri's picture

Don't worry about the bug tracker status. It's definitely a blocker and will be fixed in the upcoming version of TKLBAM.

Thanks for the update!

Any Updates on this? ETA?

I am having the same issue and I am in the process of giving the Hub a run for its money. We want to migrate all of our VMs into the Hub but we are struggling with some of our VMs that encounter this issue.

The weird thing is this: for WordPress appliances it did not happen. Nor did it happen for a VM running the LAMP appliance with Vanilla Forums installed.

However, we are having issues with ProjectPier, LAMP Appliance with AtMail and a LAMP Appliance running OpenCart.

Would be great if this was resolved soon!!! 

Keep up the great work - awesome stuff going on here with this!

Revision to my post

The WordPress sites no longer work after a reboot - same goes for the Vanilla Forums LAMP appliance. So it's across the board.

Hope this gets fixed soon - can't wait to use the Hub with my current VMs.

Liraz Siri's picture

Nailed it! An automatic update will be issued tomorrow...

Sorry this took so long, it turned out to be a bit of an edge case which I couldn't reproduce in my local development environment. That explains why it slipped through all my usual regression tests. Made my life a bit more difficult than usual.

Anyhow, I traced this issue back to an obscure Python bug that affects how shutil.move works when moving symlinks between filesystems:

http://bugs.python.org/issue9993

This only affected tklbam-restore's behavior on small and medium sized EC2 instances since 27/10/2011, when we started to mount --bind /tmp to /mnt. Restores to non-EC2 TurnKey installations worked fine. Also, there was never a problem with the backups themselves, just the restore process.
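
Whether a system is exposed to this kind of cross-filesystem move can be estimated by comparing the device IDs that `stat()` reports for /tmp and for the restore destinations. The sketch below uses a hypothetical helper, `on_same_filesystem`. Treat it as a heuristic: a bind mount within a single filesystem reports the same device ID even though a rename across it can still fail.

```python
import os


def on_same_filesystem(path_a, path_b):
    """True if stat() reports the same device ID for both paths."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Compare /tmp with a couple of typical restore destinations, if present.
for target in ("/", "/etc"):
    if os.path.exists("/tmp") and os.path.exists(target):
        print(target, on_same_filesystem("/tmp", target))
```

If any line prints False, moves from /tmp to that location cannot be a simple rename and fall back to copy-and-delete, which is where the symlink handling went wrong.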

Bottom line: I committed a fix, and we'll push out an automatic update tomorrow so that everyone gets a new version of tklbam automatically. After that restores should work just fine.

Tim Collins's picture

Hooray and well done!

I shall be over the moon if you've cracked this one! As you know the original /tmp problem and the restore symlinks prob have both caused me a lot of headaches. Since you say it only affects the restore would I now be able to fix all my links by going back to an original full backup and restoring all the incrementals? Because I still have many issues with broken links.

Were you able to shed any light on my posts showing problems with permissions being incorrectly restored? This appeared to happen when what should have been a symlink was replaced during restore with the file structure it pointed to.

Excellent!

Here's to hoping that this bug is squashed!

I'm excited to be able to upgrade to the latest and greatest version of TKL... and I will no longer fear having to restore my application!

Thanks so much for your hard work on this.

Occurred on a micro instance

Great to see this is being worked on ... however, the problem I reported here occurred on a micro instance, not a small or medium.

Liraz Siri's picture

Bug on Micro? Maybe. Ownership/Permissions should be solved too

Hmmm.... I guess Micro instances might have been affected too. I was mainly inferring from the fact that it doesn't have temporary storage on /tmp. But I think we may still be mount --bind'ing /tmp on Micro instances to /mnt/tmp in which case it may still have triggered the Python shutil.move moving-a-symlink-across-filesystems bug.

Anyhow, I'm pretty confident I cracked the symlink problem for good and that any related permission/ownership issues will go away as well. As soon as the archive is updated Alon will probably post a comment on this forum post.

Alon Swartz's picture

Package archive updated

The package archive has been updated with the latest version of TKLBAM which fixes the symlink bug, as well as some other issues found during testing. You can install the latest version manually as follows (alternatively it will also be installed automatically with the daily security updates):

apt-get update
apt-get install tklbam
Tim Collins's picture

...but how do I get my original symlinks back?

Now you've fixed the bug I still have to solve the problems caused by all the missing symlinks and all the errors I have left right and center when I install stuff. I know you said the backups are actually ok and it affected the restore. But I have this historic scenario now in event order...

Create and install appliance

Did original backup + some incrementals

Did a Restore, lost the symlinks - they were replaced with files of the same name instead of links

Have done many more full and incrementals since resulting in the backups containing files instead of symlinks.

With me so far? If I now restore the original backup I guess the symlinks will return because bug is fixed but surely when I then apply later backups they will overwrite the symlinks with the files caused by the earlier restore problem.

SO is there a way to tell the restore not to overwrite a symlink with a file of the same name????

I.E. I want to find a way to fully restore my appliance up-to-date with the original symlinks in place.

Jeremy's picture

Yes good point.

And unfortunately I don't have a quick fix, but I'm sure it can be done. A couple of ideas that spring to mind are:

  1. restore an old backup that includes symlinks
  2. overwrite with a new backup that includes your desired data only (ie none of the files that should be symlinks)

Or

  1. do a backup of symlinks only (from an old backup that includes them). I'm not sure if you can make TKLBAM just backup certain filetypes, but that'd be cool if you could. Otherwise tar and/or rsync should be able to do that. This will find all symlinks and store their location in a file called /root/symlinks
    find / -type l -exec ls -l {} \; > /root/symlinks
    (don't run 'find /' on a system that has mounted network filesystems eg NFS - otherwise it may cause issues and/or take forever...) You could then compare that against the profile and just tar/rsync the ones that are in backup locations
  2. overwrite your current server with this backup
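
As an alternative to parsing `ls -l` output, the symlink inventory could be built in Python, which keeps each link path and its target as a clean pair. This is a rough sketch under the same caveats as the find command (avoid scanning trees that contain network mounts); `list_symlinks` and `lost_links` are hypothetical helper names.

```python
import os


def list_symlinks(root):
    """Return sorted (link_path, target) pairs for every symlink under root.

    os.walk does not descend into symlinked directories by default,
    but it still lists them, so they are recorded as well.
    """
    links = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links.append((path, os.readlink(path)))
    return sorted(links)


def lost_links(expected):
    """From (link_path, target) pairs recorded on a good system, return
    those whose path is no longer a symlink on the current system."""
    return [(path, target) for path, target in expected
            if not os.path.islink(path)]


# Usage idea: run list_symlinks() on a server restored from an old, good
# backup, carry the result over, then run lost_links() on the live server
# to see which links need to be recreated by hand.
```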

I know neither of those solutions are pretty and I'm sorry I can't provide you with a nice little script that can do what you want but it should be possible.

Tim Collins's picture

Good idea

Thanks Jeremy, good idea. I agree pretty it's not, but definitely helpful. I'm still a little nervous about the integrity of my appliance, though, doing a manual fix like this. I think I'll restore the original and the incrementals made prior to the restore that caused the symlink problems to a new appliance. From there I'll use the find command you've suggested, then see if I can use that info to fix the links by hand on the live appliance. Hope there's not loads of them!

Thanks.

Same issue exists

Has this issue been solved? I just updated everything on my VMs and did a fresh backup and tried to restore to a new Amazon instance and get the same thing.

Webmin still has no permissions and the restores don't work 100%. My WordPress appliances work - but Webmin doesn't. Project Pier restores also do not work - it seems like it's a fresh Project Pier install. Webmin doesn't work on that one either.

I also have a LAMP appliance running OpenCart - which just gives internal server errors - while the VM I have in my data center works flawlessly.

Ideas? Is there something I can run to get this going? I was totally excited to run on Amazon but this just isn't working.

Jeremy's picture

Should be resolved!?!

So the machine you backed up was all working correctly? In other words, you're sure that it wasn't a backup that already had the symlink/permission problems?

You could also ensure that the new version of TKLBAM is installed on both origin and target (it should be installed via security updates but just to be sure you could 'apt-get update && apt-get install tklbam')

I'll try to test myself sometime soon to see if I can reproduce the issue(s).

Nope!

Yeah, I double checked it all. I updated all VMs in my environment with the latest updates for EVERYTHING including TKLBAM. Then once the updates were done - I rebooted each VM. Once rebooted and everything was in working order I started a fresh set of backups.

Once the backups were done - I did a restore from that backup to a new VM and then wham-o. Same thing. Some of the VMs in Amazon work - like WordPress and LAMP running Vanilla - however - I still get the access issue when getting into Webmin. This is consistent across the board for all of them.

The other thing - is that the LAMP appliances running Project Pier, OpenCart and Atmail - don't work at all after I restore. Project Pier thinks it's a fresh install. OpenCart just gives internal server errors - can't even get to Webmin. And Atmail - well - Atmail we can disregard. That doesn't work period on Amazon no matter what I do. Fresh install or not.

Hope this helps.

Dan Dennedy's picture

works for me

As the original poster, I just want to report this fix is working fine for my use case. I am just using core and running custom, unpackaged software with uninstalled libs. Now the links in libs/ are fine. Also, while I do not use webmin much, it seems to be working, and I no longer see the errors in /var/log/tklbam-restore.

Thank you
