Scott Buffington wrote:

I am trying to migrate my data from one Turnkey Linux virtual machine to another, much like in the demonstration video. I made a backup of both of my Turnkey Linux virtual machines; the backup feature works fine, and the initial backup of the primary machine is 2GB in size. When I try to restore this backup on the new virtual machine (backing up on the new virtual machine via tklbam works fine), I get the following errors:

root@lamp:/usr/lib/tklbam# tklbam-restore 1
Restoring duplicity archive from s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Wed Oct 13 18:29:23 2010
Download s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg failed (attempt #1, reason: sslerror: The read operation timed out)
Download s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg failed (attempt #2, reason: sslerror: The read operation timed out)
Download s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg failed (attempt #3, reason: sslerror: The read operation timed out)
Download s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg failed (attempt #4, reason: sslerror: The read operation timed out)
Download s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg failed (attempt #5, reason: error: (104, 'Connection reset by peer'))
Giving up trying to download s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg after 5 attempts
BackendException: Error downloading s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss/duplicity-full.20101013T182923Z.vol16.difftar.gpg
Traceback (most recent call last):
  File "/usr/bin/tklbam-restore", line 287, in <module>
    main()
  File "/usr/bin/tklbam-restore", line 266, in main
    rollback=not no_rollback)
  File "/usr/lib/tklbam/restore.py", line 68, in __init__
    backup_archive = self._duplicity_restore(address, credentials, secret, time)
  File "/usr/lib/tklbam/restore.py", line 61, in _duplicity_restore
    duplicity.Command(opts, address, tmpdir).run(secret, credentials)
  File "/usr/lib/tklbam/duplicity.py", line 79, in run
    raise Error("non-zero exitcode (%d) from backup command: %s" % (exitcode, str(self)))
duplicity.Error: non-zero exitcode (23) from backup command: duplicity --archive-dir=/var/cache/duplicity s3://s3.amazonaws.com/tklbam-t2vavueuk6x2nkss /tmp/tklbam-1ZyCss
 
I do not know if it matters, but one of the virtual machines is running on VirtualBox, and the new virtual machine I am trying to restore to is running on VMware Fusion. Thank you.

Alon Swartz wrote:

This seems like a known issue [1,2] in duplicity (the backend used by TKLBAM). The bug report references boto, the S3 backend used by duplicity, but I'm not sure; we'll have to do some investigation and testing.

I sent a note to Liraz (the main author of TKLBAM) to see if he can reproduce the bug and then issue a fix.

Do you consistently get the same error? Can you reproduce the issue on demand?

Scott Buffington wrote:

Yes, I get the error every time I run the restore. Thank you.

Liraz Siri wrote:

Thanks for reporting this. I've never come across this issue, so I'm guessing, but I don't think this is a bug in TKLBAM, Duplicity, or even in python-boto. It looks more like a networking problem. Something (e.g., a firewall on your intranet) is disrupting your connection to Amazon S3. You can try running a sniffer (e.g., tcpdump/wireshark) to confirm or rule that out.
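For example, a capture along these lines (a minimal sketch; the interface name and output path are assumptions for your setup) would record the S3 traffic while you re-run the restore:

# Record traffic to/from Amazon S3 during a failing restore attempt;
# inspect the resulting capture in wireshark for resets or stalled reads.
tcpdump -i eth0 -w /tmp/s3-restore.pcap host s3.amazonaws.com

Run the restore in a second terminal while the capture is running, then stop tcpdump with Ctrl-C.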

You may need to go through a proxy server to access Amazon. Ask your network administrator about that.

If that's the case, you'll need to configure the software to route traffic through the local proxy. You can do that by setting the http_proxy environment variable like this:

# e.g., http://10.0.0.168:8080/
export http_proxy=http://proxy.address.here:PORT/
tklbam-restore 1
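To sanity-check the proxy itself before re-running the restore, something like this should work (just a sketch; curl's -x flag forces the request through the given proxy):

# Any HTTP response at all (even 403 Access Denied) means the proxy
# path to S3 works; a timeout means it does not.
curl -x "$http_proxy" -I https://s3.amazonaws.com/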
Try that and report back, so that if anyone else runs into this we can solve it for them as well.

If that doesn't work and you just want to migrate a machine, you can always bypass Amazon S3 and back up to the local filesystem, NFS, or an FTP server using the "--address" option; a sketch follows below. See the documentation for details.
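For instance, a minimal sketch of backing up to a local directory (the file:// destination below is just an example, not a path TKLBAM requires):

# Write the backup to a local directory instead of Amazon S3.
mkdir -p /mnt/backups/lamp
tklbam-backup --address=file:///mnt/backups/lamp

The restore side can be pointed at the same address; the documentation covers the exact invocation.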

@Alon: the retry patch discussed in the Launchpad bug report is already applied in tklbam-duplicity.

Scott Buffington wrote:

Here is why I do not think it is a network issue, though I will have a look. I can back up the newly created machine and restore its backup. For example, I issue:

root@lamp:/usr/lib/tklbam# tklbam-backup

And it is successful:

--------------[ Backup Statistics ]--------------
StartTime 1287499188.11 (Tue Oct 19 14:39:48 2010)
EndTime 1287499188.49 (Tue Oct 19 14:39:48 2010)
ElapsedTime 0.38 (0.38 seconds)
SourceFiles 256
SourceFileSize 1883983 (1.80 MB)
NewFiles 89
NewFileSize 361048 (353 KB)
DeletedFiles 0
ChangedFiles 58
ChangedFileSize 513727 (502 KB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 147
RawDeltaSize 3389 (3.31 KB)
TotalDestinationSizeChange 5082 (4.96 KB)
Errors 0
-------------------------------------------------
 
Then I issue the restore:
 
root@lamp:/usr/lib/tklbam# tklbam-restore 2
Restoring duplicity archive from s3://s3.amazonaws.com/tklbam-opmdspw7ttxuu6za
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Oct 19 11:32:08 2010
 
Restoring new packages
======================
 
apt-get update
--------------
 
Hit http://archive.turnkeylinux.org hardy-security Release.gpg
Hit http://archive.turnkeylinux.org hardy Release.gpg
Hit http://archive.turnkeylinux.org hardy-security Release
 
Everything comes back with success. When I try to restore backup 1, I get the timeout again.

Liraz Siri wrote:

The trick to getting to the bottom of an issue is to isolate it: gradually circle in closer and closer to what is triggering the bug, and try not to let your preconceptions get in the way. It could be that your network only interferes with long-lived SSL connections, or it could be a rare Amazon S3 bug.

The main evidence for my hypothesis that it's a problem with your network is that we've never seen this happen before. So exactly the same software stack works elsewhere and doesn't work for you. What's different?

A couple of ideas:

1) Try restoring backup #1 from an Amazon EC2 instance you launch via the Hub. If that doesn't work, excellent, because then we can easily reproduce the problem by using your backup as a test case (with your permission, of course).

2) Try re-initializing tklbam to create a new backup:

tklbam-init --force YOUR-API-KEY
tklbam-backup
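Here --force tells tklbam-init to re-initialize even though the system is already linked, so the next tklbam-backup starts a fresh backup record; replace YOUR-API-KEY with the API key from your Hub account.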
If the problem persists with a new backup, it's unlikely to be a rare bug on Amazon's end.

3) Edit /etc/tklbam/conf to reduce the volume size (e.g., to 1MB) and try again; a sketch of the change follows below. If long-lived connections on your network are the problem, maybe this will get around it.
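A sketch of that change (assuming the option is spelled volsize and measured in MB; verify against the comments in your own /etc/tklbam/conf):

# /etc/tklbam/conf
# Use 1MB volumes so each transfer to/from S3 stays short-lived.
volsize 1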

Scott Buffington wrote:

Thanks guys, I am working through the solutions you provided.  I appreciate all the feedback.

Liraz Siri wrote:

You never got back to us on how you fared with those workarounds. Did you get to the bottom of the problem?

David Western wrote:

I encountered this same issue restoring from S3. I changed the volume size in the conf file as Liraz suggested, rebooted, and then it worked. Before that change, it failed on four different attempts, each time on different files.
