One of our servers has over 7GB of backup data, mostly content (images, videos, etc.) loaded into the content management system. When attempting to launch a new VM from a backup of this system with hub-launch, the restore seemed to be running smoothly but eventually stalled. After manually running the restore from the command line, it was clear why hub-launch was stalling - the root '/' partition ran out of space.

Restoring duplicity archive from s3://s3-us-west-1.amazonaws.com/tklbam-gkp7dhx45incclo2
Synchronizing remote metadata to local cache...
Copying duplicity-full-signatures.20110914T195922Z.sigtar to local cache.
Copying duplicity-full.20110914T195922Z.manifest to local cache.
Last full backup date: Wed Sep 14 19:59:22 2011
Traceback (most recent call last):
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1252, in <module>
    with_tempdir(main)
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1245, in with_tempdir
    fn()
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1199, in main
    restore(col_stats)
  File "/usr/lib/tklbam/deps/bin/duplicity", line 539, in restore
    restore_get_patched_rop_iter(col_stats)):
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/patchdir.py", line 522, in Write_ROPaths
    ITR( ropath.index, ropath )
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/lazy.py", line 335, in __call__
    last_branch.fast_process, args)
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/robust.py", line 37, in check_common_error
    return function(*args)
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/patchdir.py", line 575, in fast_process
    ropath.copy( self.base_path.new_index( index ) )
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/path.py", line 416, in copy
    other.writefileobj(self.open("rb"))
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/path.py", line 595, in writefileobj
    fout.write(buf)
IOError: [Errno 28] No space left on device

Apparently Duplicity downloads and unpacks the backup to the /tmp directory, which happens to live on the root partition (the way TKL Ubuntu Server is configured).
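A quick way to confirm where /tmp actually lives (the paths here are from my setup; output will vary by system):

# /tmp has no mount point of its own, so it sits on the root filesystem
df -h /tmp
# compare with the instance's large ephemeral storage
df -h /mnt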

I attempted a quick-and-dirty solution (or so I thought) by logging into the newly launched server and running the following commands:

mkdir /mnt/tmp
chmod 777 /mnt/tmp
chmod +t /mnt/tmp
rm -rf /tmp
ln -s /mnt/tmp /tmp
tklbam-restore [backup id]

but I ran into the same problem.

I am hoping there is a solution that would allow me to keep using hub-launch without performing the restore manually, but I would settle for a quick-and-dirty manual fix that actually works.

Thanks!

-Ken


I tried setting the TMPDIR environment variable, per the Duplicity FAQ (http://duplicity.nongnu.org/FAQ.html), to:

/mnt/duplicity/tmp

since the /mnt partition is 335GB - and still no luck. While Duplicity appeared to be using /mnt/duplicity/tmp (9.6GB used), the restore failed with "[Errno 28] No space left on device". For what it's worth, here is what the file system disk space usage looked like after the error:

root@lamp ~# df -h 
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  9.9G     0 100% /
none                  844M  120K  844M   1% /dev
none                  881M     0  881M   0% /dev/shm
none                  881M   68K  880M   1% /var/run
none                  881M     0  881M   0% /var/lock
none                  881M     0  881M   0% /lib/init/rw
/dev/sda2             335G  9.6G  309G   3% /mnt
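For anyone trying to track down what is actually filling the root partition, a du sweep limited to one filesystem can help (the -x flag keeps it from descending into /mnt); something like:

# largest top-level directories on the root filesystem only
du -x --max-depth=1 / | sort -n | tail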
By the way, the file system for the server being restored looks like this:

root@nwtp ~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  3.5G  5.9G  37% /
none                  844M  120K  844M   1% /dev
none                  881M     0  881M   0% /dev/shm
none                  881M   64K  880M   1% /var/run
none                  881M     0  881M   0% /var/lock
none                  881M     0  881M   0% /lib/init/rw
/dev/sda2             335G  9.6G  309G   3% /mnt
So it's using /mnt/nwtp (a directory I created and configured the CMS to use) to store its content, and I have added /mnt/nwtp to /etc/tklbam/overrides. So, on the target system, in addition to using /mnt as the TMPDIR for Duplicity, I expected the restore to overlay the backed-up content onto /mnt/nwtp - and it did, but it still filled up the /tmp directory as well.
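For reference, as I understand it the overrides file just takes one path per line (a leading "-" excludes a path instead of including it); mine looks like this:

# /etc/tklbam/overrides - extra paths TKLBAM should back up
/mnt/nwtp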
 
While the amount of data (content stored) will eventually grow well beyond 7GB (but not for a long while), I was hoping to get away with using tklbam-backup and hub-launch as a simple automated solution despite the amount of data. Maybe I should consider a different solution for this particular server now, like EBS volumes (and EBS snapshots)? Or maybe tklbam-backup was not designed to handle this much data?

Liraz Siri:

Thanks for reporting this and taking a stab at finding out what went wrong. That's very useful. I think you've zeroed in on the real issue here - that /tmp on EC2 should be redirected to temporary storage (/mnt) when it exists. I'll talk to Alon about fixing this.

I haven't seen anything to indicate that Duplicity has difficulty restoring big backups, so I don't think this has anything to do with an inherent limitation. By design, TKLBAM should be able to back up and restore arbitrarily large amounts of data.

Obviously if you run out of disk space that's going to be an issue regardless of what backup/restore method you are using. There's no magic involved.

Anyhow, I've just taken a look at the TKLBAM source code and confirmed that /tmp isn't hardwired anywhere. Maybe you just didn't set TMPDIR right? You need to export the variable so that it is set not just in your shell's local environment but also in the environment of the programs the shell executes. Like this:


# don't just set TMPDIR, export it
export TMPDIR=/mnt

tklbam-restore ...
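To double-check that the export took, you can ask Python (which TKLBAM and Duplicity run under) where it will put temporary files - the tempfile module consults TMPDIR:

export TMPDIR=/mnt
# tempfile.gettempdir() honors TMPDIR, so this should print /mnt
python -c 'import tempfile; print tempfile.gettempdir()'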

If that still doesn't work, try redirecting /tmp to /mnt like this:
mkdir /mnt/tmp
mount --bind /mnt/tmp /tmp
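Either way, it's worth verifying the redirection before starting the restore - after the bind mount, df on /tmp should report /mnt's filesystem rather than the root partition:

# should now show /dev/sda2's size and free space, not /dev/sda1's
df -h /tmp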
Please tell me if any of these workarounds help. We'll look into solving the underlying EC2 configuration issue...

So I decided to go with:

mkdir /mnt/tmp
mount --bind /mnt/tmp /tmp

right before I do a large restore, and it appears to be working. All files get downloaded and cached by Duplicity to the /tmp directory on the /mnt partition, and the root / partition grows as expected (new packages are installed) but does not run out of space.
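If I ever wanted the bind mount to survive a reboot, I could presumably add a standard bind-mount entry to /etc/fstab (untested on my end, since I only need it right before a restore):

# /etc/fstab addition - re-establish the bind mount at boot
/mnt/tmp  /tmp  none  bind  0  0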

Thanks for the help!


Liraz Siri:

OK, the above manual fix is no longer required as the Hub now does this for you.
John Kimathi:

A heads up - I have tried everything mentioned above to do a restore, with no joy at all. My situation is dire because my PRODUCTION server crashed. Anyone have ideas on how I can do a tklbam restore without generating the error below?

  File "/usr/bin/tklbam-restore", line 444, in main
    log_fh.write(trap.std.read())
IOError: [Errno 28] No space left on device

Jeremy Davis:

If so, then perhaps you really have run out of physical disk space?

John Kimathi:

I was able to resolve this problem by running these commands in the following order. This was after googling profusely for two days.

mkdir /temp
chmod 1777 /temp
export TMPDIR=/temp
mount --bind /temp /tmp
tklbam-restore 23 --time 2015-09-20T07:23:16
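A quick sanity check before kicking off the restore confirms the redirection took effect (output will vary by system):

# /tmp should now report the partition backing /temp
df -h /tmp
mount | grep /tmp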
