Scott Howard's picture

Hey Guys,

Using the TKL Fileserver appliance (Lucid release), backing up to Amazon and locally each night via TKLBAM with no issues. I have 2 backup volumes, and decided to trial restoring one to a cloud server for testing.

Clicked on the link on the Backup Management page, accepted the default settings on the next pop-up box, and...

24 HOURS LATER it is still trying to start up!

The last full back up size was 4.2GB.

The server page shows the instance and the "restoring backup" message in the status area.

This doesn't seem right, given that a full backup over our ADSL2+ connection takes ~6-7 hours at night.

I would have thought that transferring the same data within the Amazon EC2 servers would be very quick by comparison.

It appears to have gone through the "installing security updates" phase.

The last line of the console output is:

hvc0: Unpacking replacement libk5crypto3 ...

I will leave it running a bit longer, but will probably cancel it soon if it isn't up.


Has anybody successfully restored a Fileserver appliance backup of a similar data size?


Thanks guys.


Scott H.

Alon Swartz's picture

The TKLBAM restore probably hit an issue and raised an exception, which means it won't send a status update to the Hub saying the restore completed successfully, hence the Hub won't update the status field.

I'd log into the server (via SSH) and take a look at /var/log/tklbam-restore to see what happened.

If you can't access the system, another way to test this is to launch a new fileserver image, log in, and perform the restore manually from the command line (tklbam-restore BACKUPID) to see where it fails.
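A minimal version of that manual check might look like the following, run over SSH on the instance; the backup ID here is a placeholder, so substitute your own from the Hub's Backups page:

```shell
# First see whether a restore log exists from the earlier attempt:
cat /var/log/tklbam-restore

# Then re-run the restore by hand so any traceback lands on your
# terminal instead of being hidden behind the Hub's status page
# (BACKUPID is a placeholder -- use your ID from the Hub):
tklbam-restore BACKUPID
```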

BTW, network speed is very fast within the Amazon cloud. Just make sure you launch the server in the same region your backup is stored in, for best speed.

Scott Howard's picture

Hi Alon , thanks for replying.

Couldn't log into the running instance even though all services appeared to be running (SSH/Webshell), as I kept getting 'Access Denied' or incorrect-password responses, even though I'm using the correct passwords from the backed-up appliance. Decided to reboot the instance and checked the console output, and it appears as though all has gone through; these are the last lines of the console output:

 * Starting Initialization hooks        [ OK ]
 * Starting NTP server ntpd        [ OK ]
 * Starting Shell In A Box Daemon shellinabox        [ OK ]
Syntax OK
 * Starting web server lighttpd        [ OK ]
 * Starting webmin        [ OK ]
Successfully updated Hub with server boot status: tklbam-restore
Restoring duplicity archive from s3://s3-ap-southeast-1.amazonaws.com/tklbam-wgodhopwt362mxgl
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Wed Apr  6 08:31:58 2011

Checked the Amazon EC2 dashboard and confirmed the running instance and correct IP address, but I still cannot log in... passwords rejected... and guess what: despite the line which says "Successfully updated Hub...", the Servers dashboard page still persists with the "restoring backup" status message as if nothing had happened.

Am I missing something with the passwords?

I'm going to terminate this instance and try again.

Scott Howard's picture

Tried again as above. Same issue -- stuck on restoring backup -- the dashboard page endlessly displays "restoring backup".

Trawled through the console output and managed to find the root password displayed between the highlighted areas where the RSA fingerprints are generated (this seems a bit insecure).

Anyway, logged into the system with this password and looked for /var/log/tklbam-restore... it doesn't exist. Also no evidence of any of my backed-up files.

Went back to the console output, found the section where duplicity is doing its stuff to restore the backup, and found...

Copying duplicity-new-signatures.20110421T083149Z.to.20110422T083053Z.sigtar to local cache.


Copying duplicity-new-signatures.20110422T083053Z.to.20110423T083055Z.sigtar to local cache.
Last full backup date: Wed Apr  6 08:31:58 2011
Traceback (most recent call last):
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1252, in <module>
    with_tempdir(main)
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1245, in with_tempdir
    fn()
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1199, in main
    restore(col_stats)
  File "/usr/lib/tklbam/deps/bin/duplicity", line 539, in restore
    restore_get_patched_rop_iter(col_stats)):
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/patchdir.py", line 522, in Write_ROPaths
    ITR( ropath.index, ropath )
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/lazy.py", line 335, in __call__
    last_branch.fast_process, args)
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/robust.py", line 37, in check_common_error
    return function(*args)
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/patchdir.py", line 575, in fast_process
    ropath.copy( self.base_path.new_index( index ) )
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/path.py", line 416, in copy
    other.writefileobj(self.open("rb"))
  File "/usr/lib/tklbam/deps/lib/python2.6/site-packages/duplicity/path.py", line 595, in writefileobj
    fout.write(buf)
IOError: [Errno 28] No space left on device


So my reading is that my cloud server does not have enough space to restore my backup!

I selected the default machine when starting this... I'll try with the next size up.

Shouldn't the Hub be clever enough to know this already when launching a stored backup?
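A pre-flight check along those lines could be as simple as comparing the backup size against the free space where the restore will land. A hypothetical sketch (all names and the 4.2GB figure are illustrative, not anything the Hub actually runs):

```shell
# Return success only if the given path has at least needed_kb free.
enough_space() {
    needed_kb=$1
    # POSIX df -P: field 4 of the second line is available KB
    free_kb=$(df -P "$2" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$needed_kb" ]
}

# e.g. a ~4.2GB backup that will be staged through /tmp:
enough_space $((4200 * 1024)) /tmp || echo "not enough free space to restore"
```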

Regards Scott H.

Scott Howard's picture

Repeated the above with the next size up of cloud instance.

Got the same issue again... no space left on device.

By the way, the console pop-up window buffer obviously isn't large enough to hold the entire output at once; the error message from the original boot only appears after the next boot, as it gets partially flushed through by the output from the following boot (sorry if that sounds confusing).

Should I try attaching an EBS volume after booting (after fetching the root password from the console output...) and then auto-mounting it at the restore point for the next boot?

Hey guys, I thought this was supposed to be easy...

Maybe I can't have my file server in the cloud?

Jeremy Davis's picture

But obviously there are still a few teething problems in some circumstances (like yours). I'd try doing it manually (i.e. boot the machine with TKLBAM, then run the restore) just to completely confirm that space is the issue. Once you've done that, you could try mounting an EBS volume and manually running TKLBAM again.

Scott Howard's picture

Hi Jeremy,

I'm going to try starting a bare Fileserver appliance, mounting an EBS volume at my backup's restore location, and manually running tklbam-restore -- I presume you can point it at a specific volume, which I think is what you're saying. I'll report back on the outcome.

Scott H.

Alon Swartz's picture

As per your previous comments, the issue is as you suspect: the server doesn't have enough space to restore your backup. You will need to create an EBS volume big enough to hold your backup, and mount --bind it to the location your data will be restored to before issuing tklbam-restore.
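A sketch of that sequence, assuming the EBS volume was attached as /dev/sdf and the data restores to /home (the device name, mount point, and filesystem choice will differ on your setup):

```shell
# One-time: put a filesystem on the freshly attached EBS volume
mkfs.ext3 /dev/sdf

# Mount it, then bind it over the restore location
mkdir -p /mnt/ebs
mount /dev/sdf /mnt/ebs
mount --bind /mnt/ebs /home

# The restore now has the EBS volume's capacity behind /home
tklbam-restore BACKUPID
```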

It's an interesting idea to check the size of the backup when using restore-to-cloud-server and display a notification. I've added it to my todo list and will look into the best way of doing it.

Scott Howard's picture

Thanks once again for replying Alon.

A few things I discovered in my eventually successful effort to restore a Fileserver backup to a cloud server:

1. tklbam-restore does not write any log output until the program has finished, which makes the log useless when an error occurs and the restore hangs.

2. Running df -h on a new bare Fileserver cloud appliance shows that the server's 320G of temporary storage is a device mounted at /mnt.

The problem with this is that unless your restored files live under that mount point, you will run out of space restoring all but the smallest of backups. (In my case the files are stored at /home.)

3. Possibly most importantly, duplicity/tklbam-restore appears to initially restore the files to the /tmp directory and then move them to the restore point.

At this point it displays a console message about the last full backup date, and then there is a delay (in my case ~30-40 mins for a 4.2G backup) until all the restore path messages are displayed. The problem with this, again, is that there is not enough space at /tmp on a bare cloud appliance to do this... I had to create 2 EBS volumes, one mounted at /tmp and one mounted at /home (my restore point), to successfully get the files to their proper place.
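The two-volume workaround in point 3 looks roughly like this, assuming the volumes were attached as /dev/sdf and /dev/sdg (device names vary between instances):

```shell
# One-time: format both freshly attached EBS volumes
mkfs.ext3 /dev/sdf
mkfs.ext3 /dev/sdg

mount /dev/sdf /tmp     # staging area that duplicity fills first
mount /dev/sdg /home    # final restore point for the file shares

tklbam-restore BACKUPID
```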

After all this, all appeared well except that Webshell access wouldn't work (don't know why); Webmin and SSH still worked. Now all I have to do is work out how to access the Samba shares over the net from my WinXP stations (VPN tunnel?). It's all good fun.

P.S. I'm about to try this all again by doing mount --bind /mnt /tmp and mount --bind /mnt /home instead of creating any EBS volumes -- hopefully the 320G will then be large enough and accessible in the right places...

******P.P.S. This worked! But Webshell is still broken after the restore... no EBS needed.******

******P.P.P.S. Found out why Shellinabox wouldn't work: for some reason the permissions on /etc/ssl/certs/cert.pem were changed after the restore... chmod 777 fixed this.******

Alon Swartz's picture

Very good points on the tklbam-restore log and /tmp. I sent a mail to Liraz (the main developer of TKLBAM) to get his feedback, and we'll hopefully see some tweaks for these issues in the next release.

Please note that the storage at /mnt is ephemeral, and when you terminate the instance all data stored there will be lost. You might want to use EBS for /home and just mount --bind /mnt/tklbam-tmp to /tmp during the restore.

chmod'ing cert.pem to 777 is insecure. It should be 600, or 640 with group ownership set to certssl (IIRC). Did you change the permissions on the original system that was backed up? This is the first time I've heard of such a bug. Can you verify/reproduce the issue?
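If the group Alon means is Debian/Ubuntu's ssl-cert group (his "certssl" is from memory, so verify with getent group on your system), the fix would look something like:

```shell
# Tighten the permissions back down (group name is an assumption --
# check which group your SSL daemons actually run under):
chown root:ssl-cert /etc/ssl/certs/cert.pem
chmod 640 /etc/ssl/certs/cert.pem

# Restart shellinabox so it re-reads the certificate
/etc/init.d/shellinabox restart
```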

Scott Howard's picture

Hi again,

Yes, I realise your point regarding not using an EBS volume; I just wanted to see if I could use the instance's existing storage capacity for the restore. Backing up the data using TKLBAM achieves persistence, but doing a full restore each time the instance is restarted takes a lot longer than attaching an EBS volume to a running instance. Is the per-gigabyte cost of backup storage vs EBS volumes the same? Also, doing mount --bind as I did means it has to be redone on every reboot, otherwise there is no data at the original restore point... only at /mnt, which TKLBAM doesn't back up by default (I realised this the first time I backed up this cloud server and saw a very small backup!!). I put mount --bind /mnt /home in my rc.local; I presume this will work.
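For reference, the rc.local approach described above might look like this (an untested sketch; note that exit 0 must remain the last line of the script):

```shell
#!/bin/sh -e
# /etc/rc.local -- bind mounts do not survive a reboot, so re-create
# the one the restored data depends on at every boot
mount --bind /mnt /home

exit 0
```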

In regards to cert.pem and its permissions, I'm pretty sure I didn't change it on my original system; can't think why I would. I'll use your permissions and group ownership, restart shellinabox, and see how it goes.

When I stop this server and have to restore it again, I guess I'll see if the same thing happens... I'll let you know.

Regards,

Scott H.
