Hi Guys,
I have been successfully using the TurnKey Fileserver appliance for over 12 months to serve files and back up the system on a daily basis. It's a bare-metal install of the Lucid appliance.
TKLBAM settings are the defaults (full backup monthly, incrementals in between, and 50 MB volsize).
Each night I run tklbam-backup both to a local machine and to the remote Amazon hub.
Recently the backup has grown to a 12-13 GB uncompressed data footprint, and I have noticed some remote backup failures, which I put down to internet dropouts...
I duplicated my install on another machine to test a full backup. While the initial backup of the system without my data succeeded (it was only 1 MB), the next backup including my data fails 100% of the time, and the original machine now also fails 100% of the time because it has reached the full backup stage as well.
What appears to be happening is that after the initial preparation the backup starts uploading volumes, but stalls after a certain number. I have repeated this on both my original machine and the duplicate with the same result.
The screen output hangs at the "Uploading ..." line.
Analysing the output of 'ps aux' indicates that tklbam-backup is still running in state Sl+, i.e. interruptible sleep, waiting for an event to happen.
'netstat' shows the Amazon socket in CLOSE_WAIT status, indicating that the remote end has closed and is waiting for my side to close the socket.
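For reference, these are roughly the commands I used to check (the grep patterns are just illustrative):

    # STAT column shows Sl+ (interruptible sleep, foreground)
    ps aux | grep '[t]klbam'
    # look for the S3 connection stuck half-closed in CLOSE_WAIT
    netstat -ntp | grep CLOSE_WAIT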
After a <Control-C> to kill tklbam-backup I get the following traceback:
Uploading s3://s3-ap-southeast-1.amazonaws.com/tklbam-yfcamybrzeeisvc4/duplicity-inc.20110825T035002Z.to.20110829T061239Z.vol49.difftar.gpg to STANDARD Storage
Processed volume 49
Uploading s3://s3-ap-southeast-1.amazonaws.com/tklbam-yfcamybrzeeisvc4/duplicity-inc.20110825T035002Z.to.20110829T061239Z.vol50.difftar.gpg to STANDARD Storage
^CTraceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 266, in <module>
    main()
  File "/usr/bin/tklbam-backup", line 239, in main
    trap = UnitedStdTrap(transparent=True)
  File "/usr/lib/python2.6/dist-packages/stdtrap.py", line 266, in __init__
    self.stdout_splice = self.Splicer(sys.stdout.fileno(), usepty, transparent)
  File "/usr/lib/python2.6/dist-packages/stdtrap.py", line 213, in __init__
    vals = self._splice(spliced_fd, usepty, transparent)
  File "/usr/lib/python2.6/dist-packages/stdtrap.py", line 175, in _splice
    events = poll.poll()
KeyboardInterrupt
Traceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 266, in <module>
    main()
  File "/usr/bin/tklbam-backup", line 242, in main
    b.run()
  File "/usr/lib/tklbam/backup.py", line 311, in run
    backup_command.run(passphrase, conf.credentials)
  File "/usr/lib/tklbam/duplicity.py", line 77, in run
    exitcode = child.wait()
  File "/usr/lib/python2.6/subprocess.py", line 1170, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.6/subprocess.py", line 465, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
I don't really know what any of that means, but I'm hoping the devs do.
Anyway, to me it appears that the Amazon end is timing out, or is stuck in some indefinite loop waiting for something at my end to happen, and then gives up and closes, while tklbam continues to run on my machine.
There is no other problem at my location to suggest internet connection issues: I can transfer large files in other scenarios with no problems, and up to now TKLBAM has been working for me without issue.
The log file is no help as it just mirrors the screen output. I tried redirecting stderr to stdout but got nothing, which, coupled with the fact that tklbam-backup is still running, suggests to me that no error has actually occurred in the program.
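For what it's worth, the redirect I tried was along these lines (the log path is just an example):

    # capture both stdout and stderr to a file while still watching the screen
    tklbam-backup 2>&1 | tee /root/tklbam-debug.log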
A by-product of this is that I now have one backup showing on my Hub dashboard that still says "First backup in progress", and there is no way to delete it.
I would appreciate any ideas from devs/others as to where and why my problems are occurring; hopefully the traceback helps.
Thanks in anticipation, and once again sorry about the length of the post.
Scott H.
Hey Scott
Glad to hear that things have been good up until now. Not so great to hear about your issues now though.
Sorry, I don't have anything at all to add. Anything I might have suggested seems to have been ruled out by your tests...
Regardless, I can suggest that if you haven't already, you use the Hub feedback feature to at least get the devs to delete your incomplete backup. I'd be inclined to mention this thread (and post a link) in your feedback; hopefully they may have some ideas.
Same Here
I have the same issues but with larger file sizes.
I have given up on using TKLBAM for fileservers for now.
Chris Musty
Director
Specialised Technologies
That sounded harsh
I didn't mean to sound so scathing - I love TKL! I was just stating that I haven't deployed it for file servers for a while :)
Chris Musty
Director
Specialised Technologies
This may be a Duplicity bug...
Happy to help
I will send you admin access to my affected server if you wish; where should I send the details?
Chris Musty
Director
Specialised Technologies
Replied to Hub feedback
I replied to the Hub feedback you sent with the relevant info.
Thanks Liraz, Chris
Thanks for replying, guys. Alon has also contacted me regarding this. I have a TurnKey Core server running on Amazon with OpenVPN, on which I am going to install Samba and try a "backup" file transfer that way (a tarball of my data, ~7.5 GB) to see if it will run to completion at this file size, and I'll report back. Naturally I'll be happy to be a guinea pig for any TKLBAM testing you need to do to resolve this issue, as I'd really prefer to keep doing things the way I have been.
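The test I have in mind is something like this (the VPN address, share name and paths are just placeholders):

    # mount the Samba share exported by the EC2 server over the VPN
    mount -t cifs //10.8.0.1/backup /mnt/remote -o username=backupuser
    # write a tarball of the data straight onto the share
    tar czf /mnt/remote/data.tar.gz /srv/storage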
Thanks again
Scott H.
What version of TKLBAM are you using?
The latest/current version is v1.4
This issue should have been resolved some time ago (hence the lack of activity on this thread...). We updated our forked version of duplicity to resolve this and a few other bugs.
TKLBAM is Liraz's baby so he'd be able to give more detail, but AFAIK the data is broken up into chunks (default 50MB IIRC) and uploaded like that, so the size of the actual files should be irrelevant (although obviously with a lot of data there is more chance that the odd chunk will fail).
As we have literally thousands of TKLBAM users and you are currently the only one having this issue (that I am aware of), my suspicion is that it is something to do with your network/internet connection, although I'm only guessing... Perhaps there is some other edge-case scenario going on here?
Regardless, if you could post more of your TKLBAM logs, I'll get Liraz to have a quick glance.
To fix an issue we have to be able to reproduce it first
In any case, the first step in fixing a bug is reproducing it reliably. The harder it is to reproduce a problem the harder it is to track down and fix.
If the problem has something to do with your network configuration (e.g., a misbehaving router/proxy) then that's going to be hard for me to reproduce, and it might not even be something I can fix within TKLBAM. Sometimes there are workarounds for these issues, or you can add more redundancy; sometimes not.
For what it's worth, TKLBAM uses Duplicity as the storage backend. Duplicity is fairly well tested and, as Jeremy mentioned, it breaks big files down into volumes. You can configure the volume size and that might help, I guess. Take a look at /etc/tklbam/conf if you want to try that.
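A smaller volume size means each failed chunk costs less to retry. A sketch of what that might look like in /etc/tklbam/conf (check the comments in the file itself for the exact option names and defaults on your version):

    # /etc/tklbam/conf
    # volume size in MB (default is 50); smaller volumes mean cheaper retries
    volsize 25
    # frequency of full backups (1M = monthly, the default)
    full-backup 1M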
Also, if you want the backups but don't want to use Duplicity, you can dump the raw backup to the local filesystem using the --dump option, and then use whatever method works best under your circumstances to stash it safely somewhere. Though I do recommend incremental backups over just keeping dumb copies.
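Something like this, assuming the destination directory exists (the path is just an example):

    # dump a raw backup extract locally instead of uploading it anywhere
    tklbam-backup --dump=/mnt/backups/extract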
Finally, you can ask Duplicity to store your data on other backends; you don't have to use AWS S3 if it doesn't work well for you.
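If I remember right, you can point tklbam-backup at a custom target with the --address option, using any URL scheme Duplicity supports (the host and path here are hypothetical; check tklbam-backup --help for the exact usage):

    # back up to an SSH target instead of the Hub-provisioned S3 bucket
    tklbam-backup --address scp://user@backup.example.com/tklbam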