David McNeill's picture

tklbam-backup gives an sts agent error after running for a couple of hours...

It got through 571 volumes of 25 MB each, so the configuration is obviously OK.

Then it gives an sts agent error, which seems to be related to auth token handling.

Does the generated token for s3:// have a limited life span?  Anyone know how long that is?

What happens if a backup can't complete in that time?

Is there any way to get more detail on what the issue is?


Uploading s3://s3-ap-southeast-2.amazonaws.com/tklbam-ap-southeast-2-e9../duplicity-inc.20170424T015229Z.to.20170424T044749Z.vol571.difftar.gpg to STANDARD Storage
Upload 's3://s3-ap-southeast-2.amazonaws.com/tklbam-ap-southeast-2-e9../duplicity-inc.20170424T015229Z.to.20170424T044749Z.vol571.difftar.gpg' 
failed (attempt #1, reason: Error: sts agent error: )
Traceback (most recent call last):
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1405, in <module>
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1398, in with_tempdir
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1380, in main
  File "/usr/lib/tklbam/deps/bin/duplicity", line 586, in incremental_backup
  File "/usr/lib/tklbam/deps/bin/duplicity", line 412, in write_multivol
    (tdp, dest_filename, vol_num)))
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/duplicity/asyncscheduler.py", line 145, in schedule_task
    return self.__run_synchronously(fn, params)
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/duplicity/asyncscheduler.py", line 172, in __run_synchronously
    ret = fn(*params)
  File "/usr/lib/tklbam/deps/bin/duplicity", line 411, in <lambda>
    async_waiters.append(io_scheduler.schedule_task(lambda tdp, dest_filename, vol_num: put(tdp, dest_filename, vol_num),
  File "/usr/lib/tklbam/deps/bin/duplicity", line 309, in put
    backend.put(tdp, dest_filename)
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/duplicity/backends/_boto_single.py", line 227, in put
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/duplicity/backends/_boto_single.py", line 149, in resetConnection
    is_secure=(not globals.s3_unencrypted_connection))
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/boto/s3/connection.py", line 155, in __init__
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/boto/connection.py", line 476, in __init__
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/boto/provider.py", line 172, in __init__
    self.get_credentials(access_key, secret_key, security_token)
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/boto/provider.py", line 213, in get_credentials
    self.stsagent = stsagent.STSAgent(stsagent_command, 60)
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/boto/stsagent.py", line 67, in __init__
  File "/usr/lib/tklbam/deps/lib/python2.7/dist-packages/boto/stsagent.py", line 72, in renew_credentials
    raise Error("sts agent error: " + output)
Error: sts agent error: 

Traceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 510, in <module>
  File "/usr/bin/tklbam-backup", line 464, in main
  File "/usr/lib/tklbam/duplicity.py", line 268, in __call__
    backup_command.run(target.secret, target.credentials, debug=debug)
  File "/usr/lib/tklbam/duplicity.py", line 114, in run
    raise Error("non-zero exitcode (%d) from backup command: %s" % (exitcode, str(self)))
duplicity.Error: non-zero exitcode (30) from backup command: duplicity --verbosity=5 --volsize=25 --full-if-older-than=1M --gpg-options=--cipher-algo=aes --include=/TKLBAM --include-filelist=/TKLBAM/fsdelta-olist --exclude=** --archive-dir=/var/cache/duplicity --s3-unencrypted-connection --allow-source-mismatch / s3://s3-ap-southeast-2.amazonaws.com/tklbam-ap-southeast-2-e9..

Last inc backup left a partial set, restarting.
Last full backup date: Mon Apr 24 13:52:29 2017
RESTART: Volumes 571 to 571 failed to upload before termination.
         Restarting backup at volume 571.
Downloading s3://s3-ap-southeast-2.amazonaws.com/tklbam-ap-southeast-2-e9../duplicity-inc.20170424T015229Z.to.20170424T044749Z.vol1.difftar.gpg
Restarting after volume 570, file srv/...
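
The restart messages above suggest that a rerun picks up at the volume that failed, so one possible workaround is simply to rerun the backup until it exits cleanly. The following is a minimal sketch of that idea, not part of TKLBAM. It assumes that tklbam-backup returns a non-zero exit code when the upload fails; the function name, attempt count, and delay are all hypothetical.

```python
import subprocess
import time


def run_until_success(cmd, max_attempts=5, delay_seconds=60,
                      runner=subprocess.call, sleep=time.sleep):
    """Run cmd repeatedly until it exits with 0 or the attempts run out.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed. `runner` and `sleep` are injectable so the
    loop can be exercised without running a real backup.
    """
    for attempt in range(1, max_attempts + 1):
        # Assumption: a failed backup is signalled by a non-zero exit code.
        exitcode = runner(cmd)
        if exitcode == 0:
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt so a fresh token can be requested.
            sleep(delay_seconds)
    return None
```

A call such as run_until_success(["tklbam-backup"]) would then retry up to five times, one minute apart. This only works around the symptom; it does not explain why the token renewal fails.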


David McNeill's picture

Kicked it off again, and it got as far as volume 1404 a couple of hours later.

Site is on 100Mbit fibre, so a good connection, and transit is from Auckland -> Sydney.

Sort of implies tklbam is good for about 20 to 25 GB of compressed backup per run in its current state.

This does seem to be a change, as larger backups of 6,000 volumes were working last year.
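
As a rough check on that estimate, the volume counts can be turned into data sizes. This sketch assumes every volume is a full 25 MB (from --volsize=25 in the logged command), that 1 GB is 1000 MB, and that the second run started at volume 571 as the restart message indicates. The function name is made up for illustration.

```python
VOLUME_SIZE_MB = 25  # from --volsize=25 in the logged duplicity command


def uploaded_gb(first_volume, last_volume, volume_size_mb=VOLUME_SIZE_MB):
    """Approximate data covered by volumes first..last (inclusive), in GB."""
    volumes = last_volume - first_volume + 1
    return volumes * volume_size_mb / 1000.0


# First attempt: volumes 1 to 571 before the error.
first_run = uploaded_gb(1, 571)
# Second attempt: restarted at volume 571 and reached volume 1404.
second_run = uploaded_gb(571, 1404)
```

Under those assumptions the first run covered roughly 14 GB and the second roughly 21 GB, which is in line with the 20 to 25 GB figure.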


David McNeill's picture

Up to 2141 volumes, next morning, then same sts agent error.


Jeremy Davis's picture

AFAIK the tokens expire after one hour, but TKLBAM should request new tokens as it needs them.
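
To illustrate what "request new tokens as it needs them" usually means, here is a generic sketch of renewing short-lived credentials shortly before they expire. It is not TKLBAM's or boto's actual code: the class, the fetch callable, and the five-minute safety margin are hypothetical, and the one-hour lifetime is just the figure mentioned above.

```python
import time


class RefreshingCredentials(object):
    """Cache short-lived credentials and renew them shortly before expiry."""

    def __init__(self, fetch, lifetime_seconds=3600, margin_seconds=300,
                 clock=time.time):
        self._fetch = fetch  # hypothetical callable returning new credentials
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._credentials = None
        self._expires_at = 0.0

    def get(self):
        """Return cached credentials, fetching new ones when close to expiry."""
        now = self._clock()
        if self._credentials is None or now >= self._expires_at - self._margin:
            self._credentials = self._fetch()
            self._expires_at = now + self._lifetime
        return self._credentials
```

With a scheme like this, a long upload keeps working as long as each renewal succeeds; a failed renewal partway through a backup would surface as an error on the next upload.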

Is this a v14.x server? If so, we have relatively recently released an updated TKLBAM build to make it more robust. Perhaps that will resolve your issue? (Or perhaps it's the cause of your issue?).

So please get the current TKLBAM version like this:

apt-cache policy tklbam

Assuming that you are running a v14.x server and haven't updated to the latest version, it should look something like this:

  Installed: 1.4.1+12+gd34d79b
  Candidate: 1.4.1+17+g71478bd
  Version table:
 *** 1.4.1+17+g71478bd 0
        999 http://archive.turnkeylinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status

If the 'Installed:' version is NOT '1.4.1+17+g71478bd' then please run "apt-get update && apt-get install tklbam" and retry.
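
If you need to run that check on several servers, the comparison can be scripted. This is a small sketch that parses text in the format shown above; the function names are made up, and it only tests whether the two version strings differ rather than doing a proper Debian version comparison.

```python
import re


def parse_policy(text):
    """Extract the Installed and Candidate versions from `apt-cache policy` output."""
    versions = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Installed|Candidate):\s*(\S+)", line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def needs_update(text):
    """True when an Installed line exists and differs from the Candidate."""
    versions = parse_policy(text)
    return ("Installed" in versions
            and versions["Installed"] != versions.get("Candidate"))
```

Fed the example output above, needs_update returns True, because the installed and candidate versions differ.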

If you continue to have these issues after updating to the latest TKLBAM (or you were already using it) then please let me know ASAP and I'll get Liraz to look into it.

David McNeill's picture

Still on wheezy with Univention Memberserver 4.0-3.  Will have to upgrade that circus first.

apt-cache policy tklbam
 Installed: 1.4.1+3+g8005390
 Candidate: 1.4.1+3+g8005390

Jeremy Davis's picture

I'll update Liraz with that info.

Perhaps we should look at backporting the latest version of TKLBAM to v13.x too?
