Ronan0's picture

I have a TurnKey instance that's a few years old (vTiger).

It has been backing up daily and has ~1500 backup sessions.

(As an aside, I recently reset the limit to a max of 100 sessions, but the excess sessions still haven't been cleared.)

Lately the backups appear to have been failing. When I checked a month ago, a couple of days in a row had been missed.

Then recently I checked in and a whole month had been missed. 

I manually ran a backup and it resumed a previously failed session.

It backed that session up successfully, but the automatic backups still seem to be failing, even though they're enabled. One automatic backup has completed since then, but now it is back to skipping sessions, so it's obviously still failing.

Any advice? I am going to try a backup with the --disable-resume option set next.

Feedback appreciated. Thanks.

Jeremy Davis's picture

Firstly, WRT "max backups": that refers to full backups, whereas the "sessions" include incremental backups as well. So assuming you are using the default of monthly full backups and daily incrementals, keeping 100 full backups means 100 months' worth of backups. That's just over 8 years' worth! So that'd be why none of the old ones have been removed yet.
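Just to show the arithmetic (nothing TKLBAM-specific here, plain shell):

# 100 monthly full backups, expressed in years:
echo "scale=1; 100 / 12" | bc    # => 8.3
# And your ~1500 daily sessions works out to roughly:
echo "scale=1; 1500 / 365" | bc  # => 4.1 (years)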

Regarding your failing backups, we need a better understanding of why they're failing. If you could keep an eye on it, then the next time you notice it's failed, please copy/download the tklbam-backup.log file from /var/log. Either attach it to your original post (only the first post of each thread can have attachments) or just post the text in a new comment.
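If you're working on the server's commandline, something like this will show the tail of the log (100 lines is just a suggestion; adjust as needed):

tail -n 100 /var/log/tklbam-backup.log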

Ronan0's picture

Hi.

Thanks for feedback. 

I have attached the log file to the OP.

Thanks.


Jeremy Davis's picture

That's great, thanks.

I had a quick google and came across a duplicity bug (duplicity is what TKLBAM uses in the backend) that may be relevant, but I'll also forward your log to Liraz as he's the TKLBAM dev.

In the meantime, I also note that most of the failed backups are failed resumes (i.e. the backup failed and the next run tried to resume from where the previous one was up to). However, it seems that the most recent one was a full backup attempt and it failed too.

Have you manually tried running a fresh full backup? You can try that like this:

tklbam-backup --disable-resume --full-backup 1D
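If you want to capture the console output as well as the log while it runs, you could wrap it in tee (the output file name is just a suggestion):

tklbam-backup --disable-resume --full-backup 1D 2>&1 | tee /root/tklbam-manual-run.log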

I'll get back to you ASAP once I hear back from Liraz. Please feel free to bump your thread if you haven't heard and would like an update.

Ronan0's picture

Hi. Any update on this? Thanks very much.

Jeremy Davis's picture

Unfortunately, I still haven't heard back from Liraz. He's deep in "dev mode" and often goes offline for extended periods. If I don't hear back from him really soon, I will call him to ensure that he looks into it ASAP.

Have you tried manually running a full backup as I suggested in my last post?

Ronan0's picture

Thanks Jeremy. Yes, I tried running a full backup manually and disabling resume.

Just tried again with your command. The error log still ends with:


Traceback (most recent call last):
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1405, in <module>
    with_tempdir(main)
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1398, in with_tempdir
    fn()
  File "/usr/lib/tklbam/deps/bin/duplicity", line 1278, in main
    globals.archive_dir).set_values()
  File "/usr/lib/tklbam/deps/lib/python2.6/dist-packages/duplicity/collections.py", line 691, in set_values
    self.get_backup_chains(partials + backend_filename_list)
  File "/usr/lib/tklbam/deps/lib/python2.6/dist-packages/duplicity/collections.py", line 814, in get_backup_chains
    map(add_to_sets, filename_list)
  File "/usr/lib/tklbam/deps/lib/python2.6/dist-packages/duplicity/collections.py", line 804, in add_to_sets
    if set.add_filename(filename):
  File "/usr/lib/tklbam/deps/lib/python2.6/dist-packages/duplicity/collections.py", line 93, in add_filename
    self.set_manifest(filename)
  File "/usr/lib/tklbam/deps/lib/python2.6/dist-packages/duplicity/collections.py", line 124, in set_manifest
    remote_filename)
AssertionError: ('duplicity-inc.20161220T071548Z.to.20161222T071607Z.manifest.part', u'duplicity-inc.20161220T071548Z.to.20161222T071607Z.manifest.gpg')

Traceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 510, in <module>
    main()
  File "/usr/bin/tklbam-backup", line 464, in main
    log=_print)
  File "/usr/lib/tklbam/duplicity.py", line 235, in __call__
    cleanup_command.run(target.secret, target.credentials)
  File "/usr/lib/tklbam/duplicity.py", line 114, in run
    raise Error("non-zero exitcode (%d) from backup command: %s" % (exitcode, str(self)))
duplicity.Error: non-zero exitcode (30) from backup command: duplicity --verbosity=5 --archive-dir=/var/cache/duplicity cleanup --force s3://s3.amazonaws.com/tklbam-us-east-1-67de5cff790f5ed2075d/7ee23lhbnjm*****

Ronan0's picture

I looked into the duplicity cache folder indicated in that error (/var/cache/duplicity).

There were a lot of files there, so I archived them and created a fresh cache folder.

That seems to have resolved the problem. Hopefully the daily automated backups will now resume as normal. I expect they will.
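In case it's useful to anyone else, what I did was roughly this (the archive name is just what I happened to pick; the cache path is the one from the error above):

# Move the stale duplicity cache aside and start fresh
mv /var/cache/duplicity /var/cache/duplicity.old
mkdir /var/cache/duplicity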

I'll report back to confirm. Thanks again for your attention on this.


Jeremy Davis's picture

Thanks for posting back with your fix (which it sounds like it is). If the issue reoccurs, please let me know ASAP.

Actually, I've opened an issue on our tracker. Hopefully we can improve TKLBAM in the future so it deals with this issue automatically if it occurs.

I'll also directly update Liraz on the situation.

Thanks again.

Ronan0's picture

I just checked in to the Hub this morning to see whether the incremental backups were being done on the vTiger system.

None of my 3 production servers completed their daily backup over the weekend!

The other two were working fine until Friday.

Has anything changed on your end?

All connectivity to AWS is established.

I won't be able to do any more investigation until later.

Thanks.

Jeremy Davis's picture

Do the logs say they worked?

If so, you can trust them. The issue is probably just that the Hub's cache is a bit slow to refresh, or possibly something has "got stuck" in the back end so the dashboard data hasn't been updated. I will pass this info on to Alon (the lead Hub dev) and get him to make sure nothing has gone wrong with the data caching. (Note it's the info data, not your backup data; your backup data never touches the Hub.)

If the logs are showing that it crashed, then please post the new entries and we'll look into it further.
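A quick way to check is to look for tracebacks in the same log as before (just a sketch; adjust the line counts to taste):

# Show the last few Python tracebacks recorded by TKLBAM, if any
grep -n "Traceback" /var/log/tklbam-backup.log | tail -n 5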

Either way, I have found your Hub account and have noted it to Alon to double check all is well behind the scenes.

Jeremy Davis's picture

Thanks again for reporting that your latest backups weren't showing. Alon investigated and it turns out that one of the Hub's backend components crashed over the weekend (on Saturday), causing stale data to be displayed in the Hub.

Alon restarted the service about 8 hours ago and it should have finished processing the backlog. Your latest backups should now be showing. Alon is investigating the cause of the issue.

Please double check that all is well within the backups area of the Hub, and don't hesitate to let us know if anything else is not working as you think it should.

Ronan0's picture

Everything working well now, thank you very much.
