Afternoon. This morning I noticed an issue with TKLBAM backups when I logged into https://hub.turnkeylinux.org/. I have been using the service for almost 2 years with minimal issues, backing up 9 VMs running in VirtualBox. This morning the status showed no successful backups for 2+ days for all 9 VMs. I became worried, so I logged into Webmin on each VM. Each VM indicates that it backed up successfully that morning (less than 6 hrs earlier).

I'm wondering whether there is an issue with the web portal at https://hub.turnkeylinux.org/backups/ at this time. Is the status not updating properly?

Any help you can provide to fault-find this error is appreciated. I have made no changes to my internal network or firewall that would explain it. I have performed several test backups (from each VM's Webmin GUI, all of which indicate a successful backup was completed), but nonetheless the website doesn't reflect these backups. I have rebooted each VM and reattempted, with no change to the web portal.

 

 

 

 


This morning I decided to do more fault-finding.  The problem persists.  All of my local VMs which have successfully used TKLBAM for the past 2 years no longer indicate a successful "backup" on the hub webpage, despite reporting success on their webmin pages.  

To narrow down the problem, I launched a new VM as a cloud instance this morning, activated TKLBAM, and performed a backup. Again, Webmin indicated a successful backup, but the Hub backup page does not show one. What's more, it doesn't even list the ID of the VM I launched in the cloud.

I am fairly convinced at this point that the issue is with the Hub webpage, not the VMs, as I have confirmed the issue with 2 sources: my local VMs and a cloud instance.

Please advise if anyone else has had this issue. I really would like my peace of mind back that my VMs are in fact safe.

 

Before I watch the game tonight, I thought I would post one last question. Reviewing the TKLBAM pages, under Fault Tolerance there is mention of situations where the Hub can become disconnected from the TKLBAM instance on the server.

In these cases the server communicates directly with Amazon S3 and still backs up successfully, although the Hub won't reflect this. I suspect this is what is going on in my case. If so, how do I resolve the issue and restore the connection from my VMs to the Hub?

Link- http://www.turnkeylinux.org/faq/hub-tklbams-central-point-failure

Jeremy Davis's picture

Both mine and the Hub's!

I recall this happening before and I'm sure that it was just a case of the Hub getting out of sync with TKLBAM. IIRC all the backups were safe and sound (and still worked with tklbam-restore) but just that the Hub didn't show them...

I've emailed Alon and we'll see what he says. Thanks for your patience.

Alon Swartz's picture

Sorry for the late response...

Yesterday we noticed some issues with the Hub's backend taskengine (which, among other things, is used to sync backup metadata from Amazon S3 to the cache), and while bringing it back online we inadvertently lost some of the tasks in the queue. This only resulted in an out-of-date cache, not data loss.

Just to put everyone at ease, and as K Wayne mentioned above, TKLBAM will communicate directly with Amazon S3 and continue performing backups, even if the Hub is unreachable. In this specific scenario, the Hub was still reachable, but was not updating its cache to reflect new backup record sessions.

To reiterate, this does not mean the backups failed, just that the Hub didn't know about them. Not to worry though: on the next successful backup, TKLBAM will tell the Hub to update its cache, which also picks up any backup record sessions the Hub missed.
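To make the catch-up behaviour concrete, here is a toy model of it. This is only an illustrative sketch: ToyHub, backup() and the notify_ok flag are made-up names, not TKLBAM or Hub internals, and plain Python lists stand in for Amazon S3 and the Hub's cache.

```python
# Toy model only: every name here is hypothetical, not TKLBAM or Hub code.

class ToyHub(object):
    def __init__(self):
        self.cache = []  # the sessions the Hub web page would display

    def update_cache(self, storage_sessions):
        # Re-read everything in storage, so sessions missed earlier
        # are picked up along with the newest one.
        self.cache = list(storage_sessions)


def backup(storage, hub, session, notify_ok=True):
    storage.append(session)  # the backup itself does not depend on the Hub cache
    if notify_ok:
        hub.update_cache(storage)


storage, hub = [], ToyHub()
backup(storage, hub, "mon")
backup(storage, hub, "tue", notify_ok=False)  # cache update lost: Hub goes stale
backup(storage, hub, "wed", notify_ok=False)
stale = list(hub.cache)                       # what the web page showed meanwhile
backup(storage, hub, "thu")                   # next successful update catches up
```

In this model the stale view only ever lists the first session, while storage holds all of them; after the last call the cache matches storage again.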

If you don't want to wait for the next scheduled backup run, you can manually trigger the cache update by performing a manual backup:

tklbam-backup

If for some reason you don't want to perform a manual backup, and just want to trigger the Hub into updating its cache, here is some python code which should get the job done:

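#!/usr/bin/python
# Ask the Hub to refresh its cached backup record without running a backup.
import sys
# Make TKLBAM's own modules importable
sys.path.insert(0, '/usr/lib/tklbam')

from registry import registry, hub_backups
hb = hub_backups()
# Tell the Hub that this system's backup record has been updated
hb.updated_backup(registry.hbr.address)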
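If you want to run the snippet above unattended, a slightly more defensive variant might look like the following. Treat it as a sketch under assumptions: the function name refresh_hub_cache and its lib_path parameter are made up here, and the only TKLBAM calls used are the ones from the snippet above. It returns False instead of dying with a traceback when the TKLBAM modules can't be imported (for example, on a machine without TKLBAM installed).

```python
import sys

TKLBAM_LIB = '/usr/lib/tklbam'  # path taken from the snippet above


def refresh_hub_cache(lib_path=TKLBAM_LIB):
    """Ask the Hub to update its cache for this system's backup record.

    Returns True if the request was made, False if the TKLBAM
    modules could not be imported from lib_path.
    """
    if lib_path not in sys.path:
        sys.path.insert(0, lib_path)
    try:
        # Same imports as the snippet above
        from registry import registry, hub_backups
    except ImportError:
        return False

    hb = hub_backups()
    hb.updated_backup(registry.hbr.address)
    return True
```

Calling refresh_hub_cache() with no arguments uses the default path; errors raised by the Hub call itself are deliberately not caught, so a real failure is still visible.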

We'll be keeping a closer eye on the taskengine, and will take extra care not to lose any pending tasks if we run into issues in the future.

If anyone still has issues, let us know.

Jeremy Davis's picture

Your issue is unrelated to this thread (although the result is more-or-less the same).

As I just posted in another thread, Alon is on the job trying to fix it...

Jeremy Davis's picture

Sorry I missed this post previously. The issue you mention should now be resolved (as of late last week).

FWIW it's almost always much better to open a new thread. Obviously if someone has recently posted about an issue that is exactly the same, it may be ok. But more often than not, whilst the behaviour appears the same, the cause of the issue is different.

Even if you think your issue is related, then it's still better to start a new thread and link to the other thread(s) which you think are relevant.

Opening a new thread will also get my attention much quicker than posting on an 18-month-old thread...

PumWalters's picture

Hey Jeremy,

thanks for looking into this. All seems to be fine now.

And I will keep your comment in mind.

Cheers,

Pum

