Confusion re managing backups on Amazon Hub Site

Scott Howard's picture

OK, here I go again...

I'm a bit confused as to how the backup volumes can be managed on the Amazon Hub site.

Regular readers will now know that I'm using the lucid file server appliance.

The first backup of my current working system was done on 29-10-2010, and has an ID label of 3 on the site. My appliance is set to do a full backup each month and incremental backups in between, as is the default.

When looking through the backup chains on the Amazon site, each day is indeed listed there, and lo and behold, on the 29th November the entry is listed as <FULL> as opposed to <inc>.

So, on to the questions:

1. The original backup size on 29th October is quoted as 4.3 GB, but on the 29th November the <FULL> session size is quoted as 2.7 GB. The amount of data would have increased over the month, and the system is otherwise unchanged as far as I can see.

After reading what I can about the backup chains, my understanding is that all that is needed for a full restore of a system is the incremental sessions leading back to the last <FULL> session, plus that <FULL> session itself. Thus the size difference between the original full backup and the one a month later concerns me.

2. Leading on from this, as it stands at present, each daily backup extends one long continuous backup chain, with ever-increasing data size on the Hub and increasing cost (although yes, it is cheap). Is there a way to start a new backup ID session (i.e. a new chain starting with a full backup), or alternatively a way to delete the incremental sessions prior to the most recent <FULL> session in the chain, so as to reduce the amount of data stored on the Hub?

3. Currently tklbam-list always reports the original creation date of the backup. The only way I can see to check the last full backup date is to look at the Hub manually, as I have done, or to follow the output as tklbam-backup does its stuff... It might be nice, for ease of checking, if the last full backup date were reported somewhere up front?

Sorry to be so long-winded again.

Thanks in anticipation,

Scott H.

Liraz Siri's picture

1) The reduced size of the full backup a month later is indeed puzzling if you did not modify the configuration of the backup (e.g., /etc/tklbam/overrides) or delete something. Cached files somewhere on your server might be responsible so make sure it's not that. I would take a careful look at the backup logs in /var/log/tklbam-backup. The log includes a list of all files backed up which you could use to compare between sessions. Another way to explore and verify your backups is to spin up an instance on EC2 for an hour and restore your backup to it. Verifying that your backups work is supposed to be one of the main advantages of TKLBAM. Once you restore the backup, I recommend you use the "ncdu" program to figure out how the disk space is used.

2) Don't worry about data retention. We will soon add a feature to the Hub that allows you to specify maximum full backups. The logic is straightforward: we already have the list of full/incremental sessions in the database and we already have a hook that runs when a backup record is updated. So every time a backup is updated we can run a little check on it. If a new full backup has just been completed, we check how many full backups there are in total and if the number exceeds the maximum, we delete the oldest backup chain. By default, the Hub's data retention for a given backup will be set to "unlimited" unless you change it. That way we don't accidentally delete any data you might need.

3) The creation date of the backup record makes sense as it tells you how far back the records go. Whether or not a specific backup is an incremental session or a full backup is a technical detail. The ultimate result should be the same - you can restore the machine state from that date. But I'd like to know more about your use case. Could you explain a bit more why you care to know what the last full backup is? What problem does this solve? How do you use that information?
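For reference, the relationship between full and incremental sessions discussed in this thread (a restore to a given date needs the most recent full session on or before that date, plus every incremental session after it up to that date) can be sketched as follows. This is only an illustration and not TKLBAM's own code: the `restore_chain` function and the `(date, kind)` tuples are hypothetical, and the intermediate dates are made up around the two full backups mentioned above.

```python
from datetime import date


def restore_chain(sessions, target):
    """Return the sessions needed to restore the state as of `target`.

    `sessions` is a list of (date, kind) tuples sorted oldest first,
    where kind is "full" or "inc" (a hypothetical representation).
    """
    # Ignore sessions made after the date we want to restore.
    upto = [s for s in sessions if s[0] <= target]
    # Walk backwards to the most recent full backup.
    for i in range(len(upto) - 1, -1, -1):
        if upto[i][1] == "full":
            return upto[i:]
    raise ValueError("no full backup on or before the target date")


sessions = [
    (date(2010, 10, 29), "full"),
    (date(2010, 10, 30), "inc"),
    (date(2010, 11, 28), "inc"),
    (date(2010, 11, 29), "full"),
    (date(2010, 11, 30), "inc"),
]

# Restoring to 30 November only needs the second full backup and one incremental.
chain = restore_chain(sessions, date(2010, 11, 30))
```

Under this model, everything before the latest full session is only needed for restoring to an earlier date, which is why the position of the last full backup matters when thinking about trimming.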

Scott Howard's picture

I guess the reason I was interested relates to the amount of data retained on Amazon: knowing where the last full backup is tells me where the older parts of the backup chain could be trimmed off.

I'm extremely interested to hear about this feature for specifying a maximum number of full backups. What would be the minimum number of full backup chains kept on the server? And by "unlimited" (i.e. the default), do you mean that the chain will never be trimmed, or that it will be trimmed once it exceeds the preset maximum you talked about?

Thanks again

Scott H.

PS: I haven't had a chance to look at the log files yet as you suggested, but as soon as I can I will post any conclusions I can draw from them... probably a misunderstanding at my end, again.

Liraz Siri's picture

The minimum number of full backup chains would be exactly 1 backup chain. So every time you create a new full backup it would delete the previous full backup chain. But by default we won't be deleting anything. Amazon storage is cheap enough that I don't think it makes sense to delete backup data for users unless they ask us to explicitly. I talked to Alon about the logic and we will be implementing this shortly.
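A minimal sketch of the retention rule as described in this thread, assuming a backup record is simply an ordered list of sessions. It is not the Hub's implementation: `split_chains`, `apply_retention`, the `max_full` parameter and the session dictionaries are all hypothetical names used for illustration. The description above deletes the oldest chain each time a new full backup pushes the count over the maximum; this sketch trims to the limit in one pass, which gives the same end state.

```python
def split_chains(sessions):
    """Group sessions (oldest first) into chains that each start at a full backup."""
    chains = []
    for session in sessions:
        # A full backup opens a new chain; incrementals join the current one.
        if session["kind"] == "full" or not chains:
            chains.append([])
        chains[-1].append(session)
    return chains


def apply_retention(sessions, max_full=None):
    """Return the sessions kept when at most `max_full` chains are retained.

    max_full=None models the "unlimited" default: nothing is deleted.
    """
    if max_full is None:
        return list(sessions)
    if max_full < 1:
        raise ValueError("at least one full backup chain must be kept")
    kept = split_chains(sessions)[-max_full:]
    return [session for chain in kept for session in chain]


sessions = [
    {"id": 1, "kind": "full"},
    {"id": 2, "kind": "inc"},
    {"id": 3, "kind": "inc"},
    {"id": 4, "kind": "full"},
    {"id": 5, "kind": "inc"},
]

kept_default = apply_retention(sessions)           # unlimited: keeps everything
kept_minimum = apply_retention(sessions, max_full=1)  # keeps only the newest chain
```

With `max_full=1`, the older chain (sessions 1 to 3) is dropped as soon as session 4 exists, matching the "exactly 1 backup chain" minimum described above.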

PS: Sorry for the absurdly late reply. I didn't notice you had posted a follow-up question until now.

Scott Howard's picture

Thanks again for the reply, I'm still not totally clear on this issue but as you say the storage is cheap.

Scott H.
