Subject: Re: [Bacula-users] trying to recreate a retrospect (I know...) backup strategy
From: Arno Lehmann <al AT its-lehmann DOT de>
To: Bacula Users <bacula-users AT lists.sourceforge DOT net>
Date: Sat, 20 Feb 2010 01:28:28 +0100
Hello,

20.02.2010 00:35, shouldbe q931 wrote:
> As always, you're very helpful :-)

Glad to hear that!

> 
> On Fri, Feb 19, 2010 at 10:09 PM, Arno Lehmann <al AT its-lehmann DOT de> 
> wrote:
>> Hello,
>>
>> 19.02.2010 22:09, shouldbe q931 wrote:
>>> Hi All,
>>>
>>> The Retrospect backup had two sets of "incremental archive" tapes,
>>> that ran for a year, set A in the 1st week, set B in the 2nd then set
>>> A in the 3rd week etc. Independently to this, at weekends there were a
>>> different set of tapes for doing full backups that rotated each week.
>> So far quite straightforward. I still don't see what advantage you
>> have to change tape sets on a weekly basis, but if you like to, why not...
> 
> The full backups each week are to provide a "quick restore" of the
> current state of the server, which would be at worst 1 week old,
> but as we now have a daily mirror to disk with 2 weeks of
> incrementals, we could forget about them if we were able to move the
> mirror offsite (not likely to happen for a few months)

Hmm... I'll comment on that further below.

> The two sets of "archive incrementals" are to allow for one set to be
> stored offsite, and the other to be used for retrieves

That's reasonable.

>     If you think of it as _very_ basic HSM, if a set of files is
> deleted, they can always be recovered if they were on the server for
> at least 24 hours or 7 days depending on which set. We looked at using
> a new tape each week, but it would get quite expensive on LTO3 tapes,
> and storage space for them...

Well, if the price of the backup media exceeds the value of the data, 
you're doing something wrong - on the other hand, if a few hundred or 
even thousand euros are too much to get reliable, multi-generation 
backups, I think the priorities should be reconsidered...

> Having two independent sets allows for one set to be lost without
> losing all of the data that was on tape.

Correct.

>>> I'm struggling with how to recreate this in Bacula.
>> I suggest you should look at it from a higher level point of view:
>> What do you want to achieve, and how can you most easily achieve it?
>> One of the most important things in a backup setup, in my opinion, is
>> to keep it as simple as possible.
>>
>>> My sticking points
>>> are how to not "invalidate" the incremental sets with each other, and
>>> how if a restore is needed from the incremental archive, it would only
>>> search through the incremental archive, and not point to one of the
>>> full backups,
>> The latter idea can't work - an incremental only backs up changes
>> against the previous backups, and you'll always end up with the latest
>> full backup in the chain. Otherwise they wouldn't be incrementals.
>>
> Retrospect allowed this to happen by having a catalog for each set.
> Each Monday incremental contained all of the changes from the previous
> week.

Ok, if you want that you need two jobs. They can share clients and 
file sets, but would be scheduled at interleaving weeks.
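
A minimal sketch of what those two jobs might look like in
bacula-dir.conf; all resource names here (the jobs, schedules, fileset,
storage) are made up for illustration, not taken from your setup:

```
# Two jobs sharing one client and one fileset, but running
# on interleaving weeks via two different schedules.
Job {
  Name = "archive-SetA"
  Type = Backup
  Client = fileserver-fd
  FileSet = "FileserverData"
  Schedule = "OddWeeks"
  Storage = Autochanger
  Pool = OddIncr
  Messages = Standard
}

Job {
  Name = "archive-SetB"
  Type = Backup
  Client = fileserver-fd
  FileSet = "FileserverData"
  Schedule = "EvenWeeks"
  Storage = Autochanger
  Pool = EvenIncr
  Messages = Standard
}
```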

>>>  would I be able to purge or prune each tape before it
>>> was re-used for the full backup, or would I also have to "purge files
>>> from job" of the previous job they were used for ?
>> Why would you want to intentionally make data inaccessible?
> 
> The idea of pruning/purging those records is to prevent Bacula from
> attempting to restore a file from them after they have been re-used

Bacula will not do that, as it knows which files are part of each job 
run. If a volume is recycled, per its retention policy, Bacula 
automatically prunes outdated information - but only then. So, if you 
leave that task to Bacula, you make sure you can access as much of 
your backup data as possible.
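
For illustration, the relevant knobs live in the Pool resource;
something along these lines, with the retention value only a
placeholder:

```
Pool {
  Name = OddIncr
  Pool Type = Backup
  Volume Retention = 1 year   # catalog records kept at least this long
  AutoPrune = yes             # prune expired records automatically
  Recycle = yes               # only then may a volume be overwritten
}
```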

> to
> recover a file that would be on the "incremental archive", and to
> reduce the size of the database (they want a year's worth of files),
> then the next year start a new database. As an aside, the Retrospect
> catalog couldn't quite make it for 12 months before it hit a 2 GB hard
> limit, so they moved to doing 6 month cycles.

I understand that Retrospect's attempt to catalog backed-up files 
failed. Well, that doesn't affect Bacula, unless you want to recreate 
that failure, too :-)

> The file server has ~3TB
> of data (quite a lot of big photoshop files) across ~350,000 files
> 
>>> I was thinking that I needed three catalogs, but as the catalog is per
>>> client, that wouldn't work. Would having three differently named, but
>>> identically configured file sets work ?
>> I really don't know why you'd need more than one catalog for this.
>>
>>> Or am I going to need to run three instances of the file daemon to get
>>> three catalogs, and if so, I'd really appreciate somebody pointing me
>>> in the direction of  how to do this.
>> God no... stop! ;-)
>>
>> I'm just guessing what you really want, but this is my idea:
>>
>> Create pools EvenFull, OddFull, EvenIncr, OddIncr. Set retention times
>> etc. as you want them. Use time should be six days.
>>
>> Schedule jobs like this:
>>
>> run level=full pool=EvenFull w01, w03, w05, w07, ..., w53 Sun at 22:00
>> run level=incremental pool=EvenIncr w00, w02, w04, ..., w52 Mon-Fri at 22:00
>> run level=full pool=OddFull w00, w02, w04, ..., w52 Sun at 22:00
>> run level=incremental pool=OddIncr w01, w03, w05, ..., w53 Mon-Fri at 22:00
>>
>> If you can explain why I use the EvenFull pool on Sundays of the odd
>> weeks you know what I'm aiming at :-)
>>
> I think I follow your schedule and use of pools
> 
> Mon (Einc) Tue (Einc) Wed (Einc) Thur (Einc) Fri (Einc) Sun (Ofull)
> Mon (Oinc) Tue (Oinc) Wed (Oinc) Thur (Oinc) Fri (Oinc) Sun (Efull)
> etc
> 
> Einc and Oinc would have a retention period of $infinite,

Is that really what you want? I doubt that...

> Efull and
> Ofull would have a retention period of 2 weeks

An expired and recycled full backup makes the incrementals building on 
it much less valuable... typically, I implement things the other way 
around: long retention for full backups, shorter retention for 
differentials, and shortest for incrementals. I still believe that is a 
useful approach.
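
As a rough sketch of that ordering, with the concrete durations only 
examples to be adapted to your cycle:

```
# Longest retention for fulls, shorter for differentials,
# shortest for incrementals.
Pool {
  Name = Full-Pool
  Pool Type = Backup
  Volume Retention = 12 months
}
Pool {
  Name = Diff-Pool
  Pool Type = Backup
  Volume Retention = 3 months
}
Pool {
  Name = Inc-Pool
  Pool Type = Backup
  Volume Retention = 5 weeks
}
```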

> However as I understand it, this would effectively be a single tape
> pool, that between them covered all of the year, but any data would
> only ever be on one tape, there would be no duplication, and it's the
> duplication that they are after.

So you keep several full backups stored. As much of older data won't 
change between these full backups, you've got a good chance to have 
"historical" data in several copies.

> They are after the duplication from
> previously using DDS, AIT and SAIT with considerably lower reliability
> than DLT or LTO. The new library is an LTO3, and I would hope that
> they won't suffer anything like the same quantity of failed tapes, but
> once bitten twice shy...

Always keep more than one copy of your data... that's called backup. 
Even comparatively reliable technology like LTO will eventually break.

> As an alternative, could I run/spool incrementals to disk, and then
> copy/despool/merge them to two tape pools ? That way they get their
> data on two sets of tapes.

Yup. What I'd suggest is to run all backups to disk initially, keep 
full backups longest, incrementals for a shorter time. Again, this 
should be set up in the pool definitions, and you shouldn't have to 
intervene manually.

With some calculation, scratch pools, and reasonable retention times 
you'll end up with a solution that works without any user interaction, 
keeping backups for as long as possible and more or less optimally 
using the available space.

Now add copy jobs to that - copy the latest full backups to tape each 
week or each month... or copy to tape in a pool with unlimited 
retention time once a month, and the other weeks to a pool with 
limited, but longer than the on-disk pools', retention time.
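
A copy job setup along those lines could be sketched like this - Next 
Pool on the source pool names the copy destination; all the resource 
names here are invented for the example:

```
Pool {
  Name = DiskPool
  Pool Type = Backup
  Storage = FileStorage
  Next Pool = TapeLongTerm    # destination pool for copy jobs
}

Job {
  Name = "copy-to-tape"
  Type = Copy
  Level = Full
  Selection Type = PoolUncopiedJobs   # copy every job not yet copied
  Pool = DiskPool
  Client = fileserver-fd
  FileSet = "FileserverData"
  Schedule = "WeeklyCopy"
  Storage = FileStorage
  Messages = Standard
}
```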

Given your requirements that would be the solution I'd implement.

Arno

>> So, more abstract answer:
>> Don't play with catalogs, multiple FD instances, forced pruning or
>> purging - use pools and schedules. Fine-tuning the pool settings can
>> be challenging if your jobs tend to run very long, but that's only a
>> technical problem...
>>
>> Cheers,
>>
>> Arno
>>
>>
>>> Many thanks
>>>
>>> Arne
>>>
>>> ------------------------------------------------------------------------------
>>> Download Intel® Parallel Studio Eval
>>> Try the new software tools for yourself. Speed compiling, find bugs
>>> proactively, and fine-tune applications for parallel performance.
>>> See why Intel Parallel Studio got high marks during beta.
>>> http://p.sf.net/sfu/intel-sw-dev
>>> _______________________________________________
>>> Bacula-users mailing list
>>> Bacula-users AT lists.sourceforge DOT net
>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>
>> --
>> Arno Lehmann
>> IT-Service Lehmann
>> Sandstr. 6, 49080 Osnabrück
>> www.its-lehmann.de
>>
> 

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de

