RE: Amanda indexing and multiple backup rotations

2005-02-25 17:27:43
Subject: RE: Amanda indexing and multiple backup rotations
From: "Brooks, Jason" <Jason.Brooks AT windriver DOT com>
To: <amanda-users AT amanda DOT org>
Date: Fri, 25 Feb 2005 14:18:31 -0800
Jon H. LaBadie wrote:
> What, for your needs, are the properties of a "complete tapeset"?

A complete tapeset is a set of tapes that holds all completed backups
without my having to determine "which tapes in a larger set have
everything".  By definition, in amanda today all of the tapes in the
currently defined tapeset form a "complete tapeset".  Clever use of
"dumpcycle", "tapecycle", "runspercycle", and "runtapes" allows very fine
granularity, but it is still up to a human to keep track of which tapes
hold what if he plans to use offsite storage.

I am not saying what is there now doesn't work.  I am just feeling out a
way to expand amanda's current strengths.

Paul Bijnens wrote:

> That's one possibility.  But now you're forgetting that there
> are holidays, sick people, and tape drive problems, all of which
> make you get out of sync with the normal set numbering.
> Sometimes you need an extra tape to flush the important
> backups, because the normal tape had an error after
> writing only 15% of the tape.  Sometimes an operator forgets
> to load the correct tape, and effectively you have no backup
> for that day (or only the little that fit on the holding disk), etc.
> 
> Or you suddenly come to the conclusion that the tape that is 
> expected to be overwritten the next night contains some very 
> important data still to be recovered.  When using the above 
> numbering scheme you cannot easily remove a tape from the sequence.

See, this is the thing: in order to split up a set of tapes, the
operator must know which tapes belong to which "set".  In my idea of a
perfect world (with four separate sets of weekly backups), I would:
a) use a common disklist and changer file, and include the settings
the amanda.conf files have in common;
b) label each set of tapes as suggested: daily-1-01 through daily-1-07,
daily-2-01 through daily-2-07, daily-3-01 through daily-3-07, and
daily-4-01 through daily-4-07.
c) run four different cron jobs (pseudocode here):
        first week: amdump daily1
        second week: amdump daily2
        third week: amdump daily3
        fourth week: amdump daily4
        repeat
d) If extra tapes are needed, I would just label some additional tapes,
and put them into their respective sets.
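The labeling scheme in (b) is regular enough to generate rather than type.  A minimal Python sketch, assuming four sets of seven tapes named as above; `make_labels` is a hypothetical helper, and the printed lines are meant to be run through amlabel one tape at a time:

```python
def make_labels(num_sets: int = 4, tapes_per_set: int = 7) -> dict[str, list[str]]:
    """Map each child config (daily1..dailyN) to its tape labels."""
    return {
        # Two-digit tape numbers keep labels sortable: daily-1-01 ... daily-1-07
        f"daily{s}": [f"daily-{s}-{t:02d}" for t in range(1, tapes_per_set + 1)]
        for s in range(1, num_sets + 1)
    }


if __name__ == "__main__":
    for config, labels in make_labels().items():
        for label in labels:
            # One amlabel run per tape, with that tape loaded in the drive/changer.
            print(f"amlabel {config} {label}")
```

Adding an extra tape to each set for (d) is just `make_labels(tapes_per_set=8)`.  Giving each child config its own labelstr, such as "^daily-1-[0-9][0-9]*$" (an assumption about how the children would be set up), should also make amanda refuse a tape from another set.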

My point is that each configuration would maintain itself and there
wouldn't be any "bleedover" into "another set".  If I were to try to
separate a tapecycle of 28 tapes into four 7-day dumpcycles, I would
have to do extra work.  For that matter, what guarantees that a new level 0 of
some host/disk won't be written amongst the other host/disk sets'
"previous" level n dumps?  As far as I can tell, nothing prevents
this, though I could be wrong about this last point.
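Whether bleedover has happened can at least be checked after the fact.  A minimal sketch, assuming the dump history is available as chronological (host, disk, level, label) records (they might be pulled from something like `amadmin <config> find` output, but that parsing is left out) and that a tape's set is its label minus the final "-NN".  It conservatively treats every dump since the latest level 0 as part of the restore chain:

```python
from collections import defaultdict


def set_of(label: str) -> str:
    """Tape set a label belongs to: 'daily-2-05' -> 'daily-2' (assumed naming)."""
    return label.rsplit("-", 1)[0]


def restore_chain_sets(dumps):
    """Map each (host, disk) to the tape sets holding its current restore chain.

    dumps: chronological (host, disk, level, label) tuples.  Conservatively,
    every dump since the latest level 0 counts as part of the chain.
    """
    chains = defaultdict(list)
    for host, disk, level, label in dumps:
        if level == 0:
            chains[(host, disk)] = []  # a new full dump starts a new chain
        chains[(host, disk)].append(label)
    return {key: {set_of(label) for label in labels} for key, labels in chains.items()}


def bleedover(dumps):
    """(host, disk) pairs whose restore chain spans more than one tape set."""
    return sorted(key for key, sets in restore_chain_sets(dumps).items() if len(sets) > 1)
```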

My idea is simply to have a "superset" configuration: for the example
above, call this configuration "Global_daily".  Very little data would
be stored in this configuration: it would merely be aware of any "child"
configurations and their relationship to each other.
Thus, I would run "amrecover Global_daily" and set the host, disk, and
date.  The superset configuration would know which child config holds the
most relevant backup set.  In other words, if I need to restore some
files that are more than 3 weeks old, and the latest backup set was "daily4", the
superset would know to consult "daily1"'s indexes.  It would be the
"daily1" configuration that turns around and asks for the appropriate
"daily-1-xx" tapes.
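The lookup the superset would do is small.  A sketch, assuming the superset can see the dates on which each child config ran (hypothetically read from each child's logs; here they are passed in directly).  `pick_child` returns the child whose most recent run on or before the requested date is the latest, i.e. the one an amrecover setdate would point at:

```python
from datetime import date
from typing import Optional


def pick_child(runs: dict[str, list[date]], wanted: date) -> Optional[str]:
    """Return the child config holding the newest dump on or before `wanted`."""
    best_config, best_date = None, None
    for config, dates in runs.items():
        earlier = [d for d in dates if d <= wanted]  # runs not after the requested date
        if earlier and (best_date is None or max(earlier) > best_date):
            best_config, best_date = config, max(earlier)
    return best_config  # None if nothing that old is on record
```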

Maybe "Global_daily"'s amanda.conf file would simply read:
child_set daily1
child_set daily2
child_set daily3
child_set daily4
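Since child_set is a proposed keyword and not something amanda.conf understands today, the superset tool would need to read it itself.  A hypothetical parser, assuming one child name per line and "#" comments as in amanda.conf:

```python
def parse_child_sets(conf_text: str) -> list[str]:
    """Collect the config names from hypothetical 'child_set NAME' lines, in order."""
    children = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "child_set":
            if len(args) != 1:
                raise ValueError(f"child_set expects exactly one config name: {raw!r}")
            children.append(args[0])
    return children
```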

It would be very easy to "retire" a backup configuration too: simply
stop using it, and put the tapes and their documentation
into deep storage.  Thus, if I get a command from legal to "maintain all
backups from this point forward" due to some impending lawsuit, the only
thing I would do is create a new backup set each week and
retire the oldest.  I know I can take the oldest 7 or so tapes out of
the rotation, but I would like it to be neater and tidier.
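The bookkeeping for retiring sets is a queue.  A hypothetical model (the `SetRotation` class and the dailyN naming are assumptions for illustration): normally the oldest set is reused; under a legal hold it is retired and a brand-new set takes its place:

```python
from collections import deque
from itertools import count


class SetRotation:
    """Active child configs, oldest first, plus the ones sent to deep storage."""

    def __init__(self, active_sets: int = 4):
        self._numbers = count(1)
        self.active = deque(f"daily{next(self._numbers)}" for _ in range(active_sets))
        self.retired = []

    def next_week(self, legal_hold: bool) -> str:
        """Return the child config to dump to this week."""
        oldest = self.active.popleft()
        if legal_hold:
            # Leave the oldest set's tapes untouched and start a new set instead.
            self.retired.append(oldest)
            oldest = f"daily{next(self._numbers)}"
        self.active.append(oldest)  # this week's set becomes the newest
        return oldest
```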

Perhaps, in addition, rather than using a different cron job each
week, I would run "amdump Global_daily" every time.  Another set of
configuration lines in the global config file would determine which set to
use, perhaps something like "52 weeks mod 4" (for dumpcycles of 7 days).
Then the amdump script, or the planner, would simply exec the chosen
child, e.g. "amdump daily3".
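A sketch of the selection rule, assuming the week count is taken from a fixed starting Monday rather than the calendar week number; the `EPOCH` date and `run_tonight` are hypothetical.  Counting from an epoch avoids a wrinkle with "52 weeks mod 4": some years have 53 ISO weeks, which would run the same set two weeks in a row at New Year.

```python
import os
from datetime import date

CHILDREN = ["daily1", "daily2", "daily3", "daily4"]  # the child_set entries
EPOCH = date(2005, 1, 3)  # hypothetical: the Monday daily1 was first used


def child_for(day: date, children=CHILDREN, epoch=EPOCH) -> str:
    """Pick this week's child config by counting whole weeks since the epoch."""
    weeks = (day - epoch).days // 7
    return children[weeks % len(children)]


def run_tonight() -> None:
    """What "amdump Global_daily" might do: replace itself with the chosen child."""
    os.execvp("amdump", ["amdump", child_for(date.today())])
```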

> 
> So why not number the tapes sequentially, daily01-daily28; 
> each week you bring in the 7 oldest tapes to be
> overwritten.

Because this would require a human to maintain this "meta" info about
the tapes.  Humans can fail.  I fail more often than my scripts do.  Thus,
if I can code it, I will be better served in the long run.  Larry Wall
said it best when he wrote about laziness.  I would just rather not have
to remember all of this extra info.
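With sequential labels, the "meta" info is mostly modular arithmetic, which a script can remember instead of a human.  A minimal sketch, assuming Paul's 28 tapes numbered 1-28 in sets of 7; the hypothetical `start` parameter is the first tape of the first set, so a drift of one tape gives 2-8, 9-15, 16-22, 23-1:

```python
def tape_sets(start: int = 1, n_tapes: int = 28, set_size: int = 7) -> list[list[int]]:
    """Split tapes 1..n_tapes into consecutive sets, beginning at tape `start`."""
    sets = []
    for s in range(n_tapes // set_size):
        first = start - 1 + s * set_size  # 0-based position of this set's first tape
        sets.append([(first + i) % n_tapes + 1 for i in range(set_size)])
    return sets


def describe(sets: list[list[int]]) -> list[str]:
    """Human-readable ranges such as '23-1' for a set that wraps around."""
    return [f"{s[0]}-{s[-1]}" for s in sets]


print(describe(tape_sets()))         # in sync
print(describe(tape_sets(start=2)))  # one tape out of sync
```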


> You set dumpcycle to 7 (or even a little lower, to 6 or 5 
> days to have some redundancy in a set of 7 tapes too).  And 
> you just switch out next 7 tapes each time.  When you get 
> out-of-sync, a "set" is then defined as e.g. 2-8, 9-15, 
> 16-22, 23-1 instead of 1-7, 8-14, 15-21, 22-28.

Yes, but I am leaning towards having some built-in terminology to handle
this.  I also expect that I could make up a different type of superset:
monthly level 0s, weekly level 1s, and daily level 2+ dumps.  I
realize this last example kind of bypasses amanda's functionality, but
there are people who insist on this, and I think my idea is a way to support it.
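A sketch of how such a superset might pick a child config by date, assuming three hypothetical child configs named monthly, weekly, and daily, a full on the first Monday of each month, and a level 1 on every other Monday.  How each child is made to produce those levels is outside the sketch:

```python
from datetime import date


def child_for_day(day: date) -> str:
    """Hypothetical rule: monthly level 0, weekly level 1, daily level 2+."""
    if day.weekday() == 0 and day.day <= 7:
        return "monthly"  # first Monday of the month: the level 0 config
    if day.weekday() == 0:
        return "weekly"   # any other Monday: the level 1 config
    return "daily"        # Tuesday through Sunday: the level 2+ config
```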

For what it's worth, I am not attempting to say "someone ought to code
all of my ideas".  I have been thinking about how to do this, and may
just take a crack at the code myself.  I am writing here to see what
others think. 

> You can set tapecycle to e.g. 21 instead of 28, but still keep
> 28 tapes in the cycle, resulting in any of the last 7 tapes
> being good enough to overwrite: now you can easily remove a
> tape from the cycle, without supplying a new one immediately too.

I already do something similar:

dumpcycle 2 weeks
runspercycle 10  # run m-f
tapecycle 20 tapes
runtapes 1 

This way I can say "at any given time, I have 2 level 0 backups of
everything on tape."
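The arithmetic behind that claim, as a small sketch: tapecycle / runtapes gives the number of runs whose tapes are still intact, and every runspercycle consecutive runs contain at least one level 0 of each disk, assuming the planner keeps every full within the dumpcycle (it can postpone fulls when tape space runs short, so this is a simplified model):

```python
def full_dumps_on_tape(tapecycle: int, runspercycle: int, runtapes: int = 1) -> int:
    """Minimum number of level 0s of each disk still on tape (simplified model)."""
    runs_on_tape = tapecycle // runtapes  # runs whose tapes have not been reused yet
    return runs_on_tape // runspercycle   # at least one level 0 per runspercycle runs


print(full_dumps_on_tape(tapecycle=20, runspercycle=10))  # dumpcycle 2 weeks, m-f
```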

I hope this message makes it clearer what I want.  There is one major
point that W. Curtis Preston makes in his book "UNIX Backup and
Recovery": keep it simple.  I like amanda because I don't have to
remember which dump image is stored where: amanda will tell me.
Heck, amanda is so good at this that I use rsync to back up all of my
indexes, curinfo, and backup logs to a completely different site, ready
for live use in case my backup server dies.
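A minimal sketch of that off-site mirror, assuming hypothetical paths for the directories named by indexdir, infofile, and logdir in amanda.conf and a hypothetical mirror host.  It only wraps a plain rsync call, and defaults to --dry-run so it is safe to try:

```python
import subprocess

# Hypothetical locations; substitute the indexdir, infofile, and logdir values
# from your own amanda.conf and a real mirror host.
AMANDA_STATE_DIRS = [
    "/var/lib/amanda/daily/index",
    "/var/lib/amanda/daily/curinfo",
    "/var/lib/amanda/daily/logs",
]
REMOTE = "backup-mirror.example.com:/srv/amanda-mirror/"


def rsync_command(sources=AMANDA_STATE_DIRS, dest=REMOTE) -> list[str]:
    # -a recurses and preserves permissions, times, and symlinks; -z compresses
    # in transit; --delete keeps the mirror an exact copy of the sources.
    return ["rsync", "-az", "--delete", *sources, dest]


def mirror(dry_run: bool = True) -> int:
    """Run the rsync (only reporting what would change when dry_run is True)."""
    cmd = rsync_command()
    if dry_run:
        cmd.insert(1, "--dry-run")
    return subprocess.run(cmd).returncode
```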

If I have to think about how to separate older tapes out for storage,
it's too much work.  I am the only systems administrator at my site.  If
I get hit by a bus (bus interrupt), people need to have a simple system
to work with.

--jason
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason Brooks ~ (503) 641-3440 x1861
      Direct ~ (503) 924-1861
Email to: jason.brooks AT windriver DOT com
Twiki: http://twiki.wrs.com/do/view/Main/JasonBrooks

Senior Systems Administration Analyst 
Wind River Systems
8905 SW Nimbus ~ Suite 255      
Beaverton, Or 97008