Amanda-Users

Re: Amanda indexing and multiple backup rotations

2005-02-26 01:38:17
From: Jon LaBadie <jon AT jgcomp DOT com>
To: amanda-users AT amanda DOT org
Date: Sat, 26 Feb 2005 01:32:13 -0500
On Fri, Feb 25, 2005 at 02:18:31PM -0800, Brooks, Jason wrote:
> Jon H. LaBadie wrote:
> > What, for your needs, are the properties of a "complete tapeset"?
> 
> A complete tapeset is a set of tapes that holds all completed backups
> without having to determine "what tapes in a larger set have
> everything".  By definition, in amanda today all of the tapes in the
> currently defined tapeset are a "complete tapeset".  Clever use of
> "dumpcycle, tapecycle, runspercycle, and runtapes" will allow very fine
> granularity, but it is still up to a human to maintain what tapes mean
> what if he plans to use offsite storage.
> 
...
> My point is that each configuration would maintain itself and there
> wouldn't be any "bleedover" into "another set".  If I were to try and
> separate a tapecycle of 28 tapes into four 7-day dumpcycles, I would
> have to do extra work.  For that matter, what says that a new level 0 of
> some host/disk won't be written amongst the other host/disk sets'
> "previous" level n?  As far as I can tell, there is nothing preventing
> this.  I could be wrong in this last though.
> 

No, you are clearly not wrong.  I don't know if you've ever looked at
the output of amoverview.  Here is an edited sample of my dump levels
for six of my DLE's over 3 dumpcycles (18 dumps, 6 runs/cycle).


    butch  /            1 0 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1
    butch  /export      0 1 1 0 1 0 1 1 2 0 0 1 1 2 2 2 0 1
    butch  /images      0           0         0           0
    butch  /var         1 0 1 0 0 0 1 1 1 0 1 1 1 1 1 0 0 1
    butch  //tec/D      0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1
    butch  //winnie/C   0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0


So that is three sets, but I don't see any particular grouping that I
would consider complete according to your definition.  But any consecutive
set of 6 tapes does contain at least one level zero for each DLE and is
current/complete up to the date of the last tape of the 6 selected.
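
The "any consecutive 6" claim can be checked mechanically.  What follows
is only a minimal Python sketch, not anything Amanda provides: the levels
are transcribed by hand from the sample above, the blanks in the /images
row are assumed to mean "no dump that run" (shown as None, with column
positions read off the spacing), and the function name is made up for
illustration.

```python
# Dump levels per run, transcribed from the amoverview sample above.
# None marks a run with no dump for that DLE (assumed meaning of the blanks).
LEVELS = {
    "/":          [1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
    "/export":    [0, 1, 1, 0, 1, 0, 1, 1, 2, 0, 0, 1, 1, 2, 2, 2, 0, 1],
    "/images":    [0] + [None] * 5 + [0] + [None] * 4 + [0] + [None] * 5 + [0],
    "/var":       [1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1],
    "//tec/D":    [0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    "//winnie/C": [0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0],
}


def windows_without_full(levels, width):
    """Return the start indexes (0-based) of every run of `width`
    consecutive dumps that contains no level 0."""
    return [
        start
        for start in range(len(levels) - width + 1)
        if 0 not in levels[start:start + width]
    ]


# An empty list for a DLE means every window of 6 runs holds a level 0.
for dle, levels in LEVELS.items():
    print(dle, windows_without_full(levels, 6))
```

Shrinking the window below the 6 runs per cycle is where gaps start to
show up, which is why the number of tapes pulled as a "set" matters.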

> My idea is simply to have a "superset" configuration: For the example
> above, call this configuration "Global_daily".  Very little data would
> be stored in this configuration: it would merely be aware of any "child"
> configurations and their relationship to each other.
> Thus, I would run "amrecover Global_daily ", and set the host, disk, and
> date.  The superset configuration will know which config for the most
> relevant backup set to use.  In other words, if I need to restore some
> files that are >3 weeks old, and the latest backup set was "daily4", the
> superset will know to consult "daily1"'s indexes.  It will be the
> "daily1" configuration that will turn around and ask for the appropriate
> "daily-1-xx" tapes.

Is this so different than a single config?  You ask for files > 3 weeks old,
and amanda uses its single index and asks for tape-17, tape-18, and tape-19.
I'm probably being dense here, but what is gained by having four separate
configs for each week's worth of tapes?

Unless you want to play the "level 0's on Sunday" games, where the configs
each force level 0's of everything on the first tape(s) of each set and possibly
force incrementals only for the rest of the set.
> 
> Maybe "Global_daily"'s amanda.conf file will simply read:
> child_set daily1
> child_set daily2
> child_set daily3
> child_set daily4
> 
> It would be very easy to "retire" a backup configuration too: simply
> stop using it, and put the tapes and their appropriate documentation
> into deep storage.  Thus, if I get a command from legal to "maintain all
> backups from this point forward" due to some impending lawsuit, the only
> thing I will be doing is creating a new backup set each week, and
> retiring the oldest.  I know I can take the oldest 7 or so tapes off of
> the rotation, but I would like it to be neater and tidier.

How do you envision rotating sets into the schedule?  Will set daily2 just
continue backing up where the last dumps of set daily1 finished?  Or will
it pick up where daily2 was when it was last used, three weeks ago?  If
each set continues where the other left off then it seems to simply be an
artificial structure on top of a single config with four dumpcycles of tape.

> 
> Perhaps in addition, rather than there being a different cronjob each
> time, run: "amdump Global_daily" each time.  Put another set of
> configuration lines in the global config file to determine which set to
> use.  Perhaps something like (for dumpcycles of 7 days) "52 weeks mod
> 4".  Then have the amdump script, or planner simply exec "amdump
> daily3".
> 

I.e. do some fancy calculation to have amanda select the next set of
tapes to be used.  Something it does automatically from the tapelist
anyway.  ???
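
For concreteness, the "52 weeks mod 4" selection could look something like
the Python sketch below.  This is purely illustrative: the child config
names come from the example above, the function is hypothetical, and
nothing like it exists in amdump or the planner.

```python
from datetime import date

# Hypothetical child configuration names, taken from the example above.
CHILD_SETS = ["daily1", "daily2", "daily3", "daily4"]


def pick_child_set(day, child_sets=CHILD_SETS):
    """Choose a child config from the ISO week number of `day`,
    taken modulo the number of child sets."""
    week = day.isocalendar()[1]  # ISO week number, 1..53
    return child_sets[week % len(child_sets)]


print(pick_child_set(date.today()))
```

A wrapper run from cron would then hand the chosen name to amdump.  Note
that an ISO year with 53 weeks makes this pick the same set two weeks
running at the year boundary (53 mod 4 and 1 mod 4 are both 1), which is
one more reason to let the tapelist decide.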

> > 
> > So why not number the tapes sequentially, daily01-daily28; 
> > each week you bring in a set of the 7 oldest tapes in to be 
> > overwritten.
> 
> Because this would require a human to maintain this "meta" info about
> the tapes.  Humans can fail.  I fail more often than my scripts.  Thus
> if I can code it, I would be better served in the long run.  Larry Wall
> said it best when he wrote about laziness.  I would just rather not have
> to remember all of this extra info.
> 

Certainly they can err and bring the wrong individual tapes.  They could
also bring the wrong "set" of tapes.  And the individual tapes could be
put into labeled ziploc bags called set 1, set 2, ... even if they were
not from different configs.

> 
> > You set dumpcycle to 7 (or even a little lower, to 6 or 5 
> > days to have some redundancy in a set of 7 tapes too).  And 
> > you just switch out next 7 tapes each time.  When you get 
> > out-of-sync, a "set" is then defined as e.g. 2-8, 9-15, 
> > 16-22, 23-1 instead of 1-7, 8-14, 15-21, 22-28.
> 
> Yes, but I am leaning towards having some built-in terminology to handle
> this.  I expect too that I could make up a different type of superset:
> Monthly level zero's, weekly level one's and daily level 2+'s.  I
> realize this last example kind of bypasses amanda's functionality, but
> there are people who insist on this, and I think my idea is a way to go.
> 

For that type of thing I would consider completely separate configs.

> > You can set tapecycle to e.g. 21 instead of 28, but still keep 
> > 28 tapes in the cycle, resulting in any of the last 7 tapes 
> > being good enough to overwrite: now you can easily remove a 
> > tape from the cycle, without supplying a new one immediately too.
> 
> I already do something similar:
> 
> dumpcycle 2 weeks
> runspercycle 10  # run m-f
> tapecycle 20 tapes
> runtapes 1 
> 
> This way I can say "at any given time, I have 2 level 0 backups of
> everything on tape."

Actually, that should be "at least 2".

But that is not what Paul was referring to.  You have the minimum recommended
configuration, a total of two dumpcycles worth of tapes in rotation and
tapecycle set to that exact value.  Paul was noting that it is valid,
and some recommend, that you have more tapes in rotation than your
tapecycle claims.  Suppose in your current arrangement a tape goes bad.
Amanda will not tape anything until you amlabel a new tape.  If you don't
have one, and must order it, that may be a while.  But, if you really had
30 or 40 tapes in rotation, amanda will happily use any of the extras that
had not been used in the last 19 (tapecycle count -1) dumps.  If your legal
beagles tell you to pull some tapes out, you can do so and replace them at
your leisure.
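
To make the arithmetic concrete, here is a small Python sketch of the rule
as described above.  It is a simplified model, not Amanda's actual
tape-selection code: it assumes one entry per tape, listed most recently
written first, and the function name and labels are invented for the
example.

```python
def reusable_tapes(used_order, tapecycle):
    """Return the labels that may be overwritten, oldest first.

    used_order -- tape labels, most recently written first, one per tape
    tapecycle  -- the tapecycle value from the config

    Simplified model: the most recent (tapecycle - 1) tapes are protected;
    everything older is fair game.
    """
    protected = set(used_order[:tapecycle - 1])
    return [label for label in reversed(used_order) if label not in protected]


# 20 tapes in rotation with tapecycle 20: exactly one candidate.
exact = [f"daily-{n}" for n in range(20, 0, -1)]
print(reusable_tapes(exact, 20))

# 30 tapes in rotation with tapecycle 20: eleven candidates.
extra = [f"daily-{n}" for n in range(30, 0, -1)]
print(len(reusable_tapes(extra, 20)))
```

With the exact-fit rotation a single bad tape leaves nothing to write to;
with the extras there is slack to pull tapes and replace them later.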
> 
> If I have to think about how to separate older tapes away for storage,
> it's too much work.  I am the only systems administrator at my site.  If
> I get hit by a bus (bus interrupt) people need to have a simple system
> to work with.

Might this be simple enough?  Label your tapes not as daily-1, daily-2, ...
but daily-1.1, daily-2.1.  If you have to remove a tape daily-10.1 because
of damage (amrmtape) or for archive (noreuse), replace it with one labeled
daily-10.2.  The first part of the number keeps the sequential nature of
tape usage for humans, the second part keeps them unique for the tapelist.
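
If you wanted a script to hand out the replacement label, a sketch along
these lines would do.  It assumes labels of the form name-N.M exactly as
in the suggestion above; the helper is hypothetical and only computes the
string, so the amlabel step is still yours to run.

```python
import re


def next_generation(label):
    """Given a label like 'daily-10.1', return 'daily-10.2':
    same slot number, next generation."""
    match = re.fullmatch(r"(.+-\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"label not in name-N.M form: {label!r}")
    slot, generation = match.group(1), int(match.group(2))
    return f"{slot}.{generation + 1}"


print(next_generation("daily-10.1"))
```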

-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)