RE: Amanda indexing and multiple backup rotations

2005-02-26 03:34:39
Subject: RE: Amanda indexing and multiple backup rotations
From: "Brooks, Jason" <Jason.Brooks AT windriver DOT com>
To: <amanda-users AT amanda DOT org>
Date: Fri, 25 Feb 2005 23:34:53 -0800
Hmm.  I am going to have to study this for a bit, perhaps beer or scotch
and let the brain percolate.

Something is flickering in the back of my brain that says either "That's
not really it" or "You are completely right".  As usual, working with
amanda requires me to think a bit more orthogonally.  :)

Ok, off to think now...

--jason
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jason Brooks ~ (503) 641-3440 x1861
      Direct ~ (503) 924-1861
Email to: jason.brooks AT windriver DOT com
Twiki: http://twiki.wrs.com/do/view/Main/JasonBrooks

Senior Systems Administration Analyst 
Wind River Systems
8905 SW Nimbus ~ Suite 255      
Beaverton, Or 97008
 

> -----Original Message-----
> From: owner-amanda-users AT amanda DOT org 
> [mailto:owner-amanda-users AT amanda DOT org] On Behalf Of Jon LaBadie
> Sent: Friday, February 25, 2005 10:32 PM
> To: amanda-users AT amanda DOT org
> Subject: Re: Amanda indexing and multiple backup rotations
> 
> On Fri, Feb 25, 2005 at 02:18:31PM -0800, Brooks, Jason wrote:
> > Jon H. LaBadie wrote:
> > > What, for your needs, are the properties of a "complete tapeset"?
> > 
> > A complete tapeset is a set of tapes that holds all completed backups
> > without having to determine "what tapes in a larger set have
> > everything".  By definition, in amanda today all of the tapes in the
> > currently defined tapeset are a "complete tapeset".  Clever use of
> > "dumpcycle, tapecycle, runspercycle, and runtapes" will allow very
> > fine granularity, but it is still up to a human to maintain what tapes
> > mean what if he plans to use offsite storage.
> > 
> ...
> > My point is that each configuration would maintain itself and there
> > wouldn't be any "bleedover" into "another set".  If I were to try and
> > separate a tapecycle of 28 tapes into four 7-day dumpcycles, I would
> > have to do extra work.  For that matter, what says that a new level 0
> > of some host/disk won't be written amongst the other host/disk sets'
> > "previous" level n?  As far as I can tell, there is nothing preventing
> > this.  I could be wrong in this last though.
> > 
> 
> No, you are clearly not wrong.  I don't know if you've ever 
> looked at the output of amoverview.  Here is an edited sample 
> of my dump levels for six of my DLE's over 3 dumpcycles (18 
> dumps, 6 runs/cycle).
> 
> 
>     butch  /            1 0 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1
>     butch  /export      0 1 1 0 1 0 1 1 2 0 0 1 1 2 2 2 0 1
>     butch  /images      0           0         0           0
>     butch  /var         1 0 1 0 0 0 1 1 1 0 1 1 1 1 1 0 0 1
>     butch  //tec/D      0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1
>     butch  //winnie/C   0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0
> 
> 
> So that is three sets, but I don't see any particular
> grouping that I would consider complete according to your
> definition.  But any consecutive set of 6 tapes does contain at
> least one level zero for each DLE and is current/complete
> up to the date of the last tape of the 6 selected.
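
The "any 6 consecutive tapes" observation can be checked mechanically against the amoverview sample above. A minimal Python sketch, assuming the table is transcribed by hand: runs with no dump are written as None, and the positions of the /images dumps are inferred from the column spacing. The function name is illustrative and not part of amanda.

```python
# Dump levels per run, transcribed from the amoverview sample above.
# None marks a run with no dump; /images positions are inferred from spacing.
LEVELS = {
    "/":          [1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
    "/export":    [0, 1, 1, 0, 1, 0, 1, 1, 2, 0, 0, 1, 1, 2, 2, 2, 0, 1],
    "/images":    [0] + [None] * 5 + [0] + [None] * 4 + [0] + [None] * 5 + [0],
    "/var":       [1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1],
    "//tec/D":    [0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    "//winnie/C": [0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0],
}


def windows_missing_level0(levels, window=6):
    """Return (dle, first_run) for each run of `window` consecutive
    dumps that contains no level 0 for that DLE (runs are 1-based)."""
    missing = []
    for dle, runs in levels.items():
        for start in range(len(runs) - window + 1):
            if 0 not in runs[start:start + window]:
                missing.append((dle, start + 1))
    return missing


# An empty list means every 6-run window holds a level 0 for every DLE.
print(windows_missing_level0(LEVELS))
```

For this sample the list comes back empty with a window of 6, while a window of 5 does leave gaps, which matches the 6 runs/cycle setting.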
> 
> > My idea is simply to have a "superset" configuration: For the example
> > above, call this configuration "Global_daily".  Very little data would
> > be stored in this configuration: it would merely be aware of any "child"
> > configurations and their relationship to each other.
> > Thus, I would run "amrecover Global_daily", and set the host, disk,
> > and date.  The superset configuration will know which config for the
> > most relevant backup set to use.  In other words, if I need to restore
> > some files that are >3 weeks old, and the latest backup set was
> > "daily4", the superset will know to consult "daily1"'s indexes.  It
> > will be the "daily1" configuration that will turn around and ask for
> > the appropriate "daily-1-xx" tapes.
> 
> Is this so different than a single config?  You ask for files >3 weeks
> old; amanda uses its single index and asks for tape-17, tape-18, and
> tape-19.  I'm probably being dense here, but what is gained by having
> four separate configs for each week's worth of tapes?
> 
> Unless you want to play the "level 0's on Sunday" games.  
> Where the configs each force level 0's of everything on the 
> first tape(s) of each set and possibly force incrementals 
> only for the rest of the set.
> > 
> > Maybe "Global_daily"'s amanda.conf file will simply read:
> > child_set daily1
> > child_set daily2
> > child_set daily3
> > child_set daily4
> > 
> > It would be very easy to "retire" a backup configuration too: simply
> > stop using it, and put the tapes and their appropriate documentation
> > into deep storage.  Thus, if I get a command from legal to "maintain
> > all backups from this point forward" due to some impending lawsuit,
> > the only thing I will be doing is creating a new backup set each week,
> > and retiring the oldest.  I know I can take the oldest 7 or so tapes
> > off of the rotation, but I would like it to be neater and tidier.
> 
> How do you envision rotating sets into the schedule?  Will
> set daily2 just continue backing up where the last dumps of
> set daily1 finished?  Or will it pick up where daily2 was
> when it was last used, three weeks ago?  If each set
> continues where the other left off then it seems to simply be
> an artificial structure on top of a single config with four
> dumpcycles of tape.
> 
> > 
> > Perhaps in addition, rather than there being a different cronjob each
> > time, run: "amdump Global_daily" each time.  Put another set of
> > configuration lines in the global config file to determine which set
> > to use.  Perhaps something like (for dumpcycles of 7 days) "52 weeks
> > mod 4".  Then have the amdump script, or planner simply exec "amdump
> > daily3".
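
The "week number mod 4" selection proposed above could look roughly like the following Python sketch. It only illustrates the proposal and is not something amanda provides: the child config names come from the example, while `pick_child_config` and the choice of ISO week numbers are assumptions.

```python
import datetime

# Child config names taken from the Global_daily example.
CHILD_SETS = ["daily1", "daily2", "daily3", "daily4"]


def pick_child_config(day, child_sets=CHILD_SETS):
    """Pick a child config from the ISO week number (hypothetical helper)."""
    week = day.isocalendar()[1]  # ISO week number, 1..53
    return child_sets[week % len(child_sets)]


# A wrapper script would run this command instead of printing it.
print("amdump", pick_child_config(datetime.date.today()))
```

One caveat with a pure calendar calculation: in a year with 53 ISO weeks, week 53 and the following week 1 map to the same set, so a stored counter would rotate more evenly.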
> > 
> 
> I.e. do some fancy calculation to have amanda select the next 
> set of tapes to be used.  Something it does automatically 
> from the tapelist anyway.  ???
> 
> > > 
> > > So why not number the tapes sequentially, daily01-daily28; each week
> > > you bring a set of the 7 oldest tapes in to be overwritten.
> > 
> > Because this would require a human to maintain this "meta" info about
> > the tapes.  Humans can fail.  I fail more often than my scripts.  Thus
> > if I can code it, I would be better served in the long run.  Larry
> > Wall said it best when he wrote about laziness.  I would just rather
> > not have to remember all of this extra info.
> > 
> 
> Certainly they can err and bring the wrong individual tapes.  
> They could also bring the wrong "set" of tapes.  And the 
> individual tapes could be put into labeled ziploc bags called 
> set 1, set 2, ... even if they were not from different configs.
> 
> > 
> > > You set dumpcycle to 7 (or even a little lower, to 6 or 5 days to
> > > have some redundancy in a set of 7 tapes too).  And you just switch
> > > out the next 7 tapes each time.  When you get out-of-sync, a "set" is
> > > then defined as e.g. 2-8, 9-15, 16-22, 23-1 instead of 1-7, 8-14,
> > > 15-21, 22-28.
> > 
> > Yes, but I am leaning towards having some built-in terminology to
> > handle this.  I expect too that I could make up a different type of
> > superset: monthly level zeros, weekly level ones and daily level 2+.  I
> > realize this last example kind of bypasses amanda's functionality, but
> > there are people who insist on this, and I think my idea is a way to go.
> > 
> 
> For that type of thing I would consider completely separate configs.
> 
> > > You can set tapecycle to e.g. 21 instead of 28, but still keep
> > > 28 tapes in the cycle, resulting in any of the last 7 tapes being
> > > good enough to overwrite: now you can easily remove a tape from the
> > > cycle, without supplying a new one immediately too.
> > 
> > I already do something similar:
> > 
> > dumpcycle 2 weeks
> > runspercycle 10  # run m-f
> > tapecycle 20 tapes
> > runtapes 1
> > 
> > This way I can say "at any given time, I have 2 level 0 backups of 
> > everything on tape."
> 
> Actually, that should be "at least 2".
> 
> But that is not what Paul was referring to.  You have the 
> minimum recommended configuration, a total of two dumpcycles 
> worth of tapes in rotation and tapecycle set to that exact 
> value.  Paul was noting that it is valid, and some recommend, 
> that you have more tapes in rotation than your tapecycle 
> claims.  Suppose in your current arrangement a tape goes bad.
> Amanda will not tape anything until you amlabel a new tape.  
> If you don't have one, and must order it, that may be a 
> while.  But, if you really had 30 or 40 tapes in rotation, 
> amanda will happily use any of the extras that had not been 
> used in the last 19 (tapecycle count -1) dumps.  If your 
> legal beagles tell you to pull some tapes out, you can do so 
> and replace them at your leisure.
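
The reuse rule described above (a tape becomes eligible again once it has not been written in the last tapecycle - 1 runs) can be sketched in Python as follows. This models only the rule as stated in this message, with runtapes 1; it is not amanda's actual code, and `reusable_tapes` and the label names are made up for illustration.

```python
def reusable_tapes(labels, history, tapecycle):
    """Labels that were not written during the last (tapecycle - 1) runs.

    labels  -- every labeled tape in rotation
    history -- labels in the order they were written, oldest first,
               one tape per run (runtapes 1)
    """
    recent = set(history[-(tapecycle - 1):]) if tapecycle > 1 else set()
    return [label for label in labels if label not in recent]


exact = [f"daily-{n}" for n in range(1, 21)]  # 20 tapes, tapecycle 20
spare = [f"daily-{n}" for n in range(1, 31)]  # 30 tapes, tapecycle still 20
history = exact                               # one full pass over tapes 1..20

print(reusable_tapes(exact, history, 20))       # only the oldest tape
print(len(reusable_tapes(spare, history, 20)))  # oldest tape plus 10 extras
```

With exactly tapecycle tapes there is a single candidate, so one bad tape stalls the rotation; with extras in the tapelist there is slack to pull tapes for archive.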
> > 
> > If I have to think about how to separate older tapes away for storage,
> > it's too much work.  I am the only systems administrator at my site.
> > If I get hit by a bus (bus interrupt) people need to have a simple
> > system to work with.
> 
> Might this be simple enough?  Label your tapes not as 
> daily-1, daily-2, ...
> but daily-1.1, daily-2.1.  If you have to remove a tape 
> daily-10.1 because of damage (amrmtape) or for archive 
> (noreuse), replace it with one labeled daily-10.2.  The first 
> part of the number keeps the sequential nature of tape usage 
> for humans, the second part keeps them unique for the tapelist.
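
The two-part label scheme lends itself to a small helper. A Python sketch, assuming labels look exactly like `daily-10.1` (slot number, then generation); `replacement_label` is a hypothetical helper, not an amanda command.

```python
import re

# prefix-slot.generation, e.g. daily-10.1
LABEL_RE = re.compile(r"^(?P<prefix>[A-Za-z]+)-(?P<slot>\d+)\.(?P<gen>\d+)$")


def replacement_label(label):
    """Return the label for the tape that replaces `label` in the same slot."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return f"{m['prefix']}-{m['slot']}.{int(m['gen']) + 1}"


print(replacement_label("daily-10.1"))  # daily-10.2
```

The new label would still have to be written to the tape with amlabel, and the config's labelstr pattern would need to accept the dotted form.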
> 
> -- 
> Jon H. LaBadie                  jon AT jgcomp DOT com
>  JG Computing
>  4455 Province Line Road        (609) 252-0159
>  Princeton, NJ  08540-4322      (609) 683-7220 (fax)
>