Networker

Re: [Networker] Holiday Shutdown?

2006-12-13 15:26:11
From: Dave Mussulman <mussulma AT UIUC DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 13 Dec 2006 14:13:41 -0600
On Tue, Dec 12, 2006 at 08:50:36AM -0500, Nancy Magers wrote:
> Thanks so much for your speedy response - I am responsible for backup for
> central computing at Brown University.   As I'm sure you are aware having
> input from other universities is always helpful in steering universities to
> the correct decisions in our environments.  Your email on this will be quite
> helpful to get our datacenter to do the right thing.  We have had holiday
> shutdown going on three years now and this is the first year they have made
> an issue out of this.  Most likely because we have an issue keeping up with
> storage for our ever growing backup requirements, so our datacenter manager
> can't keep enough tapes in the libraries.  I am going to suggest to him to
> empty the library of any full staging tapes and load it up with scratch
> tapes to handle the holiday week.

What an interesting discussion.  It sounds odd to consider backups as
something that should be shut down during reduced staffing, for maybe
the same reason it would be weird to consider turning off the water or
the network because of a holiday break.

At first blush, it sounds foreign -- your data would be a week old when
backups started back up again.  BUT, if none of it changed during that
time anyway, there's no risk.  A simple answer to a simple problem.  But
we all know somebody's going to work from home and change files, and
system logs will fill up and rotate, and having the most up-to-date data
is important.  If a system gets hacked on Thursday, would only being
able to restore up to Saturday night's logs help troubleshoot it?  Are
those limitations your environment can live with?

The question of why to shut down at all makes more sense when you talk
about an overgrown library, and handling tape rotation during the break
week.  The big question is, do you want to optimize for backups or
restores?  If you take all of your data tapes out of the library to make
room for empties, and need a restore, you're guaranteed to get paged (or
the restore will hang/wait).  Taking the fulls out and leaving
incremental storage in the library is only marginally better --
different restores may still require the fulls.  (And then it's more
frustrating -- the file they need is on a tape in the library, but
you're blocked waiting on a full tape for the directory, etc.)
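
To make the restore dependency concrete, here is a rough Python sketch
of the reasoning above.  It is not anything Networker provides: the
catalog, the tape labels and both function names are made up for
illustration, and it assumes a directory restore needs the latest full
plus every incremental taken after it.

```python
from datetime import date

# Hypothetical catalog: (backup date, level, tape label). Labels are made up.
BACKUPS = [
    (date(2006, 12, 16), "full", "FULL.001"),
    (date(2006, 12, 18), "incr", "INCR.001"),
    (date(2006, 12, 19), "incr", "INCR.001"),
    (date(2006, 12, 20), "incr", "INCR.002"),
]


def tapes_needed(backups, restore_to):
    """Return the tape labels needed to restore a directory as of restore_to.

    Assumes the restore uses the most recent full on or before restore_to
    plus every incremental taken after that full, up to restore_to.
    """
    usable = sorted(b for b in backups if b[0] <= restore_to)
    fulls = [b for b in usable if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the restore date")
    last_full = fulls[-1]
    chain = [last_full] + [
        b for b in usable if b[1] == "incr" and b[0] > last_full[0]
    ]
    return {label for _, _, label in chain}


def missing_tapes(backups, restore_to, in_library):
    """Return the tapes a restore would wait on because they are not loaded."""
    return tapes_needed(backups, restore_to) - set(in_library)
```

With only the incremental tapes loaded, a restore to the 19th still
comes back waiting on FULL.001, which is the "marginally better" case
described above.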

The messiest case, at least as far as Networker is concerned, would be
savegroups running without media available to write to.  Groups waiting
on media would hang, run longer than 24 hours (or whatever your set
period is), fail to start on their second instance because the first is
already running, etc.  Yuck.  This can play havoc with your levels
running on schedule, and in some cases I've had the client lock files to
read them (such as mail spools) and hold that lock while they're waiting
on server media.  That can cause nasty side effects you don't want.
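
A toy model of that knock-on effect, in Python.  This is a simplified
sketch and not Networker's actual scheduling logic: it assumes one
start slot per interval and that a slot is simply skipped while the
previous run is still going.  The function name and the durations are
hypothetical.

```python
def skipped_starts(durations_hours, interval_hours=24):
    """Return the scheduled slots that never start.

    durations_hours[i] is how long the run in slot i would take if it
    started. A slot is skipped when the previous run is still going
    at its scheduled start time.
    """
    skipped = []
    busy_until = 0
    for slot, duration in enumerate(durations_hours):
        start = slot * interval_hours
        if start < busy_until:
            # The earlier instance is still running (e.g. waiting on media).
            skipped.append(slot)
            continue
        busy_until = start + duration
    return skipped
```

For example, skipped_starts([4, 60, 4, 4, 4]) models a second night's
run that sits waiting on media for 60 hours: the third and fourth
nights never start, so whatever levels were scheduled for those nights
are lost.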

Hopefully, there are ways to make compromises.  Perhaps there are full
tapes in your library that do not have incremental dependencies on them,
and those might be removed.  (Latest version is online, older versions
are near line.)  Maybe there's a subset of data (either in groups or
pools) that could be disabled or removed that wouldn't need backups or
restores that week.  A closer look at your dataset might reveal you
could keep 90% of your service online that week if 10% were taken
offline/out of the library, and maybe that's an appropriate trade off.
You'd have to look at your groups, your pools and your objectives to
make that call.  I know I'll have to do the same thing in two weeks, and
I hope those decisions will pan out for me.

If you do shut down, I recommend leaving the server up so it can
continue to do its thing, and is available for restores.  It's pretty
easy to set the autostart attribute on your groups to Disabled to shut
them down temporarily.  I'd go that route (instead of messing with
schedules or directives).  However, modifying the schedules might give
you some breathing space too without sacrificing your restore policies.
For example, if you did fulls on Saturdays and incrementals through the
week, and wanted to provide restores from the previous week, it might
take you fewer tapes in the jukebox to turn the 23rd's backup into an
incremental and run a two-week incremental period than it would take to
hold another full and all of its incrementals.  It's really situational,
so it's hard to advise.
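
A back-of-the-envelope way to compare the two schedules, sketched in
Python.  Every number here is an assumption to be replaced with your
own: the sizes are invented, fulls and incrementals are assumed to go
to separate pools, and each nightly incremental is assumed to be about
the same size.

```python
def tapes_for(fulls, incrementals, full_gb, incr_gb, tape_gb):
    """Estimate tapes held in the jukebox for a set of backups.

    Fulls and incrementals are assumed to live in separate pools, so
    each pool is rounded up to whole tapes on its own.
    """
    def whole_tapes(total_gb):
        return (total_gb + tape_gb - 1) // tape_gb  # integer ceiling

    return whole_tapes(fulls * full_gb) + whole_tapes(incrementals * incr_gb)


# Hypothetical sizes: 2000 GB per full, 100 GB per incremental, 400 GB tapes.
SIZES = dict(full_gb=2000, incr_gb=100, tape_gb=400)

# Normal schedule: fulls on the 16th and 23rd, six incrementals after each.
normal = tapes_for(fulls=2, incrementals=12, **SIZES)

# Modified schedule: full on the 16th only, incrementals on the next 13 nights.
modified = tapes_for(fulls=1, incrementals=13, **SIZES)
```

With these made-up sizes the normal schedule holds 13 tapes and the
modified one holds 9.  The gap shrinks as incrementals get larger
relative to a full, which is why this is so situational.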

Finally, before I pack it up for Christmas break, I always review and
refresh the jukebox and operator instructions.  I make sure they're
still valid, and put a printed copy on top of the tape library.  If I
know which staff/students are going to be around (in town as relief
people), I give them a 10-minute tour of basic tape swapping activities.
Most of the time I can narrate that over the phone (with me twiddling
the software remotely), but it's nice to give them a face-to-face
walkthrough first.  Especially when I'm skiing a couple of hours away
and don't want to be a holiday tape jockey.  ;)

Good luck,
Dave

