Re: [Networker] Managing a NetWorker system
2008-08-22 10:07:46
On Aug 21, 2008, at 5:55 PM, Curtis Preston wrote:
It's taking me forever, but I'm still plodding along on developing the
outline for my latest book. (Congratulations to Preston de Guise on HIS
book! Check it out at
http://www.enterprisesystemsbackup.com/Enterprise_Systems_Backup/About_the_Book.html.)
I'm sure everyone on the list will buy BOTH books!
I have a question for you about managing a networker system. What are
the things you find yourself doing on a regular basis and how do you do
them? Let me give you a few examples.
1. Monitoring backup success/failure.
One of my biggest pain points is monitoring backups and hardware. We
literally run backups 24x7 on our largest NetWorker server, mostly
because we have lots of clients that can't push data fast enough to
allow for a 12-hour window, although most finish within 24 hours.
Monitoring backups covers several jobs: checking backup success,
making sure what we think should be backed up is backed up, doing
chargebacks, and monitoring throughput. Another pain point is
monitoring tape integrity. I don't have an automated means to do that,
and a lot of my Sony S-AIT1 tapes are aging, so I take one out of
circulation per week on average. In fact, just yesterday I received a
shipment of new tapes to replace failed ones.
a. Do you use nsradmin & scripts, NetWorker Management
Console, etc?
Very few scripts are in use here. We have two NetWorker servers with
around 400 clients in total and only one employee (me) who knows the
entire system, and I am not even assigned to this responsibility full
time; I have other responsibilities unrelated to backups. I have one
part-time helper who recently started working with me and he's doing
great with helping to troubleshoot failed backups.
2. Rerunning failed backups
Since we do not have staff to intervene at night when backups fail, my
usual procedure is to wait one or two days to see if the failures
occur again. If they do, I tell the responsible SA that a problem has
occurred, then the troubleshooting begins. Most times, a reboot of the
client fixes the issue. As a result, we almost always miss the window
for rerunning a client's failed backup.
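The "wait one or two days and see if it fails again" rule could be
sketched roughly as follows. This is only an illustration: the
(client, night, succeeded) tuples and the function name
repeated_failures are hypothetical, not a NetWorker report format or
API, and the results would have to be collected by some other means.

```python
from collections import defaultdict
from datetime import date, timedelta


def repeated_failures(results, min_nights=2):
    """Return clients whose most recent `min_nights` backups all failed.

    `results` is an iterable of (client, night, succeeded) tuples. This
    layout is an assumption made for the sketch, not a NetWorker format.
    """
    by_client = defaultdict(dict)
    for client, night, succeeded in results:
        by_client[client][night] = succeeded

    flagged = []
    for client, nights in by_client.items():
        # Look only at the latest `min_nights` runs recorded for the client.
        latest = sorted(nights)[-min_nights:]
        if len(latest) == min_nights and not any(nights[n] for n in latest):
            flagged.append(client)
    return sorted(flagged)


if __name__ == "__main__":
    today = date(2008, 8, 22)
    yesterday = today - timedelta(days=1)
    sample = [
        ("alpha", yesterday, False), ("alpha", today, False),
        ("beta", yesterday, False), ("beta", today, True),
        ("gamma", today, False),
    ]
    # Only clients failing on both nights are worth a note to the SA.
    print(repeated_failures(sample))
```

A client with a single failed night is left alone, which matches the
habit of not chasing one-off failures.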
3. Putting tapes in a tape library, making them ready to use
Fortunately, we do not ship off many tapes. My biggest tape library
(930 slots) is located outside our primary data center so there's no
reason to ship off tapes from it. As a result, keeping track of the
tapes is easy. We also usually have enough in the library to meet our
needs and we add more tapes about two or three times a year. Where
tapes are concerned, my biggest problem is not keeping track of the
ones I own; it's forecasting how many new tapes I will need to buy for
the coming fiscal year. This is a problem I still haven't solved. I
have also shown my assistant how to load tapes, which is a big help,
but we only really need to do that maybe once a week, if that.
4. Getting tapes offsite
a. I send originals and don't make clones
b. I use/don't use Alphastor
c. I send clones and make them via scripting
d. I send clones and make them via automated group cloning
We use our smaller tape library (120 slots) mostly to back
up a mainframe VTL (I will comment on that separately). It works well,
but the owners of the data want the tapes kept off-site at least 50
miles away. With that in mind, I have set up a very efficient, but
mostly manual procedure to do that. We have a third party company that
runs operations shifts in our main data center. I have a script that
runs six days a week. Every Sunday night, a full backup of that VTL is
done to LTO-3 tape, then my script wakes up and withdraws any tapes
that have VTL data on them, regardless of whether or not they are full.
If a tape is being written to, my script will leave that tape alone,
although that condition rarely occurs. I have a set of slots allocated
as a load port in my Qualstar tape library and a cleaning tape mounted
directly above it. Our operators know to check that range of slots
every day (except Mondays). They simply send any tapes there off-site
and prepare a handwritten report each day of the tapes that were
sent. The tapes are kept off-site for one month, then the
operators know to store them in a cabinet along with their
corresponding paper report. I then go and compare the report with
what's in the cabinet, load the returned tapes back into the tape
library, and they get used again. It's very manual, but in the one year
we've been doing this, it only takes a few minutes of my time per week
to deal with and I know where 100% of the tapes are. On Tuesdays, the
full backup consists of 7 tapes (for now), including the nightly
"savegrp -O" bootstrap data. The other five days, it averages around 3
tapes off-sited. My plan if we have a real DR situation is simply to
request the return of all the tapes we have off-site, so I really
don't need any elaborate or expensive means to track them. We also
send the original tapes off-site, not clones. The goal behind
implementing this project was to keep it cheap and simple. I have
achieved that.
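For anyone wanting to do something similar, the two decisions in this
procedure (which tapes to withdraw, and whether the returned tapes
match the paper report) can be sketched like this. It is not my actual
script: the Tape record, its field names, and the barcodes are made up
for illustration, and the inventory would have to come from whatever
query you use against your own media database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tape:
    """Hypothetical inventory record; the field names are assumptions."""
    barcode: str
    has_vtl_data: bool
    being_written: bool


def tapes_to_withdraw(tapes):
    """Pick every tape holding VTL data, full or not, unless it is in use."""
    return [t.barcode for t in tapes if t.has_vtl_data and not t.being_written]


def reconcile(reported, in_cabinet):
    """Compare the operators' report with the tapes found in the cabinet."""
    reported, in_cabinet = set(reported), set(in_cabinet)
    return {
        "missing": sorted(reported - in_cabinet),      # on paper, not returned
        "unexpected": sorted(in_cabinet - reported),   # returned, not on paper
    }


if __name__ == "__main__":
    inventory = [
        Tape("VTL001", has_vtl_data=True, being_written=False),
        Tape("VTL002", has_vtl_data=True, being_written=True),   # left alone
        Tape("OTH003", has_vtl_data=False, being_written=False),
    ]
    print(tapes_to_withdraw(inventory))
    print(reconcile(["VTL001", "VTL004"], ["VTL001"]))
```

Keeping the reconciliation as a plain set difference fits the goal of
keeping the whole thing cheap and simple.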
5. Monitoring for capacity/throughput issues
This is another pain point here. I would like to learn what others are
doing with regard to monitoring both throughput issues and capacity
planning.
6. Installing new clients
Right now, we still have about half of our clients running 7.2 or
older and it is a real PIA to get SAs to take the time to do the
update to 7.4 SP2. I go through the list of clients and I start
nagging people to do the update based on the oldest NSR version I find
out there. If nothing happens, I send email to their manager notifying
them that in a week, backups will stop if the update is not done. For
7.4 clients, I am not worried about updating them to 7.4 SP2 yet, but
when I have time, I want to explore the push technology to do the
updates remotely.
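Ordering the nag list by oldest client version can be sketched as
below. The client names and the version string formats ("7.2",
"7.1.3", "7.4 SP2") are assumptions for the example; the strings in
your own client resources may look different and the pattern would
need adjusting.

```python
import re

# Accepts forms such as "7.2", "7.1.3" or "7.4 SP2" (assumed formats).
_VERSION = re.compile(r"\s*(\d+)\.(\d+)(?:\.(\d+))?(?:\s*SP(\d+))?\s*", re.IGNORECASE)


def version_key(version):
    """Turn a version string into a sortable (major, minor, patch, sp) tuple."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, sp = match.groups()
    return (int(major), int(minor), int(patch or 0), int(sp or 0))


def upgrade_queue(clients, target="7.4"):
    """Return (client, version) pairs older than `target`, oldest first."""
    floor = version_key(target)
    behind = [(name, ver) for name, ver in clients.items()
              if version_key(ver) < floor]
    return sorted(behind, key=lambda item: (version_key(item[1]), item[0]))


if __name__ == "__main__":
    clients = {  # hypothetical client names and versions
        "web1": "7.2",
        "db1": "7.1.3",
        "app1": "7.4 SP2",
        "file1": "7.4",
    }
    for name, ver in upgrade_queue(clients):
        print(name, ver)
```

Clients already on 7.4 are left out of the queue, since those are not
the ones being chased for now.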
The biggest single area where I spend time troubleshooting is Windows
backups. I don't want to get into a religious discussion, but we run a
healthy mixture of most operating systems and, far and away, Windows
backups give us the most pain here. I would love to see better
troubleshooting techniques for Windows, especially Windows 2008.