
Re: [Networker] Managing a NetWorker system

2008-08-22 10:07:46
Subject: Re: [Networker] Managing a NetWorker system
From: Stan Horwitz <stan AT TEMPLE DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 22 Aug 2008 10:06:59 -0400
On Aug 21, 2008, at 5:55 PM, Curtis Preston wrote:

It's taking me forever, but I'm still plodding along on developing the
outline for my latest book. (Congratulations to Preston de Guise on HIS
book!  Check it out at
http://www.enterprisesystemsbackup.com/Enterprise_Systems_Backup/About_the_Book.html.)  I'm sure everyone on the list will buy BOTH books!



I have a question for you about managing a networker system.  What are
the things you find yourself doing on a regular basis and how do you do
them?  Let me give you a few examples.

1.      Monitoring backup success/failure.

One of my biggest pain points is monitoring backups and hardware. We literally run backups 24x7 on our largest NetWorker server, mostly because many of our clients can't push data fast enough to finish within a 12-hour window, although most finish within 24 hours. Monitoring backups is painful on several fronts: checking backup success, making sure that what we think should be backed up actually is, doing chargebacks, and monitoring throughput. Another pain point is monitoring tape integrity. I don't have an automated way to do that, and a lot of my Sony S-AIT1 tapes are aging, so on average I take one out of circulation each week. In fact, just yesterday I received a shipment of new tapes to replace failed ones.
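The "is what we think should be backed up actually backed up" check can be sketched as a comparison between an expected list and the night's results. This is a minimal sketch, assuming you can export each save set's client, name, and success flag from your own reports; the `SaveSetRecord` type and the `expected` mapping are hypothetical and not part of NetWorker.

```python
from dataclasses import dataclass


@dataclass
class SaveSetRecord:
    # Hypothetical record shape built from your own backup reports;
    # this is not a NetWorker data structure.
    client: str
    saveset: str
    succeeded: bool


def audit_backups(expected, records):
    """Compare what should be backed up with what actually was.

    expected: dict mapping client name -> set of save set names expected
              in the reporting period.
    records:  iterable of SaveSetRecord for that period.
    Returns (missing, failed), each a sorted list of (client, saveset).
    """
    succeeded, failed = set(), set()
    for r in records:
        key = (r.client, r.saveset)
        (succeeded if r.succeeded else failed).add(key)
    # A save set that failed once but later succeeded (e.g. a retry) counts as done.
    failed -= succeeded
    missing = [
        (client, s)
        for client, savesets in expected.items()
        for s in savesets
        if (client, s) not in succeeded and (client, s) not in failed
    ]
    return sorted(missing), sorted(failed)
```

"Missing" catches clients that silently never ran, which a failure-only report would not show; "failed" still includes failures for save sets you did not list, so unexpected problems are not hidden.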

        a.      Do you use nsradmin & scripts, NetWorker Management Console, etc.?

Very few scripts are in use here. We have two NetWorker servers with roughly 400 clients in total and only one employee (me) who knows the entire system, and I am not even assigned to it full time; I have other responsibilities unrelated to backups. I have one part-time helper who recently started working with me, and he's doing a great job helping to troubleshoot failed backups.

2.      Rerunning failed backups

Since we do not have staff to intervene at night when backups fail, my usual procedure is to wait one or two days to see whether the failures recur. If they do, I notify the client's system administrator (SA) that a problem has occurred, and then the troubleshooting begins. Most of the time, rebooting the client fixes the issue. When a client's backup fails, we almost always miss the window of opportunity to rerun it.
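The wait-and-see rule amounts to escalating only clients that fail on consecutive nights. A minimal sketch, assuming you keep one set of failed client names per night (a hypothetical input built from your own reports); `threshold=2` means "failed, then failed again the next night".

```python
def clients_to_escalate(nightly_failures, threshold=2):
    """Return clients that failed on each of the most recent `threshold` nights.

    nightly_failures: list of sets, oldest night first; each set holds the
                      names of clients whose backups failed that night.
    threshold:        consecutive failed nights before notifying the SA.
    """
    if threshold < 1 or len(nightly_failures) < threshold:
        return []
    recent = nightly_failures[-threshold:]
    # Only clients present in every one of the recent nights' failure sets.
    return sorted(set.intersection(*recent))
```

Raising `threshold` to 3 matches the "wait two days" variant; one-off failures that clear up on their own never reach the SA.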

3.      Putting tapes in a tape library, making them ready to use

Fortunately, we do not ship many tapes off-site. My biggest tape library (930 slots) is located outside our primary data center, so there's no reason to ship tapes from it off-site. As a result, keeping track of the tapes is easy. We also usually have enough tapes in the library to meet our needs, and we add more about two or three times a year. Where tapes are concerned, my biggest problem is not keeping track of the ones I own; it's forecasting how many new tapes I will need to buy for the coming fiscal year. This is a problem I still haven't solved. I have also shown my assistant how to load tapes, which is a big help, although we only really need to load tapes about once a week, if that.
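One rough starting point for the forecasting problem is to add the tapes needed for data growth to the tapes needed to replace retired media. This sketch assumes linear growth over the year and a steady retirement rate; every parameter is a hypothetical placeholder for your own figures, not a NetWorker setting.

```python
import math


def forecast_tapes(stored_tb, growth_fraction, tape_capacity_tb,
                   fill_factor, retired_per_week, weeks=52):
    """Estimate new tapes to buy for the coming year.

    stored_tb:        data currently held on tape under current retention.
    growth_fraction:  expected growth over the year (0.25 = 25%).
    tape_capacity_tb: capacity per cartridge (native, before compression).
    fill_factor:      average fraction of a tape actually used, since tapes
                      are often withdrawn or recycled before they are full.
    retired_per_week: tapes taken out of circulation per week on average.
    """
    if tape_capacity_tb <= 0 or not 0 < fill_factor <= 1:
        raise ValueError("capacity must be positive and fill_factor in (0, 1]")
    growth_tb = max(0.0, stored_tb * growth_fraction)
    for_growth = math.ceil(growth_tb / (tape_capacity_tb * fill_factor))
    for_replacement = retired_per_week * weeks
    return for_growth + for_replacement
```

For example, `forecast_tapes(100, 0.25, 0.5, 0.8, 1)` returns 115: 63 tapes for 25 TB of growth at 0.4 TB of usable space per tape, plus 52 replacements at one retired tape per week. (0.5 TB is roughly the native capacity of an S-AIT-1 cartridge.)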

4.      Getting tapes offsite

        a.      I send originals and don't make clones
        b.      I use/don't use Alphastor
        c.      I send clones and make them via scripting
        d.      I send clones and make them via automated group cloning

We use our smaller tape library (120 slots) mostly to back up a mainframe VTL (I will comment on that separately). It works well, but the owners of the data want the tapes kept off-site at least 50 miles away. With that in mind, I have set up an efficient but mostly manual procedure for this.

We have a third-party company that runs operations shifts in our main data center. I have a script that runs six days a week. Every Sunday night, a full backup of the VTL is done to LTO-3 tape; then my script wakes up and withdraws any tapes that have VTL data on them, regardless of whether they are full. If a tape is being written to, my script leaves it alone, although that rarely happens. I have a set of slots allocated as a load port in my Qualstar tape library, with a cleaning tape mounted directly above it. Our operators know to check that range of slots every day (except Mondays); they simply send any tapes they find there off-site and prepare a handwritten report each day of the tapes that went off-site. The tapes are kept off-site for one month; when they come back, the operators know to store them in a cabinet along with their corresponding paper report. I then compare the report with what's in the cabinet, load the returned tapes back into the tape library, and they get used again.

It's very manual, but in the year we've been doing this, it has only taken a few minutes of my time per week, and I know where 100% of the tapes are. On Tuesdays, the full backup consists of 7 tapes (for now), including the nightly "savegrp -O" bootstrap data. On the other five days, an average of around 3 tapes go off-site. My plan if we have a real DR situation is simply to request the return of all the tapes we have off-site, so I really don't need any elaborate or expensive means to track them. We also send the original tapes off-site, not clones. The goal of this project was to keep it cheap and simple, and I have achieved that.
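The selection and bookkeeping side of this procedure can be sketched in a few functions. This is a hypothetical illustration, not the script described above: the `Volume` record, the load-port slot numbers, and the labels are assumptions, and the actual move into the load port would still be done by whatever library command your existing script uses.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical inventory record built from your own media reports;
    # this is not a NetWorker data structure.
    label: str
    has_vtl_data: bool  # holds at least one save set from the VTL backup
    writing: bool       # currently mounted and being written to


def select_for_withdrawal(volumes):
    """Labels to move to the load port: any volume holding VTL data,
    full or not, except one that is still being written to."""
    return sorted(v.label for v in volumes if v.has_vtl_data and not v.writing)


def assign_load_port(labels, port_slots):
    """Pair tapes with load-port slots in order; tapes that do not fit
    are returned so the next run can pick them up."""
    placed = dict(zip(labels, port_slots))
    leftover = labels[len(port_slots):]
    return placed, leftover


def reconcile_returns(reported_offsite, in_cabinet):
    """Compare the operators' handwritten report with the cabinet contents.
    Returns (not_back_yet, not_on_report) as sorted label lists."""
    reported, present = set(reported_offsite), set(in_cabinet)
    return sorted(reported - present), sorted(present - reported)
```

Skipping a tape that is still being written, rather than waiting for it, keeps the run simple: that tape is picked up on a later run once the write finishes.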

5.      Monitoring for capacity/throughput issues

This is another pain point here. I would like to learn what others are doing about monitoring throughput and planning for capacity.
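Two simple numbers go a long way for both questions: each client's observed throughput against the rate it would need to fit a backup window, and how many days remain before a given capacity runs out. The sketch below assumes you can get each client's last full backup size and elapsed time, and a used/total capacity figure, from your own reports; the function names and the 12-hour default window are illustrative.

```python
def throughput_mb_s(size_gb, elapsed_s):
    """Average rate in MB/s, using 1 GB = 1024 MB."""
    return size_gb * 1024 / elapsed_s


def clients_exceeding_window(jobs, window_hours=12):
    """jobs: dict client -> (size_gb, elapsed_seconds) for its last full.

    Returns a sorted list of (client, observed MB/s, required MB/s) for
    clients whose backup took longer than the window.
    """
    window_s = window_hours * 3600
    flagged = []
    for client, (size_gb, elapsed_s) in sorted(jobs.items()):
        if elapsed_s > window_s:
            observed = throughput_mb_s(size_gb, elapsed_s)
            required = throughput_mb_s(size_gb, window_s)
            flagged.append((client, round(observed, 1), round(required, 1)))
    return flagged


def days_until_full(used_tb, capacity_tb, growth_tb_per_day):
    """Days until capacity is exhausted at a constant growth rate,
    or None if usage is not growing."""
    if growth_tb_per_day <= 0:
        return None
    return max(0.0, (capacity_tb - used_tb) / growth_tb_per_day)
```

For example, a 900 GB client that took 20 hours ran at 12.8 MB/s but would need about 21.3 MB/s to fit a 12-hour window. That gap shows whether the fix is on the client side or the window itself.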

6.      Installing new clients

Right now, about half of our clients are still running 7.2 or older, and it is a real PIA to get SAs to take the time to update them to 7.4 SP2. I go through the list of clients and start nagging people to update, beginning with the clients running the oldest NetWorker versions I find. If nothing happens, I email their manager to say that backups will stop in a week if the update is not done. I am not worried about updating the 7.4 clients to 7.4 SP2 yet, but when I have time, I want to explore the push technology for doing the updates remotely.
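Ordering the nag list oldest-first is mostly a matter of turning version strings into sortable tuples. A minimal sketch, assuming your client inventory records versions as strings like `7.2.1` or `7.4 SP2` (other formats would need a different pattern); the client names are hypothetical.

```python
import re

# Leading dotted number, optionally followed by "SP<n>" (case-insensitive).
_VERSION_RE = re.compile(r"\s*(\d+(?:\.\d+)*)\s*(?:SP\s*(\d+))?", re.IGNORECASE)


def parse_version(text):
    """'7.2.1' -> (7, 2, 1, 0); '7.4 SP2' -> (7, 4, 0, 2)."""
    m = _VERSION_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = [int(p) for p in m.group(1).split(".")]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    sp = int(m.group(2)) if m.group(2) else 0
    return (major, minor, patch, sp)


def nag_order(clients, target):
    """clients: dict name -> version string.

    Returns clients older than `target`, oldest first (ties by name)."""
    target_v = parse_version(target)
    stale = [name for name, ver in clients.items() if parse_version(ver) < target_v]
    return sorted(stale, key=lambda name: (parse_version(clients[name]), name))
```

With `target="7.4"` the list holds only the pre-7.4 clients, matching the current priority; switching the target to `"7.4 SP2"` later adds the plain 7.4 clients without any other change.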

By far the single biggest area where I spend time troubleshooting is Windows backups. I don't want to get into a religious discussion, but we run a healthy mix of most operating systems, and Windows backups give us far more pain than any other. I would love to see better troubleshooting techniques for Windows, especially Windows Server 2008.

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
