Networker

Re: [Networker] Too many tape mounts

2003-07-31 17:56:38
Subject: Re: [Networker] Too many tape mounts
From: Bob Spurzem <bobs AT GOCMT DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 31 Jul 2003 14:47:48 -0700
I have seen this same problem before; the solution was to use larger tapes
(Super DLT or LTO).  The small tapes cause too many mount requests.

Bob
CMT - The Tape People
1-800-252-9268
"we trade new tape media for old used tape media"

-----Original Message-----
From: Legato NetWorker discussion
[mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]On Behalf Of Stan Horwitz
Sent: Thursday, July 31, 2003 2:04 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: [Networker] Too many tape mounts


Has anyone here run into a situation with NetWorker 6.1.3 (Power Edition)
on a single CPU Enterprise 450 with Solaris 9 where NSR chokes when there
SEEM to be more pending tape mount requests than available tape drives?
Our Legato server handles daily backups for 112 different clients; a mix
of Windows, Novell, Solaris, and Tru64 Unix. We also run SnapImage to back
up a pair of Mirapoint message stores with 220GB of data each to our
server's tape library.

After going back and forth with Legato tech support regarding frequent,
but intermittent slowdowns of our Legato server, I think I have hit upon
the common denominator between these slowdowns: too many tape mount
requests in a given period of time. Legato said today that my hypothesis
makes sense so they are investigating it from that angle.

Meanwhile, I am wondering if anyone else has encountered this issue. Our
Legato server will get to a point where tape ejects and mounts can take
several hours to be processed. This causes our backup schedule to fall way
behind. When this happens, NSR becomes very unresponsive. For example,
opening up nsrwatch can take 10 or more minutes as does quitting out of it
by pressing the "q" key. Sometimes we'll get a slowdown condition that
lasts as little as half an hour, or maybe one or two hours; other times it
will last all night. When this happens, the /nsr/logs/daemon.log file
shows instances of nsrmmd restart failures.

Not realizing this correlation, I have been juggling our backup schedule
almost every day, but all I believe I was actually doing was rescheduling
the slowdown. I have just taken to dropping the savegroup parallelism
setting on the five or six savegroups that have the greatest number of
clients. I set them equal to the number of target sessions on our tape
drives, which is 12. Fortunately, I have fairly wide latitude in how our
backups are scheduled. This afternoon, I also took one of our larger (in
terms of number of clients) savegroups and broke it up into three separate
savegroups and scheduled them to run a few hours apart with their weekly
full backup scheduled on different days of the week. This is all at
Legato's recommendation so maybe these steps will help.

At any rate, when our NSR server slows down, operating system commands
such as "ls", the login process, etc. are also executed quickly. System
load rarely gets above 7 and even when it sinks way down, the problem does
not always go away.

We recently migrated our Qualstar tape library and backup server from a
system that runs Tru64 Unix and NSR 6.1.1 to this new Sun E450 server.
Since the migration, NSR has been nothing but problems. We will likely
retain outside assistance to help us deal with this issue and verify that
we have our Legato server set up with best practices in mind.  A colleague
and I have been poring over documentation from Legato, Sun, and Qualstar
in an attempt to better understand this problem.

Meanwhile, I am wondering if anyone else has encountered this type of
problem with 6.1.3 and if you're seeing any nsrmmd restart cancelations
and/or RPC time out errors in your server's /nsr/logs/daemon.log file.
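
For anyone who wants to check their own server, here is a minimal sketch
in Python of how such a scan might look. The exact wording of the log
messages is not quoted in this message, so the two patterns below are
loose assumptions, and the names count_matches and scan_log are
hypothetical helpers, not part of NetWorker.

```python
import re
from collections import Counter

# Assumed patterns: the real daemon.log message text is not shown in this
# thread, so these loose matches are illustrative and may need adjusting.
PATTERNS = {
    "nsrmmd_restart": re.compile(r"nsrmmd.*restart", re.IGNORECASE),
    "rpc_timeout": re.compile(r"RPC.*tim(e|ed)[\s-]?out", re.IGNORECASE),
}


def count_matches(lines):
    """Count how many lines match each assumed pattern."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)


def scan_log(path="/nsr/logs/daemon.log"):
    """Read the log at the given path and return per-pattern counts."""
    with open(path, errors="replace") as handle:
        return count_matches(handle)
```

Calling scan_log() on the backup server returns a dictionary with one
count per pattern; comparing the counts from a night with a slowdown
against a normal night may show whether the two are related.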


Thanks,


Stan

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
