Networker

Re: [Networker] 7.0 Experience

2003-07-01 14:11:05
Subject: Re: [Networker] 7.0 Experience
From: "Reed, Ted G II [ITS]" <ted.reed AT MAIL.SPRINT DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Tue, 1 Jul 2003 13:10:56 -0500
We keep pre-labeled empties on hand for at least 3 days' worth of backups (in 
this environment, 100 tapes per night per storage node is normal), so I keep 
600+ tapes labeled and empty for the environment.
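
As a rough sanity check on that 600 figure, the arithmetic can be sketched as 
below. This is only an illustration: the function name is made up, and the 
count of 2 storage nodes is an assumption based on the "both nodes" remark 
further down.

```python
def prelabeled_tapes_needed(tapes_per_night_per_node, storage_nodes, days_of_cover):
    """Return how many labeled, empty tapes cover the given number of nights."""
    return tapes_per_night_per_node * storage_nodes * days_of_cover


# Figures from this message: 100 tapes/night/node and 3 days of cover.
# The node count of 2 is assumed from "both nodes".
print(prelabeled_tapes_needed(100, 2, 3))  # 600
```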

You are right, the 9840's mount/dismount like a bad dog.  And the STK 9310 
Powderhorn silo can do something like 300+ mounts per hour.  The problem is 
that nsrd is SO busy that when the automated nsrjb request comes in, it sits in 
the queue....and sits....and sits....and Legato asks for another tape...and 
sits...and sits...and asks for another tape...and sits.  Last Monday we came in 
to find that both nodes were at "media critical waiting for 10 writable media" 
..... and we only have 10 devices per node!!!  And even our manual mount 
requests, which appear to get 'bumped' to the head of the queue, were taking 30 
minutes apiece between the slow GUI and the slow mounts.

So I don't blame STK in any way, shape, or form....I put the blame fully on the 
overworked nsrd process and a potentially inappropriate prioritization (or lack 
thereof) within the application for system-initiated mounts.

I'd love any other advice.....I admit I haven't covered all the things we've 
done to try to live through this.  We HAVE been living with this for over a 
year, but we're always sitting at the knife edge of media disaster (see last 
Monday <g>).  Thanks to everyone for their great advice.
--Ted


-----Original Message-----
From: Dale Mayes [mailto:dmayes AT kimball DOT com]
Sent: Tuesday, July 01, 2003 11:19 AM
To: Legato NetWorker discussion; Reed, Ted G II [ITS]
Subject: RE: [Networker] 7.0 Experience


Ted,

I'm surprised you're having issues with 9840's.

I've got the 9840B, both fibre and SCSI based, and the speed of loading
the drives is superb.

Is your issue with re-labeling?

I used to have this problem with DLTs because NetWorker can take a long
time to re-label a tape, compared with just loading and writing to a
pre-labeled tape.

I used to run a scheduled script to re-label a sufficient number of
expired tapes before the nightly backups. This completely eliminated the
problem.

What are you going to do with your 9840's?

HTH...Dale

Dale Mayes
Storage Systems Engineer
Kimball International, Inc.

-----Original Message-----
From: Reed, Ted G II [ITS] [mailto:ted.reed AT MAIL.SPRINT DOT COM] 
Sent: Tuesday, July 01, 2003 11:06 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] 7.0 Experience

When it first started occurring on an every night basis, we did all
forms of standard troubleshooting, including truss.  There was nothing
out of the ordinary (mostly db work??  been a while).  We even had a
Legato "Healthcheck" performed by one of their techs.  His
conclusion....we must be doing something right if we're getting 8TB per
night to 20 drives.  

That was under 6.1.1.  Since we have moved to 6.1.3, at least nsrd load is
lower during the day than it used to be....and we're seeing fewer media
criticals.  But it is still the case that if we start to fall behind on
media requests, everything starts to bog down and it gets worse and worse
as time progresses.  Last Saturday, one of our staff spent 10 hours playing
mount catch-up because the nodes had fallen behind so badly.  And each
time our guy got close to clearing the mount requests, 2 or 3 tapes
would hit Full status and Legato would demand additional tapes.

FYI...part of the issue is 9840 tapes.  Since they are 20G native, they
can fill very quickly.....and they have a tendency to do so in groups,
so you go from "Waiting for 1 writable volumes" to "Waiting for 5
writable volumes" in a matter of seconds.  Classic one step forward, two
steps back. 
We are going to 9940B drives (200G native), which we hope will wipe out
this issue by decreasing the tape mount requests to ~10% of original
numbers.  Regardless, that doesn't stop my current environment from
queuing Legato-initiated tape mounts, which results in a need for manual
intervention.
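
The ~10% figure follows from the native capacities alone. A minimal sketch,
assuming the same amount of data is written each night and that every tape
fills to native capacity (compression and partially filled tapes are
ignored; the function name is hypothetical):

```python
import math


def mounts_after_upgrade(current_mounts, old_capacity_gb, new_capacity_gb):
    """Estimate nightly mounts after moving the same data to larger tapes."""
    data_gb = current_mounts * old_capacity_gb
    return math.ceil(data_gb / new_capacity_gb)


# 9840 is 20G native and 9940B is 200G native; 100 tapes per night per
# storage node is the figure quoted elsewhere in this thread.
print(mounts_after_upgrade(100, 20, 200))  # 10
```

In practice the real ratio depends on how well the data compresses on each
drive type, so treat 10% as a ballpark rather than a prediction.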

--Ted

-----Original Message-----
From: Byron Servies [mailto:bservies AT PACANG DOT COM]
Sent: Monday, June 30, 2003 7:03 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] 7.0 Experience


On June 30, 2003 at 17:27, Reed, Ted G II [ITS] wrote:
> During day up to 4 active networker/nwadmin sessions.
> During night, only one.

Well, it was just a thought.  :-)

> Always horribly behind on mounts (they seem to not be priority)
> and single 400Mhz Ultrasparc II (III?) always pegged just running nsrd.

Have you tried running truss on the nsrd to see what it is doing?

Byron

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

