[Veritas-bu] Question posed to ACSLS/STK8500 users.

2006-12-08 14:43:52
Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
From: jpiszcz at lucidpixels.com (Justin Piszcz)
Date: Fri, 8 Dec 2006 14:43:52 -0500 (EST)
It is 100% correct.  Yep.  I ran about 5 test backups to each drive in the 
robot.  No problems.  It is only when there is a burst of jobs.

Justin.

On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:

> Justin,
> 
> Are you absolutely certain that you have your drive mapping done properly? 
> The fact that the job fails 30 minutes after the initial mount attempt
> makes it sound like you are failing with a media mount time out.  The most
> common cause (especially with ACS environments) is a simple mismatch between
> the /dev/rmt path and your ACS path (i.e. ACS,LSM,PANEL,DRIVE).  The SL8500
> is also very difficult to address properly, since the ACS path has little
> correlation with the physical location of the drive.
> 
> Probably the quickest test you can perform is to verify that your jobs are
> being affected by the media mount timeout.  If you shorten the media mount
> timeout parameter, to say 10 minutes, your jobs should fail 10 minutes
> after they start if the mount timeout is what fails the jobs.
> 
> You should also track down which drives are failing to mount, and see if
> there is a correlation.
> 
>   Cheers
>   Mike
> 
> 
> >
> > Message: 7
> > Date: Fri, 8 Dec 2006 11:08:39 -0500 (EST)
> > From: Justin Piszcz <jpiszcz at lucidpixels.com>
> > Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> > To: veritas-bu at mailman.eng.auburn.edu
> > Message-ID: <Pine.LNX.4.64.0612081102150.15271 at p34.internal.lan>
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> >
> > All,
> >
> > My group is setting up two Sun/StorageTek SL8500s.  Sun did the
> > install of ACSLS, there were no problems on their side.  Each SL8500
> > is in its own environment.  On each SL8500, we have 8 media servers,
> > connected to four drives each, giving us a total of 32 drives.  For
> > testing, I did the following.  Ran a NON-MULTIPLEXED backup to each
> > drive, to ensure each drive worked properly.  To do this I kicked off
> > four jobs in succession. When I do this, I utilize all 4 drives.  I
> > did this with each media server without a single problem.  However,
> > when testing everything together, all 32 drives, I kick off 45 jobs
> > for example.  It says there are 32 active jobs in NetBackup, which is
> > correct.  The problem is, randomly, 2 or 3 jobs will hang at
> > "Mounting MediaID.." and then the drive will go down after 30
> > minutes.  Why is this?  With an L700, I can send 500-1000 jobs to all
> > of the drives in it and there is never a mounting problem.  There is
> > nothing wrong with any of the drives, they are brand new.  I can use
> > ACSLS and dismount the media from the drives and then re-run my
> > earlier test backups, one at a time to each of the four drives
> > per media server without any issues.  It is only when the robot
> > receives a 'burst' of jobs that this happens.
> >
> > Has anyone experienced anything like this before?
> >
> > Thanks for any help and responses,
> >
> > Justin.
> >
> >
> 
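The two checks suggested in the reply above (do the failures line up with the media mount timeout, and do they cluster on particular drives) can be tallied with a few lines of Python. This is only a rough sketch: the record layout, job IDs, drive addresses, and timestamps are made up for illustration and would have to be typed in by hand from the job details. Nothing here is a NetBackup or ACSLS export format, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical records entered by hand; the layout is an assumption:
# (job id, ACS drive address "acs,lsm,panel,drive", mount requested, job failed)
FAILED_JOBS = [
    (101, "0,1,1,3", "2006-12-08 10:00:05", "2006-12-08 10:30:07"),
    (117, "0,1,1,3", "2006-12-08 10:00:40", "2006-12-08 10:30:41"),
    (122, "0,2,1,0", "2006-12-08 10:01:10", "2006-12-08 10:12:30"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(start, end):
    """Time between the mount request and the job failure."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


def looks_like_mount_timeout(start, end, timeout_minutes, slack_seconds=120):
    """True when the job failed within slack_seconds of the configured timeout."""
    delta = elapsed(start, end) - timedelta(minutes=timeout_minutes)
    return abs(delta.total_seconds()) <= slack_seconds


def failures_per_drive(jobs, timeout_minutes):
    """Count timeout-like failures per drive address."""
    counts = Counter()
    for _job_id, drive, start, end in jobs:
        if looks_like_mount_timeout(start, end, timeout_minutes):
            counts[drive] += 1
    return counts


if __name__ == "__main__":
    # 30 minutes is the failure interval reported in this thread.
    for drive, n in failures_per_drive(FAILED_JOBS, timeout_minutes=30).most_common():
        print(f"drive {drive}: {n} timeout-like failure(s)")
```

If the timeout is shortened to 10 minutes for the test and the failures then show up at roughly 10 minutes, rerun the tally with timeout_minutes=10. Drives that keep appearing at the top of the list are the ones whose /dev/rmt to ACS mapping is worth rechecking first.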