[Veritas-bu] Question posed to ACSLS/STK8500 users.

2006-12-08 12:59:28
Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
From: ml000-0001 at well-dunn.com (Mike Dunn (veritas-bu))
Date: Fri, 08 Dec 2006 11:59:28 -0600
Justin,

Are you absolutely certain that you have your drive mapping done properly? 
The fact that the job fails 30 minutes after the initial mount attempt
makes it sound like you are failing with a media mount timeout.  The most
common cause (especially with ACS environments) is a simple mismatch between
the /dev/rmt path and your ACS path (i.e. ACS,LSM,PANEL,DRIVE).  The SL8500
is also very difficult to address properly, since the ACS path has little
correlation with the physical location of the drive.
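
One way to look for that kind of mismatch is to compare, drive by drive,
the ACS address configured on the media server with the address the library
itself reports for the same drive, keyed on something that does not move,
such as the drive serial number.  A minimal Python sketch follows.  The
serial numbers, device paths and addresses are made-up sample data, and
collecting the real values from your NetBackup device configuration and
from ACSLS is deliberately left out.

```python
# Hypothetical sample data, not taken from any real system.
# serial -> (device path, ACS address configured on the media server)
# An ACS address is the tuple (ACS, LSM, PANEL, DRIVE).
configured = {
    "SN0001": ("/dev/rmt/0cbn", (0, 0, 1, 0)),
    "SN0002": ("/dev/rmt/1cbn", (0, 0, 1, 1)),
    "SN0003": ("/dev/rmt/2cbn", (0, 0, 1, 2)),
}

# serial -> ACS address as reported by the library for that drive
reported = {
    "SN0001": (0, 0, 1, 0),
    "SN0002": (0, 0, 1, 2),
    "SN0003": (0, 0, 1, 1),
}


def find_mismatches(configured, reported):
    """Return (serial, path, configured_addr, reported_addr) for every
    drive whose configured address differs from the reported one.
    A drive the library does not list shows up with reported_addr None."""
    mismatches = []
    for serial, (path, addr) in sorted(configured.items()):
        actual = reported.get(serial)
        if actual != addr:
            mismatches.append((serial, path, addr, actual))
    return mismatches


for serial, path, addr, actual in find_mismatches(configured, reported):
    print(f"{serial} {path}: configured {addr}, library reports {actual}")
```

In the sample data two drives have swapped addresses, which is the sort of
error that only shows up when both drives are asked to mount at once.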

Probably the quickest test you can perform is to verify that your jobs are
being affected by the media mount timeout.  If you shorten the media mount
timeout parameter to, say, 10 minutes, your jobs should fail 10 minutes
after they start if the mount timeout is what fails the jobs.
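
If you would rather check this from timestamps than by eye, the test boils
down to asking whether the time between the job starting and the job failing
matches the configured timeout.  Here is a rough Python sketch.  The function
name, the two-minute slack and the sample times are my own assumptions; the
timestamps would come from wherever you read your job details.

```python
from datetime import datetime, timedelta


def looks_like_mount_timeout(job_started, job_failed, timeout_minutes,
                             slack_minutes=2):
    """Return True if the job failed roughly timeout_minutes after it
    started, within slack_minutes either way (slack is an assumption)."""
    elapsed = job_failed - job_started
    expected = timedelta(minutes=timeout_minutes)
    return abs(elapsed - expected) <= timedelta(minutes=slack_minutes)


# Hypothetical sample: timeout shortened to 10 minutes for the test.
started = datetime(2006, 12, 8, 11, 0, 0)
print(looks_like_mount_timeout(started, datetime(2006, 12, 8, 11, 10, 30), 10))
print(looks_like_mount_timeout(started, datetime(2006, 12, 8, 11, 30, 0), 10))
```

If the failures follow the shortened timeout, the mount timeout is what is
killing the jobs; if they still fail at around 30 minutes, look elsewhere.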

You should also track down which drives are failing to mount, and see
whether the failures keep landing on the same drives or media servers.
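
For the correlation check, a simple tally is usually enough.  The sketch
below is Python, with invented media server names and ACS addresses; you
would fill the list in by hand from the failed jobs as you find them.

```python
from collections import Counter

# Hypothetical sample: one entry per failed mount, recorded as
# (media server, ACS address of the drive as (ACS, LSM, PANEL, DRIVE)).
failures = [
    ("media3", (0, 1, 1, 3)),
    ("media3", (0, 1, 1, 3)),
    ("media5", (0, 2, 1, 0)),
    ("media3", (0, 1, 1, 3)),
]


def tally(failures):
    """Count failed mounts per media server and per drive address."""
    by_server = Counter(server for server, _ in failures)
    by_drive = Counter(drive for _, drive in failures)
    return by_server, by_drive


by_server, by_drive = tally(failures)
for drive, count in by_drive.most_common():
    print(f"drive {drive}: {count} failed mounts")
for server, count in by_server.most_common():
    print(f"{server}: {count} failed mounts")
```

If one or two drives account for nearly all of the failures, check the
mapping for those drives first.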

  Cheers
  Mike


>
> Message: 7
> Date: Fri, 8 Dec 2006 11:08:39 -0500 (EST)
> From: Justin Piszcz <jpiszcz at lucidpixels.com>
> Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> To: veritas-bu at mailman.eng.auburn.edu
> Message-ID: <Pine.LNX.4.64.0612081102150.15271 at p34.internal.lan>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
> All,
>
> My group is setting up two Sun/StorageTek SL8500s.  Sun did the
> install of ACSLS, there were no problems on their side.  Each SL8500
> is in its own environment.  On each SL8500, we have 8 media servers,
> connected to four drives each, giving us a total of 32 drives.  For
> testing, I did the following.  Ran a NON-MULTIPLEXED backup to each
> drive, to ensure each drive worked properly.  To do this I kicked off
> four jobs in succession. When I do this, I utilize all 4 drives.  I
> did this with each media server without a single problem.  However,
> when testing everything together, all 32 drives, I kick off 45 jobs
> for example.  It says there are 32 active jobs in NetBackup, which is
> correct.  The problem is, randomly, 2 or 3 jobs will hang at
> "Mounting MediaID.." and then the drive will go down after 30
> minutes.  Why is this?  With an L700, I can send 500-1000 jobs to all
> of the drives in it and there is never a mounting problem.  There is
> nothing wrong with any of the drives, they are brand new.  I can use
> ACSLS and dismount the media from the drives and then re-run my
> earlier test backups, one at a time to each of the four drives
> per-media server without any issues.  It is only when the robot
> receives a 'burst' of jobs that this happens.
>
> Has anyone experienced anything like this before?
>
> Thanks for any help and responses,
>
> Justin.
>
>