Veritas-bu

Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
From: hampus.lind at rps.police.se (Hampus Lind)
Date: Sat, 9 Dec 2006 00:28:48 +0100
And do you see any errors in acsss_event.log on the ACSLS server when the
drives get downed?
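
If it helps, this is roughly how I check on the ACSLS side when a drive gets
downed (the paths assume a default ACSLS install under /export/home/ACSSS,
so adjust for your environment):

    # Watch the event log while you reproduce the burst of jobs
    tail -f /export/home/ACSSS/log/acsss_event.log

    # Then ask ACSLS what it thinks the drive states are
    cmd_proc
    ACSSA> query drive all
    ACSSA> logoff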

Hampus Lind
Rikspolisstyrelsen
National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: hampus.lind at rps.police.se


-----Original Message-----
From: veritas-bu-bounces at mailman.eng.auburn.edu
[mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Justin Piszcz
Sent: Friday, December 8, 2006 21:05
To: Hall, Christian N.
Cc: Mike Dunn (veritas-bu); veritas-bu at mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.

Yes, everything matches perfectly.  Remember, if I run the backups slowly, 
one at a time, I can see each of the 4 drives being used per media 
server.  When I run a burst of jobs, though, 29-30 of them work (1 tape per 
drive) and a RANDOM 2-3 drives do not work (it differs each time I do 
it).

Currently I am not using MPX, so I can easily test, i.e. 1 job = 1 tape drive.
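
For what it is worth, after a burst I just loop over the media servers from
the master to see which drives NetBackup has downed.  This is from memory,
so check the vmoprcmd usage on your install; the hostnames are placeholders
for our real ones:

    # Show drive status on each media server (mediaserver1..8 are placeholders)
    for h in mediaserver1 mediaserver2 mediaserver3 mediaserver4 \
             mediaserver5 mediaserver6 mediaserver7 mediaserver8
    do
        echo "=== $h ==="
        /usr/openv/volmgr/bin/vmoprcmd -h $h -d
    done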

Justin.

On Fri, 8 Dec 2006, Hall, Christian N. wrote:

> Justin,
> 
> Do the ACS,LSM,PANEL,DRIVE numbers in ACSLS match the serial number
> results from tpautoconf -t on the master server's /dev/rmt/*cbn devices?
> Can you please display the output?  Did you perform this test from your
> master server, or did you perform it from each of the media server
> hosts?  After you attempt your multiplexing, do you have stuck
> tapes?
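>
> Something along these lines should show both sides to compare; this is
> from memory, so treat it as a sketch and adjust paths for your install:
>
>     # On the master (and each media server): serialized device scan
>     /usr/openv/volmgr/bin/tpautoconf -t
>
>     # Drive configuration as NetBackup sees it (ACS,LSM,PANEL,DRIVE)
>     /usr/openv/volmgr/bin/tpconfig -d
>
>     # On the ACSLS server: the drive IDs and states the library reports
>     cmd_proc
>     ACSSA> query drive all
>     ACSSA> logoff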
> 
> Chris  
> 
> -----Original Message-----
> From: veritas-bu-bounces at mailman.eng.auburn.edu
> [mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of Justin
> Piszcz
> Sent: Friday, December 08, 2006 2:44 PM
> To: Mike Dunn (veritas-bu)
> Cc: veritas-bu at mailman.eng.auburn.edu
> Subject: Re: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> 
> It is 100% correct.  Yep.  I ran about 5 test backups to each drive in
> the robot.  No problems.  It is only when there is a burst of jobs.
> 
> Justin.
> 
> On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:
> 
> > Justin,
> > 
> > Are you absolutely certain that you have your drive mapping done
> > properly?  The fact that the job fails 30 minutes after the initial
> > mount attempt makes it sound like you are failing with a media mount
> > timeout.  The most common cause (especially with ACS environments) is a
> > simple mismatch between the /dev/rmt path and your ACS path (i.e.
> > ACS,LSM,PANEL,DRIVE).  The SL8500 is also very difficult to address
> > properly, since the ACS path has little correlation with the physical
> > location of the drive.
> > 
> > Probably the quickest test you can perform is to verify that your jobs
> > are being affected by the media mount timeout.  If you shorten the
> > media mount timeout parameter, to say 10 minutes, your jobs should
> > fail 10 minutes after they start if the mount timeout is what fails
> > the jobs.
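> > 
> > If I remember right, the media mount timeout can be dropped temporarily
> > on the master via bp.conf (the value is in seconds), for example:
> > 
> >     # /usr/openv/netbackup/bp.conf on the master server
> >     MEDIA_MOUNT_TIMEOUT = 600
> > 
> > or through Host Properties -> Master Server -> Timeouts in the GUI.
> > Just remember to set it back once you have finished testing.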
> > 
> > You should also track down which drives are failing to mount, and see 
> > if there is a correlation.
> > 
> >   Cheers
> >   Mike
> > 
> > 
> > >
> > > Message: 7
> > > Date: Fri, 8 Dec 2006 11:08:39 -0500 (EST)
> > > From: Justin Piszcz <jpiszcz at lucidpixels.com>
> > > Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> > > To: veritas-bu at mailman.eng.auburn.edu
> > >
> > > All,
> > >
> > > My group is setting up two Sun/StorageTek SL8500s.  Sun did the
> > > install of ACSLS, and there were no problems on their side.  Each
> > > SL8500 is in its own environment.  On each SL8500, we have 8 media
> > > servers, connected to four drives each, giving us a total of 32
> > > drives.  For testing, I did the following: I ran a NON-MULTIPLEXED
> > > backup to each drive, to ensure each drive worked properly.  To do
> > > this I kicked off four jobs in succession, which utilizes all 4
> > > drives.  I did this with each media server without a single problem.
> > > However, when testing everything together, all 32 drives, I kick off
> > > 45 jobs, for example.  NetBackup says there are 32 active jobs,
> > > which is correct.  The problem is that, randomly, 2 or 3 jobs will
> > > hang at "Mounting MediaID.." and then the drive will go down after
> > > 30 minutes.  Why is this?  With an L700, I can send 500-1000 jobs to
> > > all of the drives in it and there is never a mounting problem.
> > > There is nothing wrong with any of the drives; they are brand new.
> > > I can use ACSLS to dismount the media from the drives and then
> > > re-run my earlier test backups, one at a time to each of the four
> > > drives per media server, without any issues.  It is only when the
> > > robot receives a 'burst' of jobs that this happens.
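> > >
> > > In case it is useful, the cleanup I do after a failed burst is roughly
> > > the following (the volume ID, drive coordinates and drive index are
> > > examples, not our real ones):
> > >
> > >     # On the ACSLS server: force the stuck tape out of the drive
> > >     cmd_proc
> > >     ACSSA> dismount ABC123 0,0,1,0 force
> > >     ACSSA> logoff
> > >
> > >     # On the media server: bring the downed drive back up by index
> > >     /usr/openv/volmgr/bin/vmoprcmd -up 4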
> > >
> > > Has anyone experienced anything like this before?
> > >
> > > Thanks for any help and responses,
> > >
> > > Justin.
> > >
> > >
> > 
_______________________________________________
Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu