[Veritas-bu] Question posed to ACSLS/STK8500 users.

2006-12-08 15:20:44
Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
From: ml000-0001 at well-dunn.com (Mike Dunn (veritas-bu))
Date: Fri, 08 Dec 2006 14:20:44 -0600
Are you using UDP for communication with your ACS server? (UDP is the default.)
 If so, try switching to TCP.

  Cheers
  Mike


On 2:16:50 pm 2006-12-08 Justin Piszcz <jpiszcz at lucidpixels.com> wrote:
> Nope, only 1 NIC.  And even so yeah I do specify that in the vm.conf
> just in case.
>
> Justin.
>
> On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:
>
> >  Hmmm, do your media servers have multiple NICs, and are you using
> >  IP multipathing software? (like in.mpathd under Solaris)  If so,
> >  then make sure that you have set the ACS_SSI_HOSTNAME
> >  appropriately in your vm.conf file.  The acs daemon inserts the
> >  value (or inferred value) of ACS_SSI_HOSTNAME into all
> >  communications with the ACS server.  Also, make sure that if you
> >  are using ACLs on the ACS server, they match the name/IP used
> >  in ACS_SSI_HOSTNAME.
> >    Cheers
> >    Mike
> >
> >
> >  On 1:43:52 pm 2006-12-08 Justin Piszcz <jpiszcz at lucidpixels.com> wrote:
> > >  It is 100% correct.  Yep.  I ran about 5 test backups to
> > >  each drive in the robot.  No problems.  It is only when there is
> > >  a burst of jobs.
> > >  Justin.
> > >
> > >  On Fri, 8 Dec 2006, Mike Dunn (veritas-bu) wrote:
> > >
> > > >   Justin,
> > > >
> > > >   Are you absolutely certain that you have your drive mapping
> > > >   done properly? The fact that the job fails 30 minutes after
> > > >   the initial mount attempt makes it sound like you are failing
> > > >   with a media mount time out.  The most common cause
> > > >   (especially with ACS environments) is a simple mismatch
> > > > >   between the /dev/rmt path and your ACS path (i.e.
> > > >   ACS,LSM,PANEL,DRIVE).  The SL8500 is also very difficult to
> > > >   address properly, since the ACS path has little correlation
> > > >   with the physical location of the drive. Probably the
> > > >   quickest test you can perform is to verify that your jobs are
> > > >   being affected by the media mount timeout.  If you shorten
> > > >   the media mount timeout parameter, to say 10 minutes, your
> > > >   jobs should fail 10 minutes after they start if the mount
> > > >   timeout is what fails the jobs. You should also track down
> > > >   which drives are failing to mount, and see if there is a
> > > > >   correlation.
> > > >     Cheers
> > > >     Mike
> > > >
> > > >
> > > > >
> > > > >   Message: 7
> > > > >   Date: Fri, 8 Dec 2006 11:08:39 -0500 (EST)
> > > > >   From: Justin Piszcz <jpiszcz at lucidpixels.com>
> > > > >   Subject: [Veritas-bu] Question posed to ACSLS/STK8500 users.
> > > > >   To: veritas-bu at mailman.eng.auburn.edu
> > > > > >   Message-ID: <Pine.LNX.4.64.0612081102150.15271 at p34.internal.lan>
> > > > >   Content-Type: TEXT/PLAIN; charset=US-ASCII
> > > > >
> > > > >   All,
> > > > >
> > > > >   My group is setting up two Sun/StorageTek SL8500s.  Sun did
> > > > >   the install of ACSLS, there were no problems on their side.
> > > > >   Each SL8500 is in its own environment.  On each SL8500, we
> > > > >   have 8 media servers, connected to four drives each, giving
> > > > >   us a total of 32 drives.  For testing, I did the following.
> > > > >   Ran a NON-MULTIPLEXED backup to each drive, to ensure each
> > > > >   drive worked properly.  To do this I kicked off four jobs in
> > > > >   succession. When I do this, I utilize all 4 drives.  I did
> > > > >   this with each media server without a single problem.
> > > > >   However, when testing everything together, all 32 drives, I
> > > > >   kick off 45 jobs for example.  It says there are 32 active
> > > > > >   jobs in NetBackup, which is correct.  The problem is,
> > > > > >   randomly, 2 or 3 jobs will hang at "Mounting MediaID.." and
> > > > > >   then the drive will go down after 30 minutes.  Why is this?
> > > > >   With an L700, I can send 500-1000 jobs to all of the drives
> > > > >   in it and there is never a mounting problem.  There is
> > > > >   nothing wrong with any of the drives, they are brand new.
> > > > >   I can use ACSLS and dismount the media from the drives and
> > > > >   then re-run my earlier test backups, one at a time to each
> > > > >   of the four drives per-media server without any issues.  It
> > > > > >   is only when the robot receives a 'burst' of jobs that this
> > > > >   happens. Has anyone experienced anything like this before?
> > > > >
> > > > >   Thanks for any help and responses,
> > > > >
> > > > >   Justin.
> > > > >
> > > > >
> > > >
> > > >   _______________________________________________
> > > >   Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
> > > >   http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu