[Veritas-bu] drives going down on media servers
2006-01-26 11:21:23
Sounds like SSO is misbehaving.
> -----Original Message-----
> From: veritas-bu-admin AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of
> Blaine Robison
> Sent: January 26, 2006 11:17 AM
> To: dave.markham AT fjserv DOT net; Roger Dombrowski
> Cc: blaine_robison AT yahoo DOT com; veritas-bu AT mailman.eng.auburn DOT edu
> Subject: Re: [Veritas-bu] drives going down on media servers
>
>
> Thanks for your input. I have already disabled RSM on the master.
> Let me give the entire rundown of the system.
>
> Master Server
> Windows 2000, NBU 5.1 MP4
>
> Media Servers
> 2 Sun 480s, Solaris 9, QLogic cards with Leadville drivers.
>
> The master and media servers are on their own internal Gb network. They
> are SAN-attached to the tape drives through Brocade switches. The drives
> are IBM and HP LTO2.
>
> When I run the backups on one media server, they run fine. When I try to
> share the drives and run both media servers, I get "Permission denied" in
> the messages files, then 84 and 85 errors. In the bptm log I see
> "external event caused rewind". After talking with STK, they told me the
> drive was getting inquiries from another system during the backup. I am
> led to believe that the SCSI reserve is not being handled properly between
> the servers. Since the SCSI reserve is supposed to be initiated when the
> drive is opened, I would think the drive would not accept any inquiries or
> SCSI commands until it was closed.
>
> My conclusion is that the Leadville HBA drivers are not handling the SCSI
> reserve properly. But Sun says there is no problem and to call Veritas;
> Veritas tells me the error is returned by io_ctl in the OS and to call Sun.
>
> Thanks for your input; it is nice not to be all alone in this.
>
>
>
>
> --- Dave Markham <dave.markham AT fjserv DOT net> wrote:
>
> > Unbelievably, I saw this only yesterday, when a Windows guy asked me if I
> > knew about it, seeing as I support NetBackup on Solaris.
> >
> > The fix he got, which worked, was to disable the Removable Storage Manager
> > service. The errors are no more.
> >
> > That was on a Windows 2003 setup with NetBackup 5.1 MP4.
> >
> > Roger Dombrowski wrote:
> >
> > > Hi Blaine,
> > >
> > > I have been trying to solve this problem for two sites that I'm working
> > > with right now, and we're not having much luck either. In my travels
> > > I've talked to a few folks who have seen this "External Event" issue
> > > caused by monitoring software. One client in particular found that one
> > > of Sun's monitoring tools was sending out SCSI inquiries and causing
> > > the "external event rewinds".
> > >
> > > I also ran across a post on this mailing list that documents about 30
> > > such applications that have been known to cause this type of behaviour.
> > > Try searching this list for "external event". If I get a chance, I'll
> > > try to dig it up and send you the post I'm thinking of.
> > >
> > > Through the course of my research I've basically found that two things
> > > are trying to communicate with the drive, and most folks check out the
> > > data path (HBAs, switches, bridges, ...) to look for problems.
> > >
> > > Maybe the upgrade stepped on some SCSI reservation setting. If I find
> > > anything else, I'll post to the list...
> > >
> > > Blaine Robison wrote:
> > >
> > >> I am having a similar issue. I have a Windows 2000 master and a pair
> > >> of Sun 480s with 8 LTO2 drives shared between them. I get "External
> > >> Event caused rewind" errors and the tapes get frozen or the drives go
> > >> down. I didn't have the problem until I upgraded to 5.1 MP4. I have
> > >> gone over the entire configuration and cannot find a problem.
> > >> Has anyone else seen this and found a resolution?
> > >> --- ida3248b AT post.cybercity DOT dk wrote:
> > >>
> > >>
> > >>
> > >>> Have you tried /var/adm/messages (Solaris) or the equivalent log?
> > >>>
> > >>> Regards
> > >>> Michael
> > >>>
> > >>> On Wed, 18 Jan 2006 15:00:24 +0000, Dave Markham wrote
> > >>>
> > >>>
> > >>>> I have 1 master server and 2 media servers connected over fibre to
> > >>>> an L700. I'm not sure what the switch in the middle is, as I didn't
> > >>>> install the system and have no info on it.
> > >>>>
> > >>>> There are 5 drives in the L700, and 3 of them are shared via the SSO
> > >>>> option with the master and both media servers.
> > >>>>
> > >>>> I have had an issue lately with drives not being visible to one of
> > >>>> my media servers.
> > >>>>
> > >>>> I fixed this by unloading the fibre HBA using cfgadm and loading it
> > >>>> again. It can then see the devices under sgscan and under /dev/rmt.
> > >>>>
> > >>>> I also noticed the customer had removed an /etc/hosts entry that let
> > >>>> the media servers talk to each other by the correct name, so I put
> > >>>> that back in and can now talk on port 13701 to each machine in the
> > >>>> NBU setup.
> > >>>>
> > >>>> What's happening now, though, is that drives just keep going down on
> > >>>> the media servers and backups are not working. I have ITC enabled, so
> > >>>> each media server needs to lock 2 drives.
> > >>>>
> > >>>> I have looked at the bptm logs and can't see anything jumping out,
> > >>>> apart from many media requests for different tape IDs. I have looked
> > >>>> in /usr/openv/volmgr/debug/ltid/ and the logs in there show shared
> > >>>> drive info being communicated to the master successfully.
> > >>>>
> > >>>> Therefore I am now stuck and have no idea what's going wrong :(
> > >>>>
> > >>>> Any advice or pointers? Is there anything specific I should be
> > >>>> looking for in the logs, or are there other important logs I'm not
> > >>>> checking?
> > >>>>
> > >>>> Thanks
> > >>>> _______________________________________________
> > >>>> Veritas-bu maillist - Veritas-bu AT mailman.eng.auburn DOT edu
> > >>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> > >>>>
> > >>>
> > >>> --
> > >>> Cybercity Webhosting (http://www.cybercity.dk)
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >> Blaine Robison
> > >> Solaris Certified System Administrator
> > >> Solaris Certified Network Administrator
> > >> Veritas Certified Professional
> > >> 972-853-2459
> > >> 214-578-5391
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
> >
>
>
> Blaine Robison
> Solaris Certified System Administrator
> Solaris Certified Network Administrator
> Veritas Certified Professional
> 972-853-2459
> 214-578-5391
>
>
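
For anyone comparing the two media servers, here is a minimal sketch of how the symptoms quoted above could be tallied per log file. It only counts lines containing the two phrases mentioned in this thread ("external event caused rewind" and "Permission denied"). The exact wording in a real bptm or messages log is an assumption, and the names PATTERNS, count_symptoms and scan_file are made up for illustration, not part of NetBackup.

```python
import re
from collections import Counter

# Phrases taken from the symptoms described in this thread. Matching is
# case-insensitive because the exact log wording is an assumption.
PATTERNS = {
    "external_event": re.compile(r"external event caused rewind", re.IGNORECASE),
    "permission_denied": re.compile(r"permission denied", re.IGNORECASE),
}


def count_symptoms(lines):
    """Count how many lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Tally symptoms in one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return count_symptoms(handle)
```

Running scan_file("/var/adm/messages") on each media server, and on whichever bptm log file is in use, gives a rough per-host count. If the hits only appear while both servers are active, that would be consistent with a second host touching the drive, though it does not show which one.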
Current thread: [Veritas-bu] drives going down on media servers, Paul Keating