Veritas-bu

[Veritas-bu] drives going down on media servers

2006-01-26 11:21:23
Subject: [Veritas-bu] drives going down on media servers
From: pkeating AT bank-banque-canada DOT ca (Paul Keating)
Date: Thu, 26 Jan 2006 11:21:23 -0500
Sounds like SSO is misbehaving.



> -----Original Message-----
> From: veritas-bu-admin AT mailman.eng.auburn DOT edu 
> [mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of 
> Blaine Robison
> Sent: January 26, 2006 11:17 AM
> To: dave.markham AT fjserv DOT net; Roger Dombrowski
> Cc: blaine_robison AT yahoo DOT com; veritas-bu AT mailman.eng.auburn DOT edu
> Subject: Re: [Veritas-bu] drives going down on media servers
> 
> 
> Thanks for your input. I have already disabled the RSM on the master. 
> Let me give the entire rundown of the system. 
> 
> Master Server 
> win 2000 NBU 5.1 MP4
> 
> Media Servers
> 2 Sun 480 Sol9 qlogic cards with Leadville drivers. 
> 
> Master and media servers are on their own internal Gb network. They are
> SAN attached to the tape drives using Brocade switches. The drives are
> IBM and HP LTO2.
> 
> When I run the backups on one media server, the backups run fine. When I
> try to share the drives and run both media servers, I get "Permission
> denied" in the messages files, then 84 and 85 errors. In the bptm log I
> see "external event caused rewind". After talking with STK, they told me
> the drive was getting inquiries from another system during the backup. I
> am led to believe that the SCSI reserve is not being handled properly
> between the servers. Since the SCSI reserve is supposed to be initiated
> when the drive is opened, I would think the drive would not accept any
> inquiries or SCSI commands until it was closed.
> 
> My conclusion is that the Leadville HBA drivers are not handling the SCSI
> reserve properly. But Sun says there is no problem, call Veritas. Veritas
> tells me the error is given by io_ctl in the OS, call Sun.
> 
> Thanks for your input; it is nice not to be all alone in this.
> 
> 
> 
> 
> --- Dave Markham <dave.markham AT fjserv DOT net> wrote:
> 
> > Unbelievably, I saw this yesterday when a Windows guy asked me if I
> > knew about it, seeing as I support NetBackup on Solaris.
> > 
> > The fix he got, which worked, was to disable the Removable Storage
> > Manager service. The errors are no more.
> > 
> > That was on a Windows 2003 setup with NetBackup 5.1 MP4.
> > 
> > Roger Dombrowski wrote:
> > 
> > > Hi Blaine,
> > >
> > > I have been trying to solve this problem for two sites that I'm
> > > working with right now, and we're not having much luck either. In my
> > > travels I've talked to a few folks who have seen this "External
> > > Event" issue caused by monitoring software. One client in particular
> > > found that one of Sun's monitoring tools was sending out SCSI
> > > inquiries and causing the "external event rewinds".
> > >
> > > I also ran across a post on this mailing list that documents about 30
> > > such applications that have been known to cause this type of
> > > behaviour. Try searching this list for "external event". If I get a
> > > chance, I'll try to dig it up and send you the post I'm thinking of.
> > >
> > > Through the course of my research I've basically found that two
> > > things are trying to communicate with the drive, and most folks check
> > > out the data path (HBAs, switches, bridges, ...) to look for problems.
> > >
> > > Maybe the upgrade stepped on some SCSI reservation setting. If I find
> > > anything else, I'll post to the list...
> > >
> > > Blaine Robison wrote:
> > >
> > >> I am having a similar issue. I have a Windows 2000 master and a pair
> > >> of Sun 480s with 8 LTO2 drives shared between them. I get "External
> > >> Event caused rewind" errors and the tapes get frozen or the drives go
> > >> down. I didn't have the problem until I upgraded to 5.1 MP4. I have
> > >> gone over the entire configuration and cannot find a problem.
> > >> Has anyone else seen this and found a resolution?
> > >> --- ida3248b AT post.cybercity DOT dk wrote:
> > >>
> > >>  
> > >>
> > >>> Have you tried /var/adm/messages (Solaris) or the equivalent log?
> > >>>
> > >>> Regards
> > >>> Michael
> > >>>
> > >>> On Wed, 18 Jan 2006 15:00:24 +0000, Dave Markham wrote
> > >>>   
> > >>>
> > >>>> I have 1 master server and 2 media servers connected over fiber to
> > >>>> an L700. I'm not sure what the switch in the middle is, as I didn't
> > >>>> install the system and don't have any info on it.
> > >>>>
> > >>>> There are 5 drives in the L700, and 3 of them are shared with the
> > >>>> SSO option to the master and both media servers.
> > >>>>
> > >>>> People, I have had an issue lately with drives not being visible to
> > >>>> one of my media servers.
> > >>>>
> > >>>> I fixed this by unloading the fibre HBA using cfgadm and loading it
> > >>>> again. The server can then see the devices under sgscan and under
> > >>>> /dev/rmt.
> > >>>>
> > >>>> I also noticed the customer had removed an /etc/hosts entry that let
> > >>>> the media servers talk to each other by the correct name, so I put
> > >>>> that back in and can now talk on port 13701 to each machine in the
> > >>>> NBU setup.
> > >>>>
> > >>>> What's happening now, though, is that drives just keep going down on
> > >>>> the media servers and backups are not working. I have ITC enabled, so
> > >>>> each media server needs to lock 2 drives.
> > >>>>
> > >>>> I have looked at the bptm logs and can't see anything jumping out
> > >>>> apart from many media requests for different tape IDs. I have looked
> > >>>> in /usr/openv/volmgr/debug/ltid/ and the logs in there show success
> > >>>> on communicating shared drive info to the master.
> > >>>>
> > >>>> Therefore I am now stuck and have no idea what's going wrong :(
> > >>>>
> > >>>> Anyone have any advice/pointers? Is there anything specific I should
> > >>>> be looking for in the logs, or are there other important logs I'm not
> > >>>> checking?
> > >>>>
> > >>>> Thanks
> > >>>> _______________________________________________
> > >>>> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> > >>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> > >>>>     
> > >>>
> > >>> -- 
> > >>> Cybercity Webhosting (http://www.cybercity.dk)
> > >>>
> > >>>
> > >>>   
> > >>
> > >>
> > >>
> > >> Blaine Robison
> > >> Solaris Certified System Administrator
> > >> Solaris Certified Network Administrator
> > >> Veritas Certified Professional
> > >> 972-853-2459
> > >> 214-578-5391
> > >>
> > >> __________________________________________________
> > >> Do You Yahoo!?
> > >> Tired of spam?  Yahoo! Mail has the best spam protection around 
> > >> http://mail.yahoo.com 
> > >>
> > >>  
> > >>
> > >
> > >
> > 
> > 
> 
> 
> Blaine Robison
> Solaris Certified System Administrator 
> Solaris Certified Network Administrator
> Veritas Certified Professional
> 972-853-2459
> 214-578-5391
> 
> 
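A quick way to re-check the connectivity point from Dave's original post (each machine able to talk to the others on port 13701) is a small TCP test run from each server in turn. This is only a sketch: the function name, the default timeout, and the host names in the usage note are assumptions for illustration and come from neither the thread nor NetBackup itself. It shows only whether a TCP connection can be opened; it says nothing about SCSI reservations or drive state.

```python
import socket


def port_reachable(host: str, port: int = 13701, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Name-resolution failures (for example a missing /etc/hosts entry),
    refused connections, and timeouts all count as "not reachable".
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are both subclasses of OSError.
        return False
```

For example, `port_reachable("media1")` and `port_reachable("media2")` (hypothetical host names) could be run from the master and from each media server. A False result for a name that should resolve points back at name resolution or the network rather than at the drives.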

