Subject: [Veritas-bu] drives going down on media servers
From: blaine_robison AT yahoo DOT com (Blaine Robison)
Date: Thu, 26 Jan 2006 08:17:21 -0800 (PST)
Thanks for your input. I have already disabled RSM (Removable Storage Manager) on the master.
Let me give you the entire rundown of the system.

Master Server 
Windows 2000, NBU 5.1 MP4

Media Servers
2 x Sun 480, Solaris 9, QLogic cards with Leadville drivers

Master and media servers are on their own internal Gb network. They are SAN
attached to the tape drives through Brocade switches. The drives are IBM and HP
LTO2.

When I run the backups on one media server, they run fine. When I try to
share the drives and run both media servers, I get "Permission denied" in the
messages file, followed by status 84 and 85 errors. In the bptm log I see
"external event caused rewind". STK told me the drive was getting inquiries
from another system during the backup, which leads me to believe that the
SCSI reserve is not being handled properly between the servers. Since the SCSI
reserve is supposed to be taken when the drive is opened, I would expect the
drive not to accept any inquiries or SCSI commands until it is closed.
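To pull the two symptoms described above ("Permission denied" in the messages file and "external event caused rewind" in the bptm log) out of the logs side by side, a small scanner can help. This is a minimal Python sketch, assuming both phrases appear verbatim somewhere in the log lines (matched case-insensitively); no particular log line format is assumed, and the paths you pass in are up to you.

```python
import re
import sys
from typing import Iterable, Iterator, List, Sequence, Tuple

# Phrases quoted in this thread. Matching is case-insensitive because the
# exact capitalisation used in each log is an assumption.
DEFAULT_PATTERNS = ("permission denied", "external event caused rewind")

Hit = Tuple[int, str, str]  # (line number, pattern that matched, line text)


def scan_lines(lines: Iterable[str],
               patterns: Sequence[str] = DEFAULT_PATTERNS) -> Iterator[Hit]:
    """Yield one Hit for every line that contains any of the patterns."""
    compiled = [(p, re.compile(re.escape(p), re.IGNORECASE)) for p in patterns]
    for lineno, line in enumerate(lines, start=1):
        for pattern, rx in compiled:
            if rx.search(line):
                yield lineno, pattern, line.rstrip("\n")
                break  # report each line once, under the first pattern it matches


def scan_file(path: str, patterns: Sequence[str] = DEFAULT_PATTERNS) -> List[Hit]:
    """Scan one log file; undecodable bytes are replaced instead of raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return list(scan_lines(fh, patterns))


if __name__ == "__main__":
    # Usage: python scan_logs.py /var/adm/messages <bptm log file> ...
    for path in sys.argv[1:]:
        for lineno, pattern, text in scan_file(path):
            print(f"{path}:{lineno}: [{pattern}] {text}")
```

Running it over the messages file and the bptm log from the same backup window makes it easier to see whether the "Permission denied" entries line up in time with the rewinds.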

My conclusion is that the Leadville HBA drivers are not handling the SCSI
reserve properly. But Sun says there is no problem and to call Veritas, and
Veritas tells me the error is returned by ioctl in the OS and to call Sun.
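The reasoning about the SCSI reserve can be made concrete with a toy model of SCSI-2 RESERVE/RELEASE on a single shared drive. This is a simplified, hypothetical Python sketch: the initiator names are made up, and the short list of commands still accepted from a host that does not hold the reservation (INQUIRY, REQUEST SENSE, REPORT LUNS, RELEASE) is our reading of the SCSI Primary Commands (SPC-2) rules, not the full table. In that reading a reservation makes commands such as REWIND from another host fail with RESERVATION CONFLICT, while INQUIRY is still answered, so inquiries from another host can reach a drive even while it is correctly reserved.

```python
from dataclasses import dataclass
from typing import Optional

GOOD = "GOOD"
RESERVATION_CONFLICT = "RESERVATION CONFLICT"

# Assumption: a simplified subset of the SPC-2 list of commands that a
# SCSI-2 (RESERVE/RELEASE) reserved device still accepts from other initiators.
ALLOWED_FROM_OTHERS = frozenset({"INQUIRY", "REQUEST SENSE", "REPORT LUNS", "RELEASE"})


@dataclass
class ReservedDrive:
    """Toy model of one tape drive shared by several initiators (hosts)."""
    owner: Optional[str] = None  # initiator currently holding the reservation

    def command(self, initiator: str, op: str) -> str:
        """Return the status the drive would report for op sent by initiator."""
        if self.owner is not None and initiator != self.owner:
            if op in ALLOWED_FROM_OTHERS:
                # A RELEASE from a non-owner returns GOOD but leaves the
                # reservation in place; the other allowed commands simply run.
                return GOOD
            return RESERVATION_CONFLICT
        if op == "RESERVE":
            self.owner = initiator  # unreserved drive, or owner re-reserving
        elif op == "RELEASE":
            self.owner = None
        return GOOD
```

With hypothetical initiators `media1` and `media2`: after `media1` sends RESERVE, a REWIND from `media2` gets RESERVATION CONFLICT, while an INQUIRY from `media2` gets GOOD.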

Thanks for your input; it is nice not to be all alone in this.




--- Dave Markham <dave.markham AT fjserv DOT net> wrote:

> Unbelievably, I saw this yesterday: a Windows guy asked me if I
> knew about it, seeing as I support NetBackup on Solaris.
> 
> The fix he got, which worked, was to disable the Removable Storage Manager
> service. The errors are gone.
> 
> That was on a Windows 2003 setup with NetBackup 5.1 MP4.
> 
> Roger Dombrowski wrote:
> 
> > Hi Blaine,
> >
> > I have been trying to solve this problem for two sites that I'm
> > working with right now, and we're not having much luck either. In my
> > travels I've talked to a few folks who have seen this "External Event"
> > issue caused by monitoring software. One client in particular found
> > that one of Sun's monitoring tools was sending out SCSI inquiries and
> > causing the "external event" rewinds.
> >
> > I also ran across a post on this mailing list that documents about 30
> > such applications that have been known to cause this type of
> > behaviour. Try searching this list for "external event". If I get a
> > chance, I'll try to dig it up and send you the post I'm thinking of.
> >
> > Through the course of my research I've basically found that two things
> > are trying to communicate with the drive, and most folks check the
> > data path (HBAs, switches, bridges, ...) to look for problems.
> >
> > Maybe the upgrade stepped on some SCSI reservation setting. If I find
> > anything else, I'll post to the list...
> >
> > Blaine Robison wrote:
> >
> >> I am having a similar issue. I have a Windows 2000 master and a pair
> >> of Sun 480s with 8 LTO2 drives shared between them. I get "External
> >> event caused rewind" errors, and the tapes get frozen or the drives go
> >> down. I didn't have the problem until I upgraded to 5.1 MP4. I have
> >> gone over the entire configuration and cannot find a problem.
> >> Has anyone else seen this and found a resolution?
> >> --- ida3248b AT post.cybercity DOT dk wrote:
> >>
> >>  
> >>
> >>> Have you tried /var/adm/messages (Solaris) or the equivalent log?
> >>>
> >>> Regards
> >>> Michael
> >>>
> >>> On Wed, 18 Jan 2006 15:00:24 +0000, Dave Markham wrote
> >>>   
> >>>
> >>>> I have 1 master server and 2 media servers connected over fiber to
> >>>> an L700. I'm not sure what the switch in the middle is, as I didn't
> >>>> install the system and don't have any info on it.
> >>>>
> >>>> There are 5 drives in the L700, and 3 of them are shared with the SSO
> >>>> option between the master and both media servers.
> >>>>
> >>>> People, lately I have had an issue with drives not being visible to
> >>>> one of my media servers.
> >>>>
> >>>> I have fixed this by unloading the fibre HBA using cfgadm and
> >>>> reloading it. It can then see the devices under sgscan, and they
> >>>> show up under /dev/rmt.
> >>>>
> >>>> I also noticed the customer had removed the /etc/hosts entry that lets
> >>>> the media servers talk to each other by the correct name, so I put
> >>>> that back in and can now talk on port 13701 to each machine in the
> >>>> NBU setup.
> >>>>
> >>>> What's happening now, though, is that drives just keep going down on
> >>>> the media servers and backups are not working. I have ITC enabled, so
> >>>> each media server needs to lock 2 drives.
> >>>>
> >>>> I have looked at the bptm logs and can't see anything jumping out
> >>>> apart from many media requests for different tape IDs. I have looked
> >>>> in /usr/openv/volmgr/debug/ltid/, and the logs there show shared drive
> >>>> info being communicated successfully to the master.
> >>>>
> >>>> Therefore I am now stuck and have no idea what's going wrong :(
> >>>>
> >>>> Does anyone have any advice or pointers? Is there anything specific I
> >>>> should be looking for in the logs, or are there other important logs
> >>>> I'm not checking?
> >>>>
> >>>> Thanks
> >>>> _______________________________________________
> >>>> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> >>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
> >>>>     
> >>>
> >>>
> >>>   
> >>
> >>
> >>
> >> Blaine Robison
> >> Solaris Certified System Administrator
> >> Solaris Certified Network Administrator
> >> Veritas Certified Professional
> >> 972-853-2459
> >> 214-578-5391
> >>
> >> __________________________________________________
> >> Do You Yahoo!?
> >> Tired of spam?  Yahoo! Mail has the best spam protection around 
> >> http://mail.yahoo.com
> >>
> >>  
> >>
> >
> >
> 
> 
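Dave's check that each machine in the NBU setup answers on port 13701 once the /etc/hosts entry was restored can be scripted. This is a minimal Python sketch, assuming a plain TCP connect is a sufficient reachability test: it shows only that something accepts connections on the port, not that the NetBackup service behind it is healthy. Host names are supplied by the caller; none are taken from the thread.

```python
import socket
from typing import Dict, Iterable

NBU_PORT = 13701  # port number quoted in the thread


def can_connect(host: str, port: int = NBU_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout) and name-lookup
        # failures (socket.gaierror) are all OSError subclasses.
        return False


def check_hosts(hosts: Iterable[str], port: int = NBU_PORT,
                timeout: float = 3.0) -> Dict[str, bool]:
    """Map each host name to whether it accepted a connection on port."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, `check_hosts(["media1", "media2"])`, with hypothetical host names, returns a dict mapping each name to True or False. Because the names go through normal resolution, a missing /etc/hosts entry shows up as False as well.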


Blaine Robison
Solaris Certified System Administrator 
Solaris Certified Network Administrator
Veritas Certified Professional
972-853-2459
214-578-5391
