[Veritas-bu] Tape drive woes - FIXED!

2007-03-14 11:18:38
Subject: [Veritas-bu] Tape drive woes - FIXED!
From: Jason.Ellis at indymacbank.com (Ellis, Jason)
Date: Wed, 14 Mar 2007 08:18:38 -0700
Well, the problem is fixed. It was a problem with the library, but it
wasn't a problem with the drive firmware. During the maintenance two of
our drives failed their tests, so the STK engineer powered the drives
off and recommended we not use them until we replace them. After he left
we had some problems bringing the drives back online because of the
firmware update, which required us to reboot the environment and
re-install the tape device drivers.

During this process the library was reset, and the problem was
eventually traced back to a "feature" of the STK library. With the two
drives powered off, the library re-enumerated its drives, skipping over
the powered-off ones. Windows and NetBackup still had the correct device
paths to the tape drives, which is why the drives would go up and stay
up until a backup was attempted. NetBackup would tell the robot to load
the tape into drive number "X", but because the robot had re-enumerated
the drives, it would either never load the tape because that drive
number no longer existed, or it would load the tape into the wrong drive
and NetBackup would complain because it couldn't open the tape drive it
expected the tape to be in.
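
To make the mismatch concrete, here is a minimal Python sketch of the
renumbering effect. It is a toy model only: the bay names, the 1-based
numbering and both function names are assumptions for illustration, not
anything taken from the STK library or NetBackup.

```python
def enumerate_drives(bays):
    """Number the powered-on bays sequentially, skipping powered-off ones.

    `bays` is a list of (bay_name, powered_on) pairs in physical order.
    Returns {drive_number: bay_name}. 1-based numbering is an assumption.
    """
    numbering = {}
    next_number = 1
    for bay_name, powered_on in bays:
        if powered_on:
            numbering[next_number] = bay_name
            next_number += 1
    return numbering


def diagnose(configured, current):
    """Compare the numbering the backup software was configured with
    against the numbering the robot is using now."""
    report = {}
    for number, expected_bay in configured.items():
        actual_bay = current.get(number)
        if actual_bay is None:
            report[number] = "drive does not exist in robot"
        elif actual_bay != expected_bay:
            report[number] = f"tape goes to {actual_bay}, expected {expected_bay}"
        else:
            report[number] = "ok"
    return report


# Hypothetical five-bay library; bays B and D are powered off at reset.
all_on = [("A", True), ("B", True), ("C", True), ("D", True), ("E", True)]
two_off = [("A", True), ("B", False), ("C", True), ("D", False), ("E", True)]

configured = enumerate_drives(all_on)   # what the backup software expects
current = enumerate_drives(two_off)     # what the robot uses after reset

for number, outcome in diagnose(configured, current).items():
    print(number, outcome)
```

In this model every drive number after the first powered-off bay points
at the wrong bay, and the highest numbers stop existing altogether,
which lines up with the two symptoms described above.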

After replacing the two downed drives, resetting the library and
cleaning up a few device config settings in NetBackup, everything seems
to be back up and running just fine now.

Thanks for all the help and advice!

Jason Ellis
Technical Consultant, Data Protection Team
IndyMac Bank, La Mirada Datacenter
Phone: (714) 520-3414
Mobile: (714) 889-8734

________________________________

From: WEAVER, Simon (external) [mailto:simon.weaver at astrium.eads.net] 
Sent: Wednesday, March 14, 2007 12:55 AM
To: Ellis, Jason; veritas-bu at mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Tape drive woes

Jason

If the firmware has been updated, then that is most likely the cause of
all your troubles! I have certainly been down this road before, and it
took ages to finally resolve. It was a case of reconfiguring all the
drives again, and in fact updating the firmware for not just the drives
but the robot as well (in my case, this solved my problems).

Real pain when it happens ... but a relief when it's sorted!

Regards

Simon Weaver
3rd Line Technical Support
Windows Domain Administrator 

EADS Astrium Limited, B23AA IM (DCS)
Anchorage Road, Portsmouth, PO3 5PU

Email: Simon.Weaver at Astrium-eads.net

        -----Original Message-----
        From: Ellis, Jason [mailto:Jason.Ellis at indymacbank.com] 
        Sent: 13 March 2007 16:24
        To: veritas-bu at mailman.eng.auburn.edu
        Subject: [Veritas-bu] Tape drive woes

        We have been having some problems with some of our tape drives
        and believe we have narrowed it down to a problem with the drives
        themselves. Our current environment is a single Master server
        running Windows 2003 Enterprise SP1 and four Media servers running
        Windows 2003 Enterprise SP1. All our servers are running NetBackup
        5.1 MP4.

        About a week ago Sun (STK) was onsite to perform maintenance. As
        part of the maintenance they performed testing on our drives and
        updated the drive firmware to the latest revision. However, since
        then a number of our drives have not been staying up. After
        spending several days troubleshooting the issue from an OS and
        NetBackup perspective we seem to have narrowed the problem down to
        the drives. The behavior is that we can UP the drives in NetBackup
        and they will stay up until NetBackup attempts to mount a tape and
        run a backup job to the drive. These drives are shared between two
        Media servers, and it seems that the device host they go DOWN on
        is consistently the Media server that initiates the backup job to
        the drive.

        Looking through the Application Event Log I've noticed the
        following errors:

        1.      TLD(0) drive 18 (device 9) is being DOWNED, status: Unable to open drive
        2.      TLD(0) drive 15 (device 7) is being DOWNED, status: Unable to SCSI unload drive
        3.      TLD(0) drive 19 (device 10) is being DOWNED, status: Drive does not exist in robot
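
When there are many of these events to go through, a short Python
sketch along the following lines could group them by status so that
path-type and hardware-type failures are easier to tell apart. The
function names are hypothetical, and the pattern assumes every message
has exactly the format of the three lines quoted above.

```python
import re
from collections import defaultdict

# Pattern based only on the three messages quoted above; other NetBackup
# versions or message variants may be worded differently.
DOWNED_RE = re.compile(
    r"TLD\((?P<robot>\d+)\) drive (?P<drive>\d+) \(device (?P<device>\d+)\) "
    r"is being DOWNED, status: (?P<status>.+)"
)


def parse_downed(line):
    """Return the fields of one DOWNED message, or None if it does not match."""
    match = DOWNED_RE.search(line)
    if match is None:
        return None
    return {
        "robot": int(match.group("robot")),
        "drive": int(match.group("drive")),
        "device": int(match.group("device")),
        "status": match.group("status").strip(),
    }


def group_by_status(lines):
    """Map each status string to the list of drive numbers that reported it."""
    groups = defaultdict(list)
    for line in lines:
        event = parse_downed(line)
        if event is not None:
            groups[event["status"]].append(event["drive"])
    return dict(groups)


events = [
    "TLD(0) drive 18 (device 9) is being DOWNED, status: Unable to open drive",
    "TLD(0) drive 15 (device 7) is being DOWNED, status: Unable to SCSI unload drive",
    "TLD(0) drive 19 (device 10) is being DOWNED, status: Drive does not exist in robot",
]

for status, drives in group_by_status(events).items():
    print(status, drives)
```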

        Number two looks like a problem with the drive itself; however,
        numbers one and three look more like a device path problem
        within NetBackup or Windows.

        In Device Manager we can see the proper number of drives, and all
        the drives have the proper VERITAS drivers installed. Additionally,
        running 'tpautoconf -t' and 'scan' shows the proper number of
        drives. We also double-checked the SSO configuration and corrected
        a few errors, but this did not seem to have any effect on the
        problem.

        Has anybody else seen anything like this before? If you need
        more information on our environment let me know. Thank you in
        advance for any help you can provide.

        Jason Ellis
        Technical Consultant, Data Protection Team
        IndyMac Bank, La Mirada Datacenter
        Phone: (714) 520-3414
        Mobile: (714) 889-8734

         
