Library (and a tape drive) Disappearing

biscuitman

Hi All – any help with this would be greatly appreciated … it's a long one, so please be patient

Running TSM 5.5.2 on Windows Server 2003 Standard Edition SP1, with a fibre-attached HP MSL4048 G3 Series LTO3 tape library (2 drives) … all working OK until the last 2 weeks …

Started to see SCSI changer errors, or possibly a drive issue – had hardware support out but they couldn’t find an issue with the library (they thought it could be a media issue – however, we did eventually get them to swap out the drive we thought was at fault).
Since then we have also changed the fibre card and used new fibre cables.

Done all the usual stuff – deleted and redefined the tape library in Windows and TSM – checked it was visible to Windows and that the correct drives were being used etc. Re-checked all configuration options and they appear to be OK – it starts working – migrates and reclaims (no direct-to-tape backups are configured; everything goes to disk)

Then we start seeing errors again – (dates and times don’t necessarily follow on – also some may be irrelevant but I just tried to get as much info out of the activity log as possible)


Error msgs etc in next post - can't get attachment to work - Sorry ...

At a loss now – have tried everything I know (not much Ha !!) – so any input will be appreciated.

Thanks in advance Ian.
 
Error msgs ..

04/05/2011 16:08:28 ANR8300E I/O error on library LB124.1.0.3 (OP=8401C058, CC=304, KEY=05, ASC=30, ASCQ=12, SENSE=70.00.05.00.00.00.00.0A.00.00.00.00.30.12.00.00.00.00., Description=Changer failure). Refer to Appendix C in the 'Messages' manual for recommended action. (SESSION: 6, PROCESS: (this was the original msg we saw when we had hardware support out to site)

04/05/2011 17:54:44 ANR8311E An I/O error occurred while accessing drive MT124.0.0.3 (mt124.0.0.3) for LOCATE operation, errno = 1117. (SESSION: 5, PROCESS: 5)

04/06/2011 12:57:08 ANR8302E I/O error on drive MT124.0.0.3 (mt124.0.0.3) with volume ???(OP=WRITE, Error Number=1117, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**, Description=SCSI adapter failure). Refer to Appendix C in the 'Messages' manual for recommended action. (PROCESS: 3)
04/06/2011 12:57:08 ANR0515I Process 3 closed volume ???. (PROCESS: 3)
04/06/2011 12:57:08 ANR1032W Migration process 3 terminated for storage pool BACKUPPOOL - internal server error detected. (PROCESS: 3)
04/06/2011 12:57:08 ANR9999D Thread<35> issued message 1032 from: (PROCESS: 3)
04/06/2011 12:57:08 ANR0986I Process 3 for MIGRATION running in the BACKGROUND processed 2180 items for a total of 417,451,139,072 bytes with a completion state of FAILURE at 12:57:08. (PROCESS: 3)
04/06/2011 12:57:08 ANR1002I Migration for storage pool BACKUPPOOL will be retried in 60 seconds.
04/06/2011 12:58:08 ANR1003I Migration retry delay ended; checking migration status for storage pool BACKUPPOOL.
04/06/2011 12:59:08 ANR8311E An I/O error occurred while accessing drive MT124.0.0.3 (mt124.0.0.3) for OFFL operation, errno = 55. (PROCESS: 3)
04/06/2011 13:06:08 ANR8840E Unable to open device lb124.1.0.3 with file handle 1117. (PROCESS: 3)
04/06/2011 13:06:08 ANR8469E Dismount of LTO volume ??? from drive MT124.0.0.3 (mt124.0.0.3) in library LB124.1.0.3 failed. (PROCESS: 3)

04/06/2011 22:53:48 ANR8840E Unable to open device lb124.1.0.3 with file handle 2. (PROCESS: 6)
04/06/2011 22:53:48 ANR8848W Drive MT124.0.0.4 of library LB124.1.0.3 is inaccessible; server has begun polling drive. (PROCESS: 6)
04/06/2011 22:54:18 ANR8840E Unable to open device lb124.1.0.3 with file handle 2. (PROCESS: 6)
04/06/2011 22:54:48 ANR8840E Unable to open device lb124.1.0.3 with file handle 2. (PROCESS: 6)

04/07/2011 11:18:17 ANR0984I Process 3 for MIGRATION started in the BACKGROUND at 11:18:17. (PROCESS: 3)
04/07/2011 11:18:17 ANR1000I Migration process 3 started for storage pool BACKUPPOOL automatically, highMig=90, lowMig=50, duration=No. (PROCESS: 3)
04/07/2011 11:18:17 ANR8840E Unable to open device lb124.1.0.3 with file handle 2. (PROCESS: 3)
04/07/2011 11:18:17 ANR8441E Initialization failed for SCSI library LB124.1.0.3. (PROCESS: 3)
04/07/2011 11:18:17 ANR1401W Mount request denied for volume ??? - mount failed. (PROCESS: 3)
04/07/2011 11:18:17 ANR8840E Unable to open device lb124.1.0.3 with file handle 2. (PROCESS: 3)
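
For anyone reading these messages later: the SENSE= field in the ANR8300E line can be decoded by hand. The sketch below is only an illustration, and it assumes the bytes are standard fixed-format SCSI sense data (first byte 0x70 or 0x71), where the sense key is the low four bits of byte 2 and ASC/ASCQ are bytes 12 and 13. The function name `decode_sense` and the abbreviated key table are made up for this example and are not part of TSM.

```python
import re

# Standard SCSI sense key names (4-bit field); less common values omitted.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}

def decode_sense(message):
    """Return (key, key_name, asc, ascq) from a SENSE=xx.xx... field, or None."""
    match = re.search(r"SENSE=((?:[0-9A-Fa-f]{2}\.)+)", message)
    if match is None:
        return None  # e.g. SENSE=**NONE**, as in the ANR8302E message
    data = bytes(int(b, 16) for b in match.group(1).split(".") if b)
    # Assumption: fixed-format sense data (response code 0x70 or 0x71).
    if len(data) < 14 or (data[0] & 0x7F) not in (0x70, 0x71):
        return None
    key = data[2] & 0x0F
    return key, SENSE_KEYS.get(key, "UNKNOWN"), data[12], data[13]

line = ("ANR8300E I/O error on library LB124.1.0.3 (OP=8401C058, CC=304, "
        "KEY=05, ASC=30, ASCQ=12, "
        "SENSE=70.00.05.00.00.00.00.0A.00.00.00.00.30.12.00.00.00.00., "
        "Description=Changer failure).")

result = decode_sense(line)
if result is not None:
    key, name, asc, ascq = result
    print(f"KEY={key:02X} ({name}) ASC={asc:02X} ASCQ={ascq:02X}")
```

The decoded values agree with the KEY, ASC and ASCQ fields TSM already prints in the same message, which is a quick sanity check that the sense bytes were read correctly.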


On checking Device Manager – it only shows a single tape drive – the library and 2nd drive have disappeared – all in TSM …

Here are some of the Windows Event Log Msgs …

A non check condition error has occurred on device Device\lb124.1.0.3 during Release with completion code DD_SCSI_ADAPTER_FAILURE
Dump Data; byte 0x2D=SRB Status, byte 0x2C=SCSI Status

A non check condition error has occurred on device Device\mt124.0.0.3 during Log Sense with completion code DD_SCSI_ADAPTER_FAILURE
Dump Data; byte 0x2D=SRB Status, byte 0x2C=SCSI Status

A check condition error has occurred on device \Device\lb124.1.0.3 during Move Medium with completion code DD_CHANGER_FAILURE
Dump Data: byte 0x3E=KEY, byte 0x3D=ASC, byte 0x3C=ASCQ

The device ‘HP Ultrium 3-SCSI SCSI Sequential Device’ (SCSI\Sequential&Ven_HP_Prod_Ultrium_3-SCSI&Rev_M63W\7&8c37x88&0&000) disappeared from the system without first being prepared for removal.

The device ‘IBM Tivoli Storage Manager for Medium Changers’ (SCSI\Changer&Ven_HP_Prod_MSL_G3_Series&Rev_7.00\7&2f2400f7&0&001) disappeared from the system without first being prepared for removal.


Of course there is much more but these last two struck a chord … when we view them in regedit they do not show any driver info and are also missing their identifier.
 
Things to look at: from a DOS box, run TSMDLST.EXE or ITDT and see if Windows sees all the drives and the library.
Are you using the TSM Device Driver and not the Device Driver for TSM? Non-IBM drives should use the TSM Device Driver.
Do you have persistence set on for the library and drives on the Fibre card?
 
Hi rallingham - thanks for the input ...
When we run TSMDLST.EXE - we see the library and drives - However, the library disappears when the issue occurs
before ...
TSM Name     ID   LUN  Bus  Port  SSN  WWN  TSM Type  Device Identifier
-----------------------------------------------------------------------------
mt123.0.0.3  123  0    0    3     -    -    LTO       HP Ultrium 3-SCSI M63W
lb123.1.0.3  123  1    0    3     -    -    LIBRARY   HP MSL G3 Series 7.00
mt124.0.0.4  124  0    0    4     -    -    LTO       HP Ultrium 3-SCSI M63W

after ...
TSM Name     ID   LUN  Bus  Port  SSN  WWN  TSM Type  Device Identifier
-----------------------------------------------------------------------------
mt123.0.0.3  123  0    0    3     -    -    LTO       HP Ultrium 3-SCSI M63W
mt124.0.0.4  124  0    0    4     -    -    LTO       HP Ultrium 3-SCSI M63W

Both outputs from tsmdlst.exe show TSM Device Driver: TSMScsi - Running
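
If the "before" and "after" listings are saved to text, the comparison can be automated rather than done by eye. This is a minimal sketch, assuming each device row starts with a name of the form mtN.N.N.N or lbN.N.N.N as in the output above; `device_names` is a hypothetical helper, not something shipped with TSM.

```python
import re

# Device rows start with a TSM device name such as mt123.0.0.3 or lb123.1.0.3.
DEVICE_NAME = re.compile(r"^(?:mt|lb)\d+\.\d+\.\d+\.\d+", re.IGNORECASE)

def device_names(listing):
    """Collect the TSM device names found at the start of each listing line."""
    names = set()
    for raw_line in listing.splitlines():
        match = DEVICE_NAME.match(raw_line.strip())
        if match:
            names.add(match.group(0).lower())
    return names

before = """\
mt123.0.0.3 123 0 0 3 - - LTO HP Ultrium 3-SCSI M63W
lb123.1.0.3 123 1 0 3 - - LIBRARY HP MSL G3 Series 7.00
mt124.0.0.4 124 0 0 4 - - LTO HP Ultrium 3-SCSI M63W
"""

after = """\
mt123.0.0.3 123 0 0 3 - - LTO HP Ultrium 3-SCSI M63W
mt124.0.0.4 124 0 0 4 - - LTO HP Ultrium 3-SCSI M63W
"""

# Devices that were listed before the failure but not after it.
missing = sorted(device_names(before) - device_names(after))
print(missing)  # ['lb123.1.0.3']
```

Running something like this on a schedule against fresh tsmdlst output would show roughly when the library drops off the bus, which can then be lined up with the activity log timestamps.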

Not sure what you mean regarding having persistence set on? Could you advise where I should look, or would it be a better bet to get the Intel techy to take a look?

Regards,
 
Depending on the fibre card you are running (QLogic, for example), the vendor will have a management application; for the QLogic card it is called SANsurfer.
When you run this application it allows you to set persistence on for the tape drives. Quite often with Windows the addresses will change unless you set this persistence to ON for the tape drives. Compare the address of the library in the errors from your initial post with the address shown when you ran TSMDLST. Install the drivers, use SANsurfer and reboot the system. Everything should be there.
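
The address comparison suggested here can be sketched as a small script: pull every device name out of the activity log and flag the ones that tsmdlst no longer reports. It is only an illustration; the log lines are abridged from the earlier post, and `names_in` is a hypothetical helper.

```python
import re

# Matches TSM device names such as MT124.0.0.3 or lb123.1.0.3 anywhere in text.
DEVICE = re.compile(r"\b(?:mt|lb)\d+\.\d+\.\d+\.\d+\b", re.IGNORECASE)

def names_in(text):
    """Return the set of device names mentioned in text, lower-cased."""
    return {name.lower() for name in DEVICE.findall(text)}

# Abridged activity log lines from the original post.
activity_log = """\
ANR8300E I/O error on library LB124.1.0.3 (OP=8401C058, CC=304, KEY=05, ASC=30, ASCQ=12)
ANR8311E An I/O error occurred while accessing drive MT124.0.0.3 (mt124.0.0.3) for LOCATE operation, errno = 1117.
ANR8848W Drive MT124.0.0.4 of library LB124.1.0.3 is inaccessible; server has begun polling drive.
"""

# Device rows from the "before" tsmdlst listing.
tsmdlst_output = """\
mt123.0.0.3 123 0 0 3 - - LTO HP Ultrium 3-SCSI M63W
lb123.1.0.3 123 1 0 3 - - LIBRARY HP MSL G3 Series 7.00
mt124.0.0.4 124 0 0 4 - - LTO HP Ultrium 3-SCSI M63W
"""

# Names TSM was using in the log that the OS-level listing does not show.
stale = sorted(names_in(activity_log) - names_in(tsmdlst_output))
print(stale)  # ['lb124.1.0.3', 'mt124.0.0.3']
```

A non-empty result only says that the names differ between the two sources; it does not by itself show whether the cause is address drift, a redefinition in between, or a device that has genuinely gone away.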
 
Again thanks for the quick response, rallingham ..

We swapped the QLogic card out for an Emulex - The Intel guys have left the office now - so will speak to them in the morning.

Regards,
Ian.
 
Just another note. It may be referred to as Persistent Binding, and I think it is the HBA Anywhere utility that handles this. However, I have never used the product; I almost always have customers using QLogic cards. Also check to be sure you have RSM disabled on your Windows server.
Try this site for more info: http://www.emulex.com/files/downloads/hardware/troubleshooting_basic.pdf
 
The utility is HBAnyware and I believe it's only for use with the lpfc driver. That will allow you to set Persistent Binding for your tape devices. I've not really used it. I worked with our Windows admin once many months ago on a test system, which soon after got scrapped, but HBAnyware is what he used for it.
 
Hi Andrey - I tried to use L&TT's but as the devices are using IBM Drivers it unfortunately doesn't want to work - However, the engineer who attended the remote site had a way and he advised that the library/drives etc were reporting OK.
 
Media changer drivers and IBM Library drivers

I had the same issue 6 months ago; it was after an HP firmware upgrade.
Have you upgraded the Windows 2003 server (firmware upgrade or patching)?

If yes, you have to roll back the Medium Changer drivers:
· Check the Medium Changer drivers by going to System Properties -> Hardware -> Device Manager; if the drivers have changed, do a roll back
then
· Reinstall the IBM library drivers
· Redefine the drives with the new mt numbers (these can be found by running tsmdlst) and redefine the paths

Restart the library, then restart the TSM server. It should work.
 
Hi All - just to say Thanks for all your replies / ideas etc ... The problem appears to have been hardware related after all ...
We had also involved IBM Tivoli Support who advised that from error msgs / logs etc and previous issues logged by other customers it looked hardware related.

for info - what happened ...

When we first saw the errors we believed we had a drive problem – the drive was swapped out by our hardware support, although the engineer wasn't convinced it was hardware related ... anyway, we had a short run of success but then experienced the errors again – this is where we started to dig deeper and began to change the other elements the msgs pointed to, i.e. fibre card/cables etc. – but nothing changed – same probs again and again. Eventually we requested a full library swap-out from our hardware support people – which they agreed to, as we couldn't afford to spend any more time swapping this bit then that – the issue was at our Head Office site and we needed to get backups completed and data offsite etc. With more luck than judgement we did manage to get a backup and some tapes offsite using one drive only – we hadn't even managed this previously due to the probs.
Anyway, a new library with one drive was brought to site and they installed the drive already supplied as the first replacement ... looked good - managed to get a migrate going and some small data moves to disk off tapes - Woohoo is this the end ----------- Arrghhh, the same probs when trying a tape to tape data move ...
Anyway the errors pointed to the replacement drive they had first supplied - so the engineer swapped this with the remaining drive in the other library (this is actually one of the original drives from library purchase) ... re-configured it all again ... looks good - but it did before too .... tried a migrate - worked OK / tried small data moves off tape to disk - worked OK / tried the tape to tape move - IT WORKED - HURRAY !!!! ... Mmmmn is this the end I wonder.
Checked in tapes ready for the backups ........ still looks OK ... We'll see ...
No probs seen after a full night's workload - will monitor closely for a few days yet - but it really must have been a hardware error, and the replacement drive must have been faulty too.
Thanks again for your input/ideas with this. Regards, Ian.
 