ADSM-L

3590 drives keep going offline

2004-02-10 23:53:58
Subject: 3590 drives keep going offline
From: "French, Michael" <Michael.French AT SAVVIS DOT NET>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 10 Feb 2004 22:53:18 -0600
One of my TSM servers has had drives continuously going offline over the past 
few days.  I have 2 servers attached to the same library; server 1 is fine, but 
server 2 keeps getting drive failures.  On Sunday, all 4 drives went down within 
hours of each other!  This strikes me as suspicious.  I see messages like these 
in the system logs:

Feb 10 23:55:39 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 52,lLibrary ids02atl1 is going offline
Feb 10 23:55:40 tsm2 last message repeated 1 time
Feb 11 00:17:42 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online to host
Feb 11 00:38:31 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 52,lLibrary ids02atl1 is going offline
Feb 11 00:40:21 tsm2 last message repeated 1 time
Feb 11 00:55:33 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online to host
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130) 03590E1A        S/N 0000000E6955 SENSE DATA:
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130)  71  0  6  0  0  0  0 58  0  0  0  0 29  0 FF  2
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130)  C4 42  0 15  0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(130)   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:43 tsm2 last message repeated 3 times
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262) 03590E1A        S/N 0000000E6952 SENSE DATA:
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262)  71  0  6  0  0  0  0 58  0  0  0  0 29  0 FF  2
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262)  C4 42  0 33  0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(262)   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:00:44 tsm2 last message repeated 3 times
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292) 03590E1A        S/N 0000000E7068 SENSE DATA:
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292)  71  0  6  0  0  0  0 58  0  0  0  0 29  0 FF  2
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292)  C4 42  0 24  0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info]          IBMtape(292)   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Feb 11 01:19:17 tsm2 last message repeated 3 times
Feb 11 01:20:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE:  IBMtape(262) _write:    ec82 < Logical EOT notification, rc 0
Feb 11 01:29:29 tsm2 IBMtape: [ID 243001 kern.info] NOTICE:  IBMtape(292) _write:   2a091 < Logical EOT notification, rc 0
Feb 11 04:29:32 tsm2 lmcpd[1213]: [ID 410567 daemon.error] ERROR on ids02atl1, volume 2C0389, ERA 83 Library Drive Exception
Feb 11 04:36:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE:  IBMtape(292) _write:   dfddd < Logical EOT notification, rc 0
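The lmcpd lines in the log show the library dropping offline and coming back 
twice within about an hour.  A minimal Python sketch for measuring those 
windows, assuming the syslog format shown (one message per line, no year in 
the timestamp, so 2004 is assumed from the post date).  It pairs each "is going 
offline" message with the next "is online to host" message; offline_windows 
and parse_ts are hypothetical helpers, not part of lmcpd or TSM.

```python
from datetime import datetime

# The lmcpd excerpt from the log above, one syslog message per line.
LOG = """\
Feb 10 23:55:39 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 52,lLibrary ids02atl1 is going offline
Feb 10 23:55:40 tsm2 last message repeated 1 time
Feb 11 00:17:42 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online to host
Feb 11 00:38:31 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message 52,lLibrary ids02atl1 is going offline
Feb 11 00:40:21 tsm2 last message repeated 1 time
Feb 11 00:55:33 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is online to host
"""


def parse_ts(line, year=2004):
    # Syslog timestamps are "Mon DD HH:MM:SS" (15 characters) with no year,
    # so the year is an assumption supplied by the caller.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def offline_windows(lines, year=2004):
    """Return (went_offline, back_online) datetime pairs from lmcpd messages."""
    windows = []
    went_offline = None
    for line in lines:
        if "lmcpd" not in line:
            continue  # skip "last message repeated" and IBMtape lines
        if "is going offline" in line and went_offline is None:
            went_offline = parse_ts(line, year)  # keep the first offline time
        elif "is online to host" in line and went_offline is not None:
            windows.append((went_offline, parse_ts(line, year)))
            went_offline = None
    return windows


for start, end in offline_windows(LOG.splitlines()):
    print(f"{start:%b %d %H:%M:%S} -> {end:%b %d %H:%M:%S} ({end - start})")
# prints:
# Feb 10 23:55:39 -> Feb 11 00:17:42 (0:22:03)
# Feb 11 00:38:31 -> Feb 11 00:55:33 (0:17:02)
```

Feeding the full /var/adm/messages file through the same loop would show 
whether the offline windows line up with the drive failures on server 2.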


This is happening with multiple tapes, not the same few.  Does this sound like 
a hardware problem or a software/driver issue?  Googling the errors has not 
turned up anything.  The library reports that a drive failed with an unload 
error, and the tape is stuck in the drive.

TSM 5.1.8.1
Solaris 8
IBMtape driver 4.0.8.0 (the latest, I am pretty sure)
lmcpd 5.3.9.0 (latest)
Drives are SCSI-attached to the server

Michael French
