One of my TSM servers has had drives continuously going offline over the past
few days. I have 2 servers attached to the same library; server 1 is fine, but
server 2 keeps getting drive failures. On Sunday, all 4 drives went down within
hours of each other! This strikes me as suspicious. I see this sort of message
in the system logs:
Feb 10 23:55:39 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message
52,lLibrary ids02atl1 is going offline
Feb 10 23:55:40 tsm2 last message repeated 1 time
Feb 11 00:17:42 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is
online to host
Feb 11 00:38:31 tsm2 lmcpd[1213]: [ID 470916 daemon.error] Received message
52,lLibrary ids02atl1 is going offline
Feb 11 00:40:21 tsm2 last message repeated 1 time
Feb 11 00:55:33 tsm2 lmcpd[1213]: [ID 257369 daemon.error] Library ids02atl1 is
online to host
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130)
03590E1A S/N 0000000E6955 SENSE DATA:
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) 71
0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) C4
42 0 15 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:43 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(130) 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:43 tsm2 last message repeated 3 times
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262)
03590E1A S/N 0000000E6952 SENSE DATA:
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) 71
0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) C4
42 0 33 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:44 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(262) 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:00:44 tsm2 last message repeated 3 times
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292)
03590E1A S/N 0000000E7068 SENSE DATA:
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) 71
0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) C4
42 0 24 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:19:17 tsm2 IBMtape: [ID 243001 kern.info] IBMtape(292) 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Feb 11 01:19:17 tsm2 last message repeated 3 times
Feb 11 01:20:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE: IBMtape(262)
_write: ec82 < Logical EOT notification, rc 0
Feb 11 01:29:29 tsm2 IBMtape: [ID 243001 kern.info] NOTICE: IBMtape(292)
_write: 2a091 < Logical EOT notification, rc 0
Feb 11 04:29:32 tsm2 lmcpd[1213]: [ID 410567 daemon.error] ERROR on ids02atl1,
volume 2C0389, ERA 83 Library Drive Exception
Feb 11 04:36:17 tsm2 IBMtape: [ID 243001 kern.info] NOTICE: IBMtape(292)
_write: dfddd < Logical EOT notification, rc 0
This is happening with many different tapes, not just the same few. Does this
sound like a hardware problem or a software/driver issue? I can't find anything
googling around for the errors. The error on the library is that a drive failed
with an unload error; the tape is stuck down in the drive.
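
For anyone reading the SENSE DATA lines in the log above, here is a minimal
sketch of how the first 16 bytes could be decoded. It assumes two things that
are not confirmed anywhere in this post: that the IBMtape driver prints the
bytes in hex, and that the buffer follows the standard SCSI fixed-format sense
layout (sense key in byte 2, ASC in byte 12, ASCQ in byte 13). The function
name decode_sense and the lookup table are illustrative only and are not part
of any IBM tool.

```python
# Sketch only: decode the start of a SCSI fixed-format sense buffer.
# Assumption: the syslog line holds space-separated hex bytes.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0x8: "BLANK CHECK",
    0xB: "ABORTED COMMAND",
}


def decode_sense(hex_line):
    """Return response code details, sense key, ASC and ASCQ from a hex dump."""
    data = [int(tok, 16) for tok in hex_line.split()]
    if len(data) < 14:
        raise ValueError("need at least 14 sense bytes")
    response_code = data[0] & 0x7F  # drop the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = data[2] & 0x0F  # low nibble of byte 2
    return {
        "deferred": response_code == 0x71,  # 0x70 = current, 0x71 = deferred
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "UNKNOWN"),
        "asc": data[12],
        "ascq": data[13],
    }


if __name__ == "__main__":
    # First 16 bytes logged for IBMtape(130), IBMtape(262) and IBMtape(292).
    print(decode_sense("71 0 6 0 0 0 0 58 0 0 0 0 29 0 FF 2"))
```

Under those assumptions, all three drives logged sense key 6 (UNIT ATTENTION)
with ASC/ASCQ 29/00, which the SCSI standard defines as "power on, reset, or
bus device reset occurred". That does not by itself say whether the cause is
hardware or software.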
TSM 5.1.8.1
Solaris 8
IBMtape driver 4.0.8.0 (latest, I am pretty sure)
lmcpd 5.3.9.0 (latest)
Drives are SCSI attached to the server
Michael French