kolstet
ADSM.ORG Member
Please forgive me if this has been posted before. I'm fairly new to TSM and need some help.
"I'm sorry this letter is so long, but I did not have time to make it shorter." --Mark Twain
Here's our setup:
- Single TSM server
- TSM 5.2 on an IBM 346 running Red Hat AS 3.0
- Dual SCSI card (Adaptec AHA-3960D / AIC-7899A U160/m)
- SCSI-attached ADIC Scalar 100 library with 3 LTO2 drives
- Two drives on one SCSI chain; the changer and the third drive on the other chain
Over the past couple of months, a lot of my volumes have been marked unavailable. The tape mounts and then gets an ANR8355E (I/O error reading label for volume XXXX). Going back through the activity log, the previous use of each tape seems to have been successful, so the label either keeps getting corrupted or is being misread.
I haven't matched this up with a specific drive, SCSI chain, or set of tapes. Every time this problem occurs, I recall the corresponding COPYPOOL tapes and restore the volume, eject the tape and delete the volume, then relabel the tape and check it back in as scratch, but the error keeps coming up on different tapes. Attempts to audit the volume or re-mark it as READWRITE fail because the volume label can't be read.
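One way to check whether the errors line up with a drive or a SCSI chain is to tally the ANR8355E messages from a saved activity log, for example the output of something like `query actlog begindate=today-30 search=ANR8355E`. The Python sketch below is only that: a sketch. The drive names, the drive-to-chain mapping, and the exact message layout in the regex are assumptions to be checked against the real log and the output of QUERY DRIVE / QUERY PATH. It also assumes each message sits on a single line; wrapped console output would need to be joined first.

```python
import re
import sys
from collections import Counter
from typing import Dict, Iterable

# Hypothetical drive names; replace with the names from QUERY DRIVE and
# map each one to the SCSI chain it actually sits on.
DRIVE_TO_CHAIN = {
    "DRIVE1": "chain A",
    "DRIVE2": "chain A",
    "DRIVE3": "chain B",
}

# Assumed message layout (verify against your own activity log):
#   ANR8355E I/O error reading label for volume VOLSER in drive NAME (/dev/IBMtapeN).
ANR8355E = re.compile(
    r"ANR8355E.*?volume\s+(?P<volume>\S+)\s+in\s+drive\s+(?P<drive>[^\s(]+)"
    r"(?:\s*\((?P<device>[^)]+)\))?"
)


def tally_label_errors(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count ANR8355E messages per drive, device file, SCSI chain and volume."""
    counts = {key: Counter() for key in ("drive", "device", "chain", "volume")}
    for line in lines:
        match = ANR8355E.search(line)
        if match is None:
            continue  # not a label read error
        drive = match.group("drive").rstrip(".")  # drop a trailing sentence period
        counts["drive"][drive] += 1
        counts["chain"][DRIVE_TO_CHAIN.get(drive, "unknown")] += 1
        counts["volume"][match.group("volume")] += 1
        if match.group("device"):
            counts["device"][match.group("device")] += 1
    return counts


def main() -> None:
    # Read a saved activity log from the file named on the command line, or stdin.
    source = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
    with source:
        counts = tally_label_errors(source)
    for key, counter in counts.items():
        print(f"{key}: {counter.most_common()}")


if __name__ == "__main__":
    main()
```

If the counts pile up on one drive, that drive is the first suspect; if they pile up on both drives of one chain, the cable, terminator, or that HBA channel is worth a look; an even spread points more toward media or something library-wide.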
I'm assuming that something is corrupting my volume labels. The tape paths are set to /dev/IBMtapeX (the rewind-on-close device) rather than /dev/IBMtapeXn (the no-rewind device). Is it possible that the tape is being rewound when it shouldn't be, and the volume label is then being overwritten?
I don't want to call vendors in on this until I can narrow the problem down to a particular subsystem. If I call ADIC in to run diagnostics, for example, they may charge me for the visit if they find the problem is elsewhere.
The worst part about this is having to recall all my offsite tapes for these restores, which leaves all my eggs in one basket until the next courier visit. On top of that, reclamation can't run on unavailable volumes, so I'm eating up tapes and running out of scratch very quickly. Finally, there are a few tapes I haven't been able to restore completely. Am I correct in assuming that if I delete the data on those tapes, the current versions of those objects will be backed up again on the next backup run, and all I lose is the version history my retention settings would have kept?
I'm going to shut down TSM manually and access the tape drive directly with one of these tapes loaded, so I can take a low-level look at the first few KB of the tape and see whether a volume label exists. I don't know what format the volume labels are stored in, so I'll have to compare against a known-good label. Any help you can give would be GREATLY appreciated, especially a method of repairing just the volume label without overwriting the rest of the tape, so I can audit the tape and see whether my data is still there...
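For the low-level look described above, a minimal read-only sketch in Python could dump the first block and check for a label. It assumes the label follows the ANSI/IBM standard-label layout, where the tape starts with an 80-byte VOL1 record whose characters 5-10 hold the volume serial; it tries both ASCII and EBCDIC (cp037) because which encoding TSM writes here is not confirmed. It also assumes the drive is in variable-block mode, so a single read() returns one tape block, and the 512 KiB buffer size is an arbitrary choice. The device is opened O_RDONLY, so the script cannot write to the tape, but it should still only be run with TSM stopped so nothing else uses the drive.

```python
import os
import sys
from typing import Optional, Tuple

LABEL_RECORD_LEN = 80  # standard label records (VOL1, HDR1, ...) are 80 bytes


def find_vol1(block: bytes) -> Optional[Tuple[str, str]]:
    """Return (encoding, volser) if the block starts with a VOL1 record, else None."""
    record = block[:LABEL_RECORD_LEN]
    if len(record) < 10:
        return None
    for encoding in ("ascii", "cp037"):  # cp037 = EBCDIC (US/Canada)
        text = record.decode(encoding, errors="replace")
        if text.startswith("VOL1"):
            # Volume serial is character positions 5-10 (0-based slice 4:10).
            return encoding, text[4:10].strip()
    return None


def hexdump(data: bytes, width: int = 16) -> str:
    """Classic offset / hex / printable-ASCII dump."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)


def read_first_block(device: str, max_block: int = 512 * 1024) -> bytes:
    """Read one block from the device, opened read-only (never for writing)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        # In variable-block mode one read() returns at most one tape block.
        return os.read(fd, max_block)
    finally:
        os.close(fd)


def main() -> int:
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} /dev/IBMtapeN", file=sys.stderr)
        return 2
    block = read_first_block(sys.argv[1])
    print(f"first block: {len(block)} bytes")
    print(hexdump(block[:4 * LABEL_RECORD_LEN]))
    found = find_vol1(block)
    if found:
        print(f"VOL1 record found ({found[0]}), volser={found[1]!r}")
    else:
        print("no VOL1 record at the start of the first block")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running it first against a known-good volume gives the reference dump to compare with a volume that fails with ANR8355E. If the read itself fails with an I/O error, that points at the drive or media rather than at label contents.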