
Unreadable tapes hang reading process

Discussion in 'TSM Server' started by tsm-admin, May 15, 2008.

  1. tsm-admin

    tsm-admin New Member

    Joined:
    Apr 14, 2005
    Messages:
    9
    Likes Received:
    0
    Location:
    Montreal
    We upgraded our TSM server from 5.2.3.2 to 5.5 and moved it to a new physical server and library.

    The old server ran RedHat 3; the new one runs RedHat 4 with the latest kernel and the latest LSI driver for our
    SCSI cards.

    Previously we had a StorageTek L80 library with LTO2 drives. Now we have a Sun STK
    SL500 with LTO3 drives.

    When TSM detected an error on the old version and hardware, it skipped the affected files quickly,
    noted the error in its database and continued.

    Now, whenever a tape has a read problem, it takes an eternity to skip the file.

    If we cancel the process while it is working on an unreadable part of the tape, the process
    sometimes keeps running for almost 30 minutes.

    So far, this has only happened with LTO2 tapes in LTO3 drives...

    Has anyone had the same problem?

    Thank you for your help.
     
  3. PJ

    PJ Senior Member

    Joined:
    Nov 18, 2005
    Messages:
    1,071
    Likes Received:
    4
    Location:
    LU Germany
    Sounds like a hardware problem. I remember way back in 3590 times we once got a bad microcode on our drives that not only produced read errors in the first place (which was unheard of before - and fortunately after they fixed it as well) but also tried for like an hour or so to reread the failed blocks over and over again. The IBM tape guy explained to me that once the IO was started and kept alive by the tape controller, there was nothing the OS, driver or TSM could do about it until the microcode finally let go and reported the error back down the wire.

    PJ
     
  4. tsm-admin

    PJ:

    You are probably right: I was running an "audit vol" on a tape with unreadable
    files, and /var/log/messages showed:

    May 15 17:12:42 sp-iode kernel: mptscsih: ioc0: attempting task abort! (sc=f7cbeb00)
    May 15 17:12:42 sp-iode kernel: scsi0 : destination target 1, lun 0
    May 15 17:12:42 sp-iode kernel: command = Read (6) 00 04 00 00 00
    May 15 17:12:53 sp-iode kernel: mptbase: Initiating ioc0 recovery
    May 15 17:12:53 sp-iode kernel: mptscsih: ioc0: task abort: SUCCESS (sc=f7cbeb00)

    and the TSM drive went offline:

    tsm: SERVER1>q dr

    Library Name     Drive Name       Device Type     On-Line
    ------------     ------------     -----------     -------------------
    SL500            DRIVE0           LTO             Unavailable Since
                                                      05/15/2008 17:12:53
    SL500            DRIVE1           LTO             Yes
    SL500            DRIVE2           LTO             Yes
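
To see how long the driver spends on each abort, the mptscsih messages quoted above can be paired up by their (sc=...) token. The following is only a rough sketch: the function name `abort_durations` and the `year` parameter are made up for illustration (syslog timestamps carry no year, so one has to be assumed), and the pattern is written against the two message formats shown in this post, not against every message the driver can emit.

```python
import re
from datetime import datetime

# Matches the two mptscsih messages quoted in this thread. The (sc=...)
# token is used to tie an abort attempt to its completion message.
ABORT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+mptscsih:\s+"
    r"ioc\d+:\s+(?:attempting task abort!|task abort:\s+(?P<result>\w+))"
    r"\s+\(sc=(?P<sc>[0-9a-f]+)\)"
)


def abort_durations(lines, year=2008):
    """Return a list of (sc, result, seconds) for each completed task abort.

    `year` is an assumption: syslog lines do not record it.
    """
    started = {}   # sc token -> time the abort attempt was logged
    finished = []
    for line in lines:
        m = ABORT_RE.match(line.strip())
        if not m:
            continue  # unrelated kernel message
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["result"] is None:
            # "attempting task abort!" line: remember when it started
            started[m["sc"]] = ts
        elif m["sc"] in started:
            # "task abort: <RESULT>" line: compute the elapsed time
            delta = ts - started.pop(m["sc"])
            finished.append((m["sc"], m["result"], int(delta.total_seconds())))
    return finished
```

Fed the five log lines above, this should report a single abort that took 11 seconds from attempt to SUCCESS, which matches the time the drive was marked unavailable.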
     
  5. tsm-admin

    We found that on the previous server (RH3) there was a scsi_mod option in the module
    parameter file named "scsi_allow_ghost_devices", with a description that says:

    scsi_allow_ghost_devices int,
    description "allow devices marked as being offline to be accessed anyway
    (0 = off, else allow ghosts on lun 0 through scsi_allow_ghost_devices - 1"

    Seems like the answer... If there is an I/O error, go ahead anyway!

    The option is gone on RH4!

    modinfo now gives these parameters:

    scsi_logging_level:a bit mask of logging levels
    max_luns:last scsi LUN (should be between 1 and 2^32-1)
    max_report_luns:REPORT LUNS maximum number of LUNS received (should be between 1 and 16384)
    inq_timeout:Timeout (in seconds) waiting for devices to answer INQUIRY. Default is 5. Some non-compliant devices need more.
    dev_flags:Given scsi_dev_flags=vendor:model:flags[,v:m:f] add black/white list entries for vendor and model with an integer value of flags to the scsi device info list
    default_dev_flags:scsi default device flag integer value
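
For anyone comparing two servers the same way, the check "is this parameter still there?" can be scripted. This is a minimal sketch, assuming the parameter list is available as text in the name:description form shown in this post; `parse_modinfo_params` is a hypothetical helper, and its handling of wrapped lines is a heuristic (a line is treated as a new entry only if the text before its first colon looks like an identifier).

```python
def parse_modinfo_params(text):
    """Map parameter name -> description from name:description text."""
    params = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, desc = line.partition(":")
        # A new entry starts with an identifier followed by a colon.
        # Anything else is assumed to be a wrapped continuation line.
        if sep and name.isidentifier():
            current = name
            params[current] = desc.strip()
        elif current is not None:
            params[current] += " " + line
    return params


def has_param(text, name):
    """True if `name` appears as a parameter in the listing."""
    return name in parse_modinfo_params(text)
```

The text could come from saving the output of `modinfo -p scsi_mod` on each server. With the RH4 listing above, `has_param(listing, "scsi_allow_ghost_devices")` would be false, while `has_param(listing, "inq_timeout")` would be true.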
     
