Re: [Networker] SDLT320 unreadable tapes

2004-01-22 05:33:39
From: John Herlihy <johnh AT XSIDATA.COM DOT AU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 22 Jan 2004 21:30:38 +1100
Sorry - the environment is a mix of Solaris 8, Tru64 v5.1A/B and Windows 2000
boxes, all sharing various drives in the same library across a Fibre Channel
(Brocade) SAN. All drives are connected to ADIC SNC5000 SCSI-to-fibre bridges,
which are in turn connected to the SAN. The server is a Tru64 v5.1B system.
 
I don't think resets are the problem, as I'd be seeing errors in the system
logs if they were. Legato seem to think that some other application is
overwriting the head of the tape, but if that were the case I don't think
I'd be able to pull a partial label from the header of the tape.
 
Cheers,
John

        -----Original Message----- 
        From: Mark Bradshaw (BTOpenWorld) [mailto:notthehoople AT BTOPENWORLD DOT COM]
        Sent: Wed 21/01/2004 7:32 AM 
        To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU 
        Cc: 
        Subject: Re: [Networker] SDLT320 unreadable tapes
        
        

        Hi John,
        
        I'm a bit confused here. Your scanner output shows you are using
        /dev/rmt/0cbn as a remote device on a Storage Node, but you don't
        mention this in your environment. Is /dev/rmt/0cbn the problem device
        and, if so, are you running scanner on a Solaris Storage Node?
        
        Ah - looking a bit closer the prompt you are running scanner on is
        'osun5680' so I guess it is a Solaris box. Can you flesh out your
        environment for us please?
        
        Also could it be that you are sharing SDLT drives between Solaris and
        Tru64 in a SAN and are maybe suffering from SCSI resets?
        
        Some random thoughts!
        
        Cheers
        
        Mark
        
        > nah - I'm using /dev/ntape/tape#_d1 which is the non-rewind device.
        > If that was it then all tapes would be affected.
        >
        > Also - a weird thing is that the suspect tapes will mount fine and
        > accept backups fine, but as soon as you remove them from the media db
        > (or try to do a restore), you're unable to read the header
        > information on the tape.
        >
        > -----Original Message-----
        > From: Davina Treiber [mailto:Treiber AT hotpop DOT com]
        > Sent: Mon 19/01/2004 6:53 PM
        > To: Legato NetWorker discussion; John Herlihy
        > Cc:
        > Subject: Re: [Networker] SDLT320 unreadable tapes
        >
        >
        >
        > I haven't worked with Tru64 for a while so the device naming
        > conventions aren't fresh in my mind, but is it possible that in some
        > way you are using rewind devices? That would account for the
        > abnormally high amounts of data written to some volumes, and could
        > also account for the corruption you are seeing. Of course if this is
        > the case it's bad news in terms of recovering your data. Just a
        > thought...
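
The rewind-device theory above can be illustrated with a toy model. This is only a sketch of the general idea, not of how NetWorker or the Tru64 tape driver actually behaves: the `Tape` class, `backup_session` and `run` are hypothetical names, and the "tally" stands in for whatever a media database would record as written.

```python
class Tape:
    """Toy model of a tape: a list of records plus a head position."""

    def __init__(self):
        self.records = []
        self.pos = 0

    def write(self, record):
        # Writing at the head position discards everything after it,
        # as on a sequential tape.
        del self.records[self.pos:]
        self.records.append(record)
        self.pos += 1


def backup_session(tape, records, rewind_on_close):
    """Write one session, then close the device."""
    for record in records:
        tape.write(record)
    if rewind_on_close:
        tape.pos = 0  # a rewind device repositions to the start on close


def run(rewind_on_close):
    """Label a tape, write three sessions, return (tape contents, tally)."""
    tape = Tape()
    backup_session(tape, ["LABEL"], rewind_on_close)
    tally = 0
    for n in range(3):
        data = [f"save{n}-rec{i}" for i in range(4)]
        backup_session(tape, data, rewind_on_close)
        tally += len(data)  # what a media database would count as written
    return tape.records, tally


if __name__ == "__main__":
    for mode in (False, True):
        records, tally = run(mode)
        print(f"rewind_on_close={mode}: tally={tally}, "
              f"on tape={len(records)}, first={records[0]}")
```

In this model the rewind case ends with the label gone and the tally far larger than what is actually on the tape, which is the combination of symptoms the theory tries to explain.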
        >
        > John Herlihy wrote:
        >> Hi,
        >>
        >> sorry for the length of this email, but figured I'd chuck all the
        >> info in there now - I am seeing an issue where it looks like the
        >> Networker headers on the tape are incomplete.
        >>
        >> This is the environment:
        >> Tru64 v5.1A
        >> Networker Power Edition v6.1.3
        >> SDLT320 drives
        >>
        >> When I use "scanner -i <device>" to scan a tape back in, it:
        >> 1 - prompts you to enter the name of the volume.
        >> 2 - complains that there is no pool named `'
        >> 3 - fails after a short time (about 5-10 secs)
        >>
        >> Here is the scanner output:
        >> =================================================
        >> osun5680[/]# scanner -s nsr01 -vim /dev/rmt/0cbn
        >> scanner: using 'rd=server1:/dev/rmt/0cbn' as the device name
        >> scanner: Opened /dev/rmt/0cbn for read
        >> scanner: Rewinding...
        >> scanner: Rewinding done
        >> scanner: Reading the label...
        >> scanner: Reading the label done
        >> scanner: SYSTEM error: Tape label read: Bad file number
        >> scanner: SYSTEM error: Tape label read: Bad file number
        >> scanner: scanning for valid records...
        >> scanner: read: 131072 bytes
        >> scanner: read: 131072 bytes
        >> scanner: Found valid record:
        >> scanner: volume id 2434907393
        >> scanner: file number 110
        >> scanner: record number 5930
        >> scanner: Enter the volume's name: SU0026
        >> scanner: volume name `SU0026'
        >> scanner: scanning sdlt320 tape SU0026 on rd=server1:/dev/rmt/0cbn
        >> scanner: volume id 2434907393 record size 131072
        >> created 1/01/70 10:00:00 expires 1/01/70 10:00:00
        >> scanner: adding sdlt320 tape SU0026 to pool
        >> scanner: RAP error: There is no pool named `'.
        >> scanner: create pool manually after scanner; continuing...
        >> scanner: Rewinding...
        >> scanner: Rewinding done
        >> scanner: setting position from fn 0, rn 0 to fn 2, rn 0
        >> scanner: Opened /dev/rmt/0cbn for read
        >> scanner: unexpected file number, wanted 2 got 112
        >> scanner: adjusting file number from 2 to 112
        >> scanner: scanning file 112, record 0
        >> scanner: unexpected volume id, wanted 2434907393 got 2434907393
        >> scanner: Opened /dev/rmt/0cbn for read
        >> scanner: done with sdlt320 tape SU0026
        >> scanner: Rewinding...
        >> scanner: Rewinding done
        >> =================================================
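
One detail in the scanner output worth noting is the "created 1/01/70 10:00:00 expires 1/01/70 10:00:00" line. That is what a timestamp of zero looks like when printed in a UTC+10 time zone, which would be consistent with the label's date fields never having been read. The sketch below assumes the host's standard-time offset is UTC+10 and that scanner prints local time; the `render` helper is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the host's standard-time offset is UTC+10 (eastern Australia).
AEST = timezone(timedelta(hours=10))


def render(epoch_seconds, tz=AEST):
    """Format an epoch timestamp in the style of the scanner line."""
    t = datetime.fromtimestamp(epoch_seconds, tz)
    return f"{t.day}/{t.month:02d}/{t:%y} {t:%H:%M:%S}"


if __name__ == "__main__":
    print(render(0))  # prints 1/01/70 10:00:00
```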
        >>
        >> We were able to obtain the header from the tape via the command:
        >> dd if=/dev/rmt/0cbn of=/tmp/tapeheader bs=128k count=1
        >>
        >> ..and then view it with the command:
        >> strings /tmp/tapeheader
        >>
        >> Here is the output from 2 problem tapes:
        >> =================================================
        >> For volume SU0026:
        >> VOL1SU0026NETWORKER                                           3
        >> setting position from fn %lu, rn %lu to fn %lu,
        >>
        >> For volume SU0116:
        >> VOL1SU0116NETWORKER                                           3
        >> setting position from fn %lu, rn %lu to fn %lu,
        >>
        >> =================================================
        >>
        >> This is what the header of a good tape looks like:
        >> =================================================
        >> VOL1SU0295NETWORKER                                           3
        >> setting position from fn %lu, rn %lu to fn %lu,
        >> C%D2
        >> SU0295
        >> volume pool
        >> SCRATCH
        >> =================================================
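
The good/bad comparison above can be turned into a rough check on a header dump such as /tmp/tapeheader. This is a minimal sketch based only on the strings shown in this thread, not on any documented NetWorker label layout: it assumes the volume name is the six characters after "VOL1" and that a complete header contains the text "volume pool". The function names are hypothetical.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII, similar in spirit to strings(1)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def classify_header(data):
    """Return (volume_name, status) for a raw header dump."""
    strs = printable_strings(data)
    vol1 = next((s for s in strs if s.startswith("VOL1")), None)
    if vol1 is None:
        return (None, "no VOL1 label")
    name = vol1[4:10]  # assumed: six-character volume name after "VOL1"
    has_pool = any(s.strip() == "volume pool" for s in strs)
    return (name, "complete" if has_pool else "partial")


if __name__ == "__main__":
    # Synthetic bytes modelled on the dumps quoted above.
    common = b" " * 43 + b"3\x00setting position from fn %lu, rn %lu to fn %lu,\x00"
    bad = b"VOL1SU0026NETWORKER" + common
    good = (b"VOL1SU0295NETWORKER" + common
            + b"C%D2\x00SU0295\x00volume pool\x00SCRATCH\x00")
    print(classify_header(bad))
    print(classify_header(good))
```

For a real dump, the bytes would come from something like `open("/tmp/tapeheader", "rb").read()` instead of the synthetic samples.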
        >>
        >> ALSO - I'm seeing that an abnormal amount of data is being written
        >> to these tapes according to the "mminfo -m" output. I don't know
        >> about SU0026 as it's already been deleted from the media db, but
        >> SU0116 has 1202GB on it!! I've looked through the mminfo output and
        >> found other tapes which have between 500GB-1848GB!!!
        >>
        >> I've checked four of these tapes which contained 1848GB, 921GB,
        >> 671GB & 1700GB respectively, and only the 671GB tape was able to be
        >> read.
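
To put those figures in context: an SDLT 320 cartridge holds 160GB native (320GB at the nominal 2:1 compression), so each amount implies a compression ratio. The sketch below works that out. The 2:1 ceiling is an assumption, since highly compressible data can exceed it, and the "tape A" to "tape D" names are placeholders because only SU0116 is named in the message.

```python
NATIVE_GB = 160   # SDLT 320 native capacity
MAX_RATIO = 2.0   # assumed ceiling, matching the nominal 320GB figure


def implied_ratio(written_gb, native_gb=NATIVE_GB):
    """Compression ratio needed to fit written_gb on one cartridge."""
    return written_gb / native_gb


def implausible(volumes, max_ratio=MAX_RATIO):
    """Return {name: ratio} for volumes whose implied ratio exceeds max_ratio."""
    return {name: implied_ratio(gb)
            for name, gb in volumes.items()
            if implied_ratio(gb) > max_ratio}


if __name__ == "__main__":
    # Amounts from the message; names other than SU0116 are placeholders.
    volumes = {"SU0116": 1202, "tape A": 1848, "tape B": 921,
               "tape C": 671, "tape D": 1700}
    for name, ratio in implausible(volumes).items():
        print(f"{name}: would need about {ratio:.1f}:1 compression")
```

Note that the readable 671GB tape is flagged as well, so on these numbers the reported amount alone does not predict which tapes can be read.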
        >>
        >> I used "tcopy" to get a listing of the tapes' structures, and the one
        >> that worked had 2 x 32KB header files before changing to 128KB data
        >> blocks, while the other 3 only had 128KB blocks.
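
The tcopy observation suggests a simple test for whether a tape still has its label files. The sketch below assumes the per-file block sizes have already been collected by hand into a list (it does not parse tcopy output, whose format varies between systems), and it takes the "2 x 32KB files, then 128KB data" layout from this message as the expected pattern. The function name is hypothetical.

```python
KIB = 1024
LABEL_BLOCK = 32 * KIB    # block size of the two header files on the good tape
DATA_BLOCK = 128 * KIB    # block size of the data files


def starts_with_label_files(file_block_sizes, label_files=2):
    """file_block_sizes: one block size per tape file, in tape order."""
    head = file_block_sizes[:label_files]
    return len(head) == label_files and all(b == LABEL_BLOCK for b in head)


if __name__ == "__main__":
    readable = [LABEL_BLOCK, LABEL_BLOCK, DATA_BLOCK, DATA_BLOCK]
    unreadable = [DATA_BLOCK, DATA_BLOCK, DATA_BLOCK]
    print(starts_with_label_files(readable))
    print(starts_with_label_files(unreadable))
```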
        >>
        >> I'm investigating driver versions at the moment, but can anyone
        >> think of what could be causing this? There doesn't appear to be any
        >> common trigger (Windows & Unix systems are affected across multiple
        >> drives... firmware has been upgraded, etc).
        >>
        >
        >
        >
        