Sorry - the environment has a mix of Solaris 8, Tru64 v5.1A/B and Windows 2000
boxes all sharing various drives in the same library across a fibre channel
(Brocade) SAN. All drives are connected to ADIC SNC5000 SCSI-fibre bridges,
which are then connected to the SAN. The Server is a Tru64 v5.1B system.
I don't think resets are the problem as I'd be seeing errors in the system logs
if that were the case. Legato seem to think that some other application is
overwriting the head of the tape, but if that were the case then I don't think
I'd be able to pull a partial label from the header of the tape.
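
In case it helps anyone compare headers, here is a rough Python sketch of the
check done by hand with dd and strings further down. It assumes the first
128KB of the tape has already been dumped to a file (e.g. /tmp/tapeheader),
and that a complete label carries a "volume pool" string after the VOL1 line,
which is only what the one good tape showed. The function names are made up
for illustration; this is not anything supplied by Legato.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII, roughly what strings(1) would print."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def check_label(data: bytes) -> dict:
    """Report which label pieces seen on the good tape are present in a dump."""
    runs = printable_runs(data)
    # First run starting with VOL1, or None if the dump has no such line.
    vol1 = next((r for r in runs if r.startswith("VOL1")), None)
    return {
        "vol1_line": vol1,
        # Assumption: a full label includes a "volume pool" string.
        "has_pool_field": any("volume pool" in r for r in runs),
    }


def check_file(path: str, size: int = 128 * 1024) -> dict:
    """Read the first `size` bytes of a header dump and check it."""
    with open(path, "rb") as f:
        return check_label(f.read(size))
```

Calling check_file("/tmp/tapeheader") on a dump from one of the problem tapes
should give a VOL1 line but has_pool_field False, if the dumps match the
strings output quoted below.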
Cheers,
John
-----Original Message-----
From: Mark Bradshaw (BTOpenWorld) [mailto:notthehoople AT BTOPENWORLD DOT COM]
Sent: Wed 21/01/2004 7:32 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Cc:
Subject: Re: [Networker] SDLT320 unreadable tapes
Hi John,
I'm a bit confused here. Your scanner shows you are using /dev/rmt/0cbn as a
remote device on a Storage Node but you don't mention this in your
environment. Is /dev/rmt/0cbn the problem device and if so are you running
scanner on a Solaris Storage Node?
Ah - looking a bit closer the prompt you are running scanner on is
'osun5680' so I guess it is a Solaris box. Can you flesh out your
environment for us please?
Also could it be that you are sharing SDLT drives between Solaris and Tru64
in a SAN and are maybe suffering from SCSI resets?
Some random thoughts!
Cheers
Mark
> nah - I'm using /dev/ntape/tape#_d1 which is the non-rewind device. If that
> was it then all tapes would be affected.
>
> Also - a weird thing is that the sus tapes will mount fine and accept
> backups fine, but as soon as you remove them from the media db (or try to
> do a restore), then you're unable to read the header information on the
> tape.
>
> -----Original Message-----
> From: Davina Treiber [mailto:Treiber AT hotpop DOT com]
> Sent: Mon 19/01/2004 6:53 PM
> To: Legato NetWorker discussion; John Herlihy
> Cc:
> Subject: Re: [Networker] SDLT320 unreadable tapes
>
>
>
> I haven't worked with Tru64 for a while so the device naming conventions
> aren't fresh in my mind, but is it possible that in some way you are
> using rewind devices? That would account for the abnormally high amounts
> of data written to some volumes, and could also account for the
> corruption you are seeing. Of course if this is the case it's bad news
> in terms of recovering your data. Just a thought...
>
> John Herlihy wrote:
>> Hi,
>>
>> sorry for the length of this email, but figured I'd chuck all the info
>> in there now - I am seeing an issue where it looks like the Networker
>> headers on the tape are incomplete.
>>
>> This is the environment:
>> Tru64 v5.1A
>> Networker Power Edition v6.1.3
>> SDLT320 drives
>>
>> When trying to use "scanner -i <device>" to scan a tape back in, it:
>> 1 - prompts for you to enter in the name of the volume.
>> 2 - complains that there is no pool named `'
>> 3 - fails in a short amount of time (ie about 5-10 secs)
>>
>> Here is the scanner output:
>> =================================================
>> osun5680[/]# scanner -s nsr01 -vim /dev/rmt/0cbn
>> scanner: using 'rd=server1:/dev/rmt/0cbn' as the device name
>> scanner: Opened /dev/rmt/0cbn for read
>> scanner: Rewinding...
>> scanner: Rewinding done
>> scanner: Reading the label...
>> scanner: Reading the label done
>> scanner: SYSTEM error: Tape label read: Bad file number
>> scanner: SYSTEM error: Tape label read: Bad file number
>> scanner: scanning for valid records...
>> scanner: read: 131072 bytes
>> scanner: read: 131072 bytes
>> scanner: Found valid record:
>> scanner: volume id 2434907393
>> scanner: file number 110
>> scanner: record number 5930
>> scanner: Enter the volume's name: SU0026
>> scanner: volume name `SU0026'
>> scanner: scanning sdlt320 tape SU0026 on rd=server1:/dev/rmt/0cbn
>> scanner: volume id 2434907393 record size 131072
>> created 1/01/70 10:00:00 expires 1/01/70 10:00:00
>> scanner: adding sdlt320 tape SU0026 to pool
>> scanner: RAP error: There is no pool named `'.
>> scanner: create pool manually after scanner; continuing...
>> scanner: Rewinding...
>> scanner: Rewinding done
>> scanner: setting position from fn 0, rn 0 to fn 2, rn 0
>> scanner: Opened /dev/rmt/0cbn for read
>> scanner: unexpected file number, wanted 2 got 112
>> scanner: adjusting file number from 2 to 112
>> scanner: scanning file 112, record 0
>> scanner: unexpected volume id, wanted 2434907393 got 2434907393
>> scanner: Opened /dev/rmt/0cbn for read
>> scanner: done with sdlt320 tape SU0026
>> scanner: Rewinding...
>> scanner: Rewinding done
>> =================================================
>>
>> We were able to obtain the header from the tape via the command:
>> dd if=/dev/rmt/0cbn of=/tmp/tapeheader bs=128k count=1
>>
>> ..and then view it with the command:
>> strings /tmp/tapeheader
>>
>> Here is the output from 2 problem tapes:
>> =================================================
>> For volume SU0026:
>> VOL1SU0026NETWORKER 3
>> setting position from fn %lu, rn %lu to fn %lu,
>>
>> For volume SU0116:
>> VOL1SU0116NETWORKER 3
>> setting position from fn %lu, rn %lu to fn %lu,
>>
>> =================================================
>>
>> This is what the header of a good tape looks like:
>> =================================================
>> VOL1SU0295NETWORKER 3
>> setting position from fn %lu, rn %lu to fn %lu,
>> C%D2
>> SU0295
>> volume pool
>> SCRATCH
>> =================================================
>>
>> ALSO - I'm seeing that an abnormal amount of data is being written to
>> these tapes, going by the "mminfo -m" output. I don't know about SU0026
>> as it's already been deleted from the media db, but SU0116 has 1202GB on
>> it!! I've looked through the mminfo output and found other tapes which
>> have between 500GB-1848GB!!!
>>
>> I've checked four of these tapes which contained 1848GB, 921GB, 671GB &
>> 1700GB respectively, and only the 671GB tape was able to be read.
>>
>> I used "tcopy" to get a listing of the tapes' structures, and the one
>> that worked had 2 x 32KB header files before changing to 128KB data
>> blocks, while the other 3 only had 128KB blocks.
>>
>> I'm investigating driver versions at the moment, but can anyone think of
>> what could be causing this? There doesn't appear to be any common
>> trigger (Windows & Unix systems are affected across multiple drives...
>> firmware has been upgraded, etc).
>>
>
>
>