Re: [Networker] SDLT320 unreadable tapes

2004-01-22 12:27:51
Subject: Re: [Networker] SDLT320 unreadable tapes
From: Teresa Biehler <tpbsys AT RIT DOT EDU>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Thu, 22 Jan 2004 12:03:46 -0500
Based on what you are seeing in the scanner output, it looks like the
tape is getting rewound while a backup is taking place.  Networker is
able to determine the label from "control" information (for lack of a
better description) that is written at various points on the tape.  

Is the Removable Storage Service running on your Windows servers?  If it
is, it will try to take control of the drives and, if it finds them in
an "unexpected state", it will rewind them.  In this case, "unexpected
state" may be when another host is writing to the tape.

We just had a similar problem in our environment.  We proved it was the
Removable Storage service by rebooting the Windows server while the
library was in a safe state (tapes loaded but Networker down).  When the
Windows server came back up, we saw activity on the tape drive (the
blinking light indicating that the tape was moving).  We then disabled
this service and did another reboot.  This time when the server booted,
there was no tape activity.
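
For anyone who wants to check the service state without a reboot, here is a minimal Python sketch. It assumes the `sc` utility is available on the Windows host and that `NtmsSvc` is the short service name behind the "Removable Storage" display name; both are assumptions to verify locally, and the function names are only illustrative.

```python
import re
import subprocess

# Assumption: "NtmsSvc" is the short name of the Removable Storage service.
SERVICE_NAME = "NtmsSvc"


def parse_sc_state(sc_output):
    """Return the state word (e.g. 'RUNNING', 'STOPPED') from `sc query` text."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def removable_storage_state(service=SERVICE_NAME):
    """Run `sc query <service>` and return its state, or None if unavailable."""
    try:
        result = subprocess.run(
            ["sc", "query", service],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # `sc` is not on this host (for example, a non-Windows machine).
        return None
    return parse_sc_state(result.stdout)
```

Calling `removable_storage_state()` on each Windows host that can see the shared drives shows whether the service is running there. A state of `RUNNING` does not prove the service is rewinding tapes; it only identifies hosts worth testing the way we did.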

Good luck.
Teresa


-----Original Message-----
From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]
On Behalf Of John Herlihy
Sent: Thursday, January 22, 2004 5:31 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] SDLT320 unreadable tapes

Sorry - the environment has a mix of Solaris 8, Tru64 v5.1A/B and
Windows 2000 boxes all sharing various drives in the same library across
a fibre channel (Brocade) SAN. All drives are connected to ADIC SNC5000
scsi-fibre bridges, which are then connected to the SAN. The Server is a
Tru64 v5.1B system.
 
I don't think resets are the problem as I'd be seeing errors in the
system logs if that were the case. Legato seem to think that some other
application is overwriting the head of the tape, but if that were the
case then I don't think I'd be able to pull a partial label from the
header of the tape.
 
Cheers,
John

        -----Original Message----- 
        From: Mark Bradshaw (BTOpenWorld) [mailto:notthehoople AT BTOPENWORLD DOT COM] 
        Sent: Wed 21/01/2004 7:32 AM 
        To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU 
        Cc: 
        Subject: Re: [Networker] SDLT320 unreadable tapes
        
        

        Hi John,
        
        I'm a bit confused here. Your scanner shows you are using /dev/rmt/0cbn as a
        remote device on a Storage Node but you don't mention this in your
        environment. Is /dev/rmt/0cbn the problem device and if so are you running
        scanner on a Solaris Storage Node?
        
        Ah - looking a bit closer the prompt you are running scanner on is
        'osun5680' so I guess it is a Solaris box. Can you flesh out your
        environment for us please?
        
        Also could it be that you are sharing SDLT drives between Solaris and Tru64
        in a SAN and are maybe suffering from SCSI resets?
        
        Some random thoughts!
        
        Cheers
        
        Mark
        
        > nah - I'm using /dev/ntape/tape#_d1 which is the non-rewind device. If that
        > was it then all tapes would be affected.
        >
        > Also - a weird thing is that the suspect tapes will mount fine and accept
        > backups fine, but as soon as you remove them from the media db (or try to do
        > a restore), you're unable to read the header information on the tape.
        >
        > -----Original Message-----
        > From: Davina Treiber [mailto:Treiber AT hotpop DOT com]
        > Sent: Mon 19/01/2004 6:53 PM
        > To: Legato NetWorker discussion; John Herlihy
        > Cc:
        > Subject: Re: [Networker] SDLT320 unreadable tapes
        >
        >
        >
        > I haven't worked with Tru64 for a while so the device naming conventions
        > aren't fresh in my mind, but is it possible that in some way you are
        > using rewind devices? That would account for the abnormally high amounts
        > of data written to some volumes, and could also account for the
        > corruption you are seeing. Of course if this is the case it's bad news
        > in terms of recovering your data. Just a thought...
        >
        > John Herlihy wrote:
        >> Hi,
        >>
        >> sorry for the length of this email, but figured I'd chuck all the info
        >> in there now - I am seeing an issue where it looks like the Networker
        >> headers on the tape are incomplete.
        >>
        >> This is the environment:
        >> Tru64 v5.1A
        >> Networker Power Edition v6.1.3
        >> SDLT320 drives
        >>
        >> When I try to use "scanner -i <device>" to scan a tape back in, it:
        >> 1 - prompts you to enter the name of the volume.
        >> 2 - complains that there is no pool named `'
        >> 3 - fails in a short amount of time (i.e. about 5-10 secs)
        >>
        >> Here is the scanner output:
        >> =================================================
        >> osun5680[/]# scanner -s nsr01 -vim /dev/rmt/0cbn
        >> scanner: using 'rd=server1:/dev/rmt/0cbn' as the device name
        >> scanner: Opened /dev/rmt/0cbn for read
        >> scanner: Rewinding...
        >> scanner: Rewinding done
        >> scanner: Reading the label...
        >> scanner: Reading the label done
        >> scanner: SYSTEM error: Tape label read: Bad file number
        >> scanner: SYSTEM error: Tape label read: Bad file number
        >> scanner: scanning for valid records...
        >> scanner: read: 131072 bytes
        >> scanner: read: 131072 bytes
        >> scanner: Found valid record:
        >> scanner: volume id 2434907393
        >> scanner: file number 110
        >> scanner: record number 5930
        >> scanner: Enter the volume's name: SU0026
        >> scanner: volume name `SU0026'
        >> scanner: scanning sdlt320 tape SU0026 on rd=server1:/dev/rmt/0cbn
        >> scanner: volume id 2434907393 record size 131072
        >> created 1/01/70 10:00:00 expires 1/01/70 10:00:00
        >> scanner: adding sdlt320 tape SU0026 to pool
        >> scanner: RAP error: There is no pool named `'.
        >> scanner: create pool manually after scanner; continuing...
        >> scanner: Rewinding...
        >> scanner: Rewinding done
        >> scanner: setting position from fn 0, rn 0 to fn 2, rn 0
        >> scanner: Opened /dev/rmt/0cbn for read
        >> scanner: unexpected file number, wanted 2 got 112
        >> scanner: adjusting file number from 2 to 112
        >> scanner: scanning file 112, record 0
        >> scanner: unexpected volume id, wanted 2434907393 got 2434907393
        >> scanner: Opened /dev/rmt/0cbn for read
        >> scanner: done with sdlt320 tape SU0026
        >> scanner: Rewinding...
        >> scanner: Rewinding done
        >> =================================================
        >>
        >> We were able to obtain the header from the tape via the command:
        >> dd if=/dev/rmt/0cbn of=/tmp/tapeheader bs=128k count=1
        >>
        >> ..and then view it with the command:
        >> strings /tmp/tapeheader
        >>
        >> Here is the output from 2 problem tapes:
        >> =================================================
        >> For volume SU0026:
        >> VOL1SU0026NETWORKER 3
        >> setting position from fn %lu, rn %lu to fn %lu,
        >>
        >> For volume SU0116:
        >> VOL1SU0116NETWORKER 3
        >> setting position from fn %lu, rn %lu to fn %lu,
        >>
        >> =================================================
        >>
        >> This is what the header of a good tape looks like:
        >> =================================================
        >> VOL1SU0295NETWORKER 3
        >> setting position from fn %lu, rn %lu to fn %lu,
        >> C%D2
        >> SU0295
        >> volume pool
        >> SCRATCH
        >> =================================================
        >>
        >> ALSO - the "mminfo -m" output shows an abnormal amount of data being
        >> written to these tapes. I don't know about SU0026 as it's already been
        >> deleted from the media db, but SU0116 has 1202GB on it!! I've looked
        >> through the mminfo output and found other tapes which have between
        >> 500GB and 1848GB!!!
        >>
        >> I've checked four of these tapes which contained 1848GB, 921GB, 671GB &
        >> 1700GB respectively, and only the 671GB tape was able to be read.
        >>
        >> I used "tcopy" to get a listing of the tapes' structures, and the one
        >> that worked had 2 x 32KB header files before changing to 128KB data
        >> blocks while the other 3 only had 128KB blocks.
        >>
        >> I'm investigating driver versions at the moment, but can anyone think of
        >> what could be causing this? There doesn't appear to be any common
        >> trigger (Windows & Unix systems are affected across multiple drives...
        >> firmware has been upgraded, etc).
        >>
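
The difference between the good and bad header dumps quoted above can be checked mechanically. What follows is a minimal Python sketch, not a NetWorker tool: it assumes the first block was saved with the `dd` command from the message, and the `VOL1<name>NETWORKER` pattern and the "volume pool" marker are inferred only from the three dumps in this thread. The function names are hypothetical.

```python
import re

# Runs of at least four printable ASCII bytes, roughly what strings(1) prints.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Assumption: label layout inferred from the dumps in this thread only.
LABEL_PATTERN = re.compile(r"VOL1(.+?)NETWORKER")
POOL_MARKER = "volume pool"


def printable_strings(data):
    """Return the printable ASCII runs found in a block of bytes."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def check_header(data):
    """Report the volume name (if any) and whether the pool field is present."""
    found = printable_strings(data)
    volume = None
    for text in found:
        match = LABEL_PATTERN.search(text)
        if match:
            volume = match.group(1)
            break
    has_pool = any(POOL_MARKER in text for text in found)
    return {"volume": volume, "has_pool_field": has_pool}
```

For example, `check_header(open("/tmp/tapeheader", "rb").read())` on a dump like the ones for SU0026 and SU0116 would be expected to return a volume name with `has_pool_field` set to `False`, while a dump like the SU0295 one would have it set to `True`.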
        >
        >
        >
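
The oversized volumes reported from "mminfo -m" can be used to shortlist tapes for checking. A rough Python sketch follows, assuming the written totals have already been collected by hand into a dictionary. The 160 GB figure is the native capacity of an SDLT 320 cartridge; the 2:1 compression ceiling is an assumption, and the names `tape-A` to `tape-D` are placeholders because the thread does not name those volumes.

```python
SDLT320_NATIVE_GB = 160        # native capacity of an SDLT 320 cartridge
ASSUMED_MAX_COMPRESSION = 2.0  # assumption: data rarely compresses better than 2:1


def suspect_volumes(written_gb,
                    native_gb=SDLT320_NATIVE_GB,
                    max_ratio=ASSUMED_MAX_COMPRESSION):
    """Return the volumes whose reported data exceeds a plausible capacity."""
    limit = native_gb * max_ratio
    return sorted(name for name, gb in written_gb.items() if gb > limit)


# Totals quoted in this thread; only SU0116 is named in the original message.
reported = {
    "SU0116": 1202,
    "tape-A": 1848,
    "tape-B": 921,
    "tape-C": 671,
    "tape-D": 1700,
}

print(suspect_volumes(reported))
```

Every total quoted in the thread exceeds the assumed limit, so all five are listed. Since the 671GB tape was still readable, a flagged volume is only a candidate for a header check, not a confirmed bad tape.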
        
        --
        Note: To sign off this list, send a "signoff networker" command via email
        to listserv AT listmail.temple DOT edu or visit the list's Web site at
        http://listmail.temple.edu/archives/networker.html where you can
        also view and post messages to the list.
        =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
        
