Re: [Networker] SDLT320 unreadable tapes

2004-01-26 18:35:49
Subject: Re: [Networker] SDLT320 unreadable tapes
From: Shaun Ellis <sellis AT LEGATO DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 26 Jan 2004 15:34:29 -0800
Is this a SAN? If so, you should disable SCSI Bus Reset on the FC/SCSI
controller.

Shaun Ellis
LEGATO Software
3210 Porter Drive
Palo Alto
Ca 94304
Phone: +1 (650) 842 9548
Mobile: +1 (408) 431 6997


> -----Original Message-----
> From: Teresa Biehler [mailto:tpbsys AT rit DOT edu]
> Sent: Thursday, January 22, 2004 9:04 AM
> To: Legato NetWorker discussion; John Herlihy
> Subject: RE: [Networker] SDLT320 unreadable tapes
>
> Based on what you are seeing in the scanner output, it looks like the
> tape is getting rewound while a backup is taking place.  Networker is
> able to determine the label from "control" information (for lack of a
> better description) that is written at various points on the tape.
>
> Is the Removable Storage Service running on your Windows servers?  If it
> is, it will try to take control of the drives and, if it finds them in
> an "unexpected state", it will rewind them.  In this case, "unexpected
> state" may be when another host is writing to the tape.
>
> We just had a similar problem in our environment.  We proved it was the
> Removable Storage service by rebooting the Windows server while the
> library was in a safe state (tapes loaded but Networker down).  When the
> Windows server came back up, we saw activity on the tape drive (the
> blinking light indicating that the tape was moving).  We then disabled
> this service and did another reboot.  This time when the server booted,
> there was no tape activity.
>
> Good luck.
> Teresa
>
>
> -----Original Message-----
> From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]
> On Behalf Of John Herlihy
> Sent: Thursday, January 22, 2004 5:31 AM
> To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
> Subject: Re: [Networker] SDLT320 unreadable tapes
>
> Sorry - the environment has a mix of Solaris 8, Tru64 v5.1A/B and
> Windows 2000 boxes, all sharing various drives in the same library across
> a fibre channel (Brocade) SAN. All drives are connected to ADIC SNC5000
> SCSI-fibre bridges, which are then connected to the SAN. The server is a
> Tru64 v5.1B system.
>
> I don't think resets are the problem as I'd be seeing errors in the
> system logs if that were the case. Legato seem to think that some other
> application is overwriting the head of the tape, but if that were the
> case then I don't think I'd be able to pull a partial label from the
> header of the tape.
>
> Cheers,
> John
>
>         -----Original Message-----
>         From: Mark Bradshaw (BTOpenWorld) [mailto:notthehoople AT BTOPENWORLD DOT COM]
>         Sent: Wed 21/01/2004 7:32 AM
>         To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
>         Cc:
>         Subject: Re: [Networker] SDLT320 unreadable tapes
>
>
>
>         Hi John,
>
>         I'm a bit confused here. Your scanner output shows you are using
>         /dev/rmt/0cbn as a remote device on a Storage Node, but you don't
>         mention this in your environment. Is /dev/rmt/0cbn the problem
>         device, and if so, are you running scanner on a Solaris Storage
>         Node?
>
>         Ah - looking a bit closer, the prompt you are running scanner from
>         is 'osun5680', so I guess it is a Solaris box. Can you flesh out
>         your environment for us, please?
>
>         Also, could it be that you are sharing SDLT drives between Solaris
>         and Tru64 in a SAN and are perhaps suffering from SCSI resets?
>
>         Some random thoughts!
>
>         Cheers
>
>         Mark
>
>         > Nah - I'm using /dev/ntape/tape#_d1, which is the non-rewind
>         > device. If that were it, then all tapes would be affected.
>         >
>         > Also, a weird thing is that the suspect tapes will mount fine and
>         > accept backups fine, but as soon as you remove them from the media
>         > db (or try to do a restore), you're unable to read the header
>         > information on the tape.
>         >
>         > -----Original Message-----
>         > From: Davina Treiber [mailto:Treiber AT hotpop DOT com]
>         > Sent: Mon 19/01/2004 6:53 PM
>         > To: Legato NetWorker discussion; John Herlihy
>         > Cc:
>         > Subject: Re: [Networker] SDLT320 unreadable tapes
>         >
>         >
>         >
>         > I haven't worked with Tru64 for a while, so the device naming
>         > conventions aren't fresh in my mind, but is it possible that in some
>         > way you are using rewind devices? That would account for the
>         > abnormally high amounts of data written to some volumes, and could
>         > also account for the corruption you are seeing. Of course, if this
>         > is the case, it's bad news in terms of recovering your data. Just a
>         > thought...
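One way to check Davina's suggestion mechanically is to classify the device names in this thread by naming convention. The following is a minimal Python sketch, assuming the usual conventions: on Tru64 UNIX 5.x, /dev/ntape/... nodes are no-rewind and /dev/tape/... nodes rewind on close; on Solaris, a trailing 'n' in /dev/rmt/<unit>[density][b][n] selects the no-rewind device. The helper name is_no_rewind is hypothetical, it only looks at the name (it does not query the driver), and stripping an "rd=host:" prefix mirrors the remote device string printed by scanner.

```python
import re

# Solaris: /dev/rmt/<unit>[l|m|h|c|u][b][n]; a trailing "n" means no-rewind.
_SOLARIS_RMT = re.compile(r"/dev/rmt/\d+[lmhcu]?b?(n?)")


def is_no_rewind(device: str) -> bool:
    """Return True if `device` looks like a no-rewind tape device.

    Hypothetical helper based on naming conventions only; it does not
    query the driver. Raises ValueError for names it does not recognise.
    """
    # Strip a NetWorker remote-device prefix such as "rd=server1:".
    if device.startswith("rd=") and ":" in device:
        device = device.split(":", 1)[1]

    # Tru64 UNIX 5.x: /dev/ntape/... is no-rewind, /dev/tape/... rewinds on close.
    if device.startswith("/dev/ntape/"):
        return True
    if device.startswith("/dev/tape/"):
        return False

    # Solaris: the optional trailing "n" is captured in group 1.
    match = _SOLARIS_RMT.fullmatch(device)
    if match:
        return match.group(1) == "n"

    raise ValueError(f"unrecognised tape device name: {device!r}")
```

For example, with a hypothetical unit 0, is_no_rewind("/dev/ntape/tape0_d1") and is_no_rewind("rd=server1:/dev/rmt/0cbn") both return True, while "/dev/rmt/0cb" (no trailing 'n') returns False. A name check like this only rules out the obvious misconfiguration; it cannot detect another host or service rewinding the drive.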
>         >
>         > John Herlihy wrote:
>         >> Hi,
>         >>
>         >> Sorry for the length of this email, but I figured I'd chuck all the
>         >> info in there now - I am seeing an issue where it looks like the
>         >> Networker headers on the tape are incomplete.
>         >>
>         >> This is the environment:
>         >> Tru64 v5.1A
>         >> Networker Power Edition v6.1.3
>         >> SDLT320 drives
>         >>
>         >> When trying to use "scanner -i <device>" to scan a tape back in, it:
>         >> 1 - prompts you to enter the name of the volume.
>         >> 2 - complains that there is no pool named `'
>         >> 3 - fails after a short time (i.e. about 5-10 secs)
>         >>
>         >> Here is the scanner output:
>         >> =================================================
>         >> osun5680[/]# scanner -s nsr01 -vim /dev/rmt/0cbn
>         >> scanner: using 'rd=server1:/dev/rmt/0cbn' as the device name
>         >> scanner: Opened /dev/rmt/0cbn for read
>         >> scanner: Rewinding...
>         >> scanner: Rewinding done
>         >> scanner: Reading the label...
>         >> scanner: Reading the label done
>         >> scanner: SYSTEM error: Tape label read: Bad file number
>         >> scanner: SYSTEM error: Tape label read: Bad file number
>         >> scanner: scanning for valid records...
>         >> scanner: read: 131072 bytes
>         >> scanner: read: 131072 bytes
>         >> scanner: Found valid record:
>         >> scanner: volume id 2434907393
>         >> scanner: file number 110
>         >> scanner: record number 5930
>         >> scanner: Enter the volume's name: SU0026
>         >> scanner: volume name `SU0026'
>         >> scanner: scanning sdlt320 tape SU0026 on rd=server1:/dev/rmt/0cbn
>         >> scanner: volume id 2434907393 record size 131072
>         >> created 1/01/70 10:00:00 expires 1/01/70 10:00:00
>         >> scanner: adding sdlt320 tape SU0026 to pool
>         >> scanner: RAP error: There is no pool named `'.
>         >> scanner: create pool manually after scanner; continuing...
>         >> scanner: Rewinding...
>         >> scanner: Rewinding done
>         >> scanner: setting position from fn 0, rn 0 to fn 2, rn 0
>         >> scanner: Opened /dev/rmt/0cbn for read
>         >> scanner: unexpected file number, wanted 2 got 112
>         >> scanner: adjusting file number from 2 to 112
>         >> scanner: scanning file 112, record 0
>         >> scanner: unexpected volume id, wanted 2434907393 got 2434907393
>         >> scanner: Opened /dev/rmt/0cbn for read
>         >> scanner: done with sdlt320 tape SU0026
>         >> scanner: Rewinding...
>         >> scanner: Rewinding done
>         >> =================================================
>         >>
>         >> We were able to obtain the header from the tape via the command:
>         >> dd if=/dev/rmt/0cbn of=/tmp/tapeheader bs=128k count=1
>         >>
>         >> ..and then view it with the command:
>         >> strings /tmp/tapeheader
>         >>
>         >> Here is the output from 2 problem tapes:
>         >> =================================================
>         >> For volume SU0026:
>         >> VOL1SU0026NETWORKER                              3
>         >> setting position from fn %lu, rn %lu to fn %lu,
>         >>
>         >> For volume SU0116:
>         >> VOL1SU0116NETWORKER                              3
>         >> setting position from fn %lu, rn %lu to fn %lu,
>         >>
>         >> =================================================
>         >>
>         >> This is what the header of a good tape looks like:
>         >> =================================================
>         >> VOL1SU0295NETWORKER                              3
>         >> setting position from fn %lu, rn %lu to fn %lu,
>         >> C%D2
>         >> SU0295
>         >> volume pool
>         >> SCRATCH
>         >> =================================================
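The difference between the good and bad headers above can be checked with a short script rather than by eyeballing `strings` output. This is a minimal Python sketch, assuming the first block has already been dumped with the dd command above (for example to /tmp/tapeheader). printable_strings is a rough stand-in for `strings` (runs of at least four printable ASCII bytes), check_header is a hypothetical helper, and treating the presence of the "volume pool" string as the mark of a complete header is an assumption drawn only from the three samples in this message.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Roughly what `strings` prints: runs of >= min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


def check_header(block: bytes) -> dict:
    """Summarise a dumped NetWorker tape header block (heuristic sketch)."""
    found = printable_strings(block)
    # The label string starts with "VOL1" followed by the 6-character volume name.
    label = next((s for s in found if s.startswith("VOL1")), None)
    return {
        "volume": label[4:10] if label else None,
        # In the samples above, only the good tape carries the pool information.
        "has_pool_info": "volume pool" in found,
    }
```

Usage would be along the lines of opening /tmp/tapeheader in binary mode and passing its contents to check_header. For dumps like the SU0026 and SU0116 headers shown above, the volume name would be recovered but has_pool_info would be expected to come back False, matching the partial label John describes.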
>         >>
>         >> ALSO - I'm seeing that an abnormal amount of data is being
>         >> written to these tapes, according to the "mminfo -m" output. I don't
>         >> know about SU0026, as it's already been deleted from the media db,
>         >> but SU0116 has 1202GB on it!! I've looked through the mminfo output
>         >> and found other tapes which have between 500GB and 1848GB!!!
>         >>
>         >> I've checked four of these tapes, which contained 1848GB, 921GB,
>         >> 671GB and 1700GB respectively, and only the 671GB tape could be
>         >> read.
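These figures are suspicious for a simple arithmetic reason: an SDLT320 cartridge holds 160 GB native, or about 320 GB assuming 2:1 compression, so a volume reported at 1202GB would need roughly 3.8 times that ceiling. A minimal Python sketch that flags such volumes, assuming the per-volume sizes have been transcribed from `mminfo -m` by hand (its output format is not parsed here; the function name is hypothetical):

```python
# SDLT320 nominal capacity: 160 GB native, ~320 GB assuming 2:1 compression.
SDLT320_NATIVE_GB = 160
SDLT320_COMPRESSED_GB = 2 * SDLT320_NATIVE_GB


def implausible_volumes(written_gb: dict[str, float],
                        limit_gb: float = SDLT320_COMPRESSED_GB) -> list[str]:
    """Return, sorted, the volumes whose reported size exceeds one cartridge."""
    return sorted(vol for vol, gb in written_gb.items() if gb > limit_gb)
```

For instance, implausible_volumes({"SU0116": 1202}) returns ["SU0116"]. A reported size far above the cartridge limit is consistent with Davina's suggestion that data is being written over the same stretch of tape repeatedly, though by itself it does not show what is rewinding the drive.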
>         >>
>         >> I used "tcopy" to get a listing of the tapes' structures, and the one
>         >> that worked had 2 x 32KB header files before changing to 128KB data
>         >> blocks, while the other 3 only had 128KB blocks.
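The tcopy observation can be expressed as a check on the block size of each tape file. This sketch assumes the per-file block sizes have already been read off the tcopy listing into a list (tcopy output is not parsed here), and that a healthy volume starts with two 32 KB label files before the 128 KB data files, as on the one readable tape; the scanner output above, which positions "to fn 2, rn 0" before reading data, is consistent with that layout. Both helper names are hypothetical.

```python
LABEL_BLOCK = 32 * 1024   # block size of the header files on the good tape
DATA_BLOCK = 128 * 1024   # block size of the data files


def has_label_files(block_sizes: list[int], label_files: int = 2) -> bool:
    """True if the first `label_files` tape files use 32 KB blocks.

    block_sizes[i] is the block size of tape file i, transcribed from tcopy.
    """
    head = block_sizes[:label_files]
    return len(head) == label_files and all(size == LABEL_BLOCK for size in head)


def classify_layout(block_sizes: list[int]) -> str:
    """Label a tape layout as seen in the thread's tcopy comparison."""
    if has_label_files(block_sizes):
        return "labelled"
    if block_sizes and all(size == DATA_BLOCK for size in block_sizes):
        return "data only (label files missing)"
    return "unknown layout"
```

Under these assumptions, the readable 671GB tape would classify as "labelled" and the other three as "data only (label files missing)", which fits the idea that the start of those tapes was overwritten after a rewind.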
>         >>
>         >> I'm investigating driver versions at the moment, but can anyone think
>         >> of what could be causing this? There doesn't appear to be any common
>         >> trigger (Windows & Unix systems are affected across multiple
>         >> drives... firmware has been upgraded, etc).
>         >>
>         >
>         >
>         >
>
>         --
>         Note: To sign off this list, send a "signoff networker" command via email
>         to listserv AT listmail.temple DOT edu or visit the list's Web site at
>         http://listmail.temple.edu/archives/networker.html where you can
>         also view and post messages to the list.
>         =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
>