Is this a SAN? If so, you should disable SCSI bus resets on the FC/SCSI
controller.
Shaun Ellis
LEGATO Software
3210, Porter Drive
Palo Alto
Ca 94304
Phone: +1 (650) 842 9548
Mobile: +1 (408) 431 6997
> -----Original Message-----
> From: Teresa Biehler [mailto:tpbsys AT rit DOT edu]
> Sent: Thursday, January 22, 2004 9:04 AM
> To: Legato NetWorker discussion; John Herlihy
> Subject: RE: [Networker] SDLT320 unreadable tapes
>
> Based on what you are seeing in the scanner output, it looks like the
> tape is getting rewound while a backup is taking place. Networker is
> able to determine the label from "control" information (for lack of a
> better description) that is written at various points on the tape.
>
> Is the Removable Storage Service running on your Windows servers? If it
> is, it will try to take control of the drives and, if it finds them in
> an "unexpected state", it will rewind them. In this case, "unexpected
> state" may be when another host is writing to the tape.
>
> We just had a similar problem in our environment. We proved it was the
> Removable Storage service by rebooting the Windows server while the
> library was in a safe state (tapes loaded but Networker down). When the
> Windows server came back up, we saw activity on the tape drive (the
> blinking light indicating that the tape was moving). We then disabled
> this service and did another reboot. This time when the server booted,
> there was no tape activity.
>
> Good luck.
> Teresa
>
>
> -----Original Message-----
> From: Legato NetWorker discussion [mailto:NETWORKER AT LISTMAIL.TEMPLE DOT EDU]
> On Behalf Of John Herlihy
> Sent: Thursday, January 22, 2004 5:31 AM
> To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
> Subject: Re: [Networker] SDLT320 unreadable tapes
>
> Sorry - the environment has a mix of Solaris 8, Tru64 v5.1A/B and
> Windows 2000 boxes, all sharing various drives in the same library across
> a Fibre Channel (Brocade) SAN. All drives are connected to ADIC SNC5000
> SCSI-to-Fibre Channel bridges, which are then connected to the SAN. The
> server is a Tru64 v5.1B system.
>
> I don't think resets are the problem as I'd be seeing errors in the
> system logs if that were the case. Legato seem to think that some other
> application is overwriting the head of the tape, but if that were the
> case then I don't think I'd be able to pull a partial label from the
> header of the tape.
>
> Cheers,
> John
>
> -----Original Message-----
> From: Mark Bradshaw (BTOpenWorld) [mailto:notthehoople AT BTOPENWORLD DOT COM]
> Sent: Wed 21/01/2004 7:32 AM
> To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
> Cc:
> Subject: Re: [Networker] SDLT320 unreadable tapes
>
>
>
> Hi John,
>
> I'm a bit confused here. Your scanner output shows you are using
> /dev/rmt/0cbn as a remote device on a Storage Node, but you don't
> mention this in your environment. Is /dev/rmt/0cbn the problem device,
> and if so, are you running scanner on a Solaris Storage Node?
>
> Ah - looking a bit closer, the prompt you are running scanner on is
> 'osun5680', so I guess it is a Solaris box. Can you flesh out your
> environment for us, please?
>
> Also, could it be that you are sharing SDLT drives between Solaris and
> Tru64 in a SAN and are maybe suffering from SCSI resets?
>
> Some random thoughts!
>
> Cheers
>
> Mark
>
> > nah - I'm using /dev/ntape/tape#_d1, which is the non-rewind device. If
> > that was it, then all tapes would be affected.
> >
> > Also - a weird thing is that the suspect tapes will mount fine and
> > accept backups fine, but as soon as you remove them from the media db
> > (or try to do a restore), you're unable to read the header information
> > on the tape.
> >
> > -----Original Message-----
> > From: Davina Treiber [mailto:Treiber AT hotpop DOT com]
> > Sent: Mon 19/01/2004 6:53 PM
> > To: Legato NetWorker discussion; John Herlihy
> > Cc:
> > Subject: Re: [Networker] SDLT320 unreadable tapes
> >
> >
> >
> > I haven't worked with Tru64 for a while, so the device naming
> > conventions aren't fresh in my mind, but is it possible that in some
> > way you are using rewind devices? That would account for the
> > abnormally high amounts of data written to some volumes, and could
> > also account for the corruption you are seeing. Of course, if this is
> > the case, it's bad news in terms of recovering your data. Just a
> > thought...
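Davina's question hinges on whether a device path opens a rewind or a no-rewind device. Below is a minimal Python sketch that classifies the two naming styles that appear in this thread. It assumes the usual Solaris rule (a trailing "n" on /dev/rmt/<unit>[density][b][n] means no-rewind) and the Tru64 v5 rule (/dev/ntape/... is no-rewind, /dev/tape/... rewinds on close). The function names are hypothetical, and the "rd=host:" handling simply mirrors the remote-device name scanner printed in the log.

```python
import re
from typing import Optional

# Naming rules ASSUMED for this sketch:
#   Solaris:  /dev/rmt/<unit>[density][b][n], a trailing "n" means no-rewind
#   Tru64 v5: /dev/ntape/tapeN_dM is no-rewind, /dev/tape/tapeN_dM rewinds on close
SOLARIS_RE = re.compile(r"^/dev/rmt/\d+[lmhcu]?b?(n?)$")
TRU64_RE = re.compile(r"^/dev/(n?)tape/tape\d+(?:_d\d+)?$")


def strip_remote_prefix(device: str) -> str:
    """Drop a NetWorker remote-device prefix such as 'rd=server1:'."""
    if device.startswith("rd="):
        _, _, device = device.partition(":")
    return device


def is_no_rewind(device: str) -> Optional[bool]:
    """True for a no-rewind device, False for a rewind device, None if unrecognised."""
    path = strip_remote_prefix(device)
    for pattern in (SOLARIS_RE, TRU64_RE):
        match = pattern.match(path)
        if match:
            # Group 1 captures the "n" that marks a no-rewind device, if present.
            return match.group(1) == "n"
    return None


print(is_no_rewind("rd=server1:/dev/rmt/0cbn"))  # True
print(is_no_rewind("/dev/tape/tape0_d1"))        # False
```

Returning None for unknown names (Windows \\.\TapeN paths, for example) keeps the check from guessing about platforms whose rules it does not encode.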
> >
> > John Herlihy wrote:
> >> Hi,
> >>
> >> sorry for the length of this email, but figured I'd chuck all the
> >> info in there now - I am seeing an issue where it looks like the
> >> Networker headers on the tape are incomplete.
> >>
> >> This is the environment:
> >> Tru64 v5.1A
> >> Networker Power Edition v6.1.3
> >> SDLT320 drives
> >>
> >> When trying to use "scanner -i <device>" to scan a tape back in, it:
> >> 1 - prompts you to enter the name of the volume.
> >> 2 - complains that there is no pool named `'
> >> 3 - fails in a short amount of time (i.e. about 5-10 secs)
> >>
> >> Here is the scanner output:
> >> =================================================
> >> osun5680[/]# scanner -s nsr01 -vim /dev/rmt/0cbn
> >> scanner: using 'rd=server1:/dev/rmt/0cbn' as the device name
> >> scanner: Opened /dev/rmt/0cbn for read
> >> scanner: Rewinding...
> >> scanner: Rewinding done
> >> scanner: Reading the label...
> >> scanner: Reading the label done
> >> scanner: SYSTEM error: Tape label read: Bad file number
> >> scanner: SYSTEM error: Tape label read: Bad file number
> >> scanner: scanning for valid records...
> >> scanner: read: 131072 bytes
> >> scanner: read: 131072 bytes
> >> scanner: Found valid record:
> >> scanner: volume id 2434907393
> >> scanner: file number 110
> >> scanner: record number 5930
> >> scanner: Enter the volume's name: SU0026
> >> scanner: volume name `SU0026'
> >> scanner: scanning sdlt320 tape SU0026 on rd=server1:/dev/rmt/0cbn
> >> scanner: volume id 2434907393 record size 131072
> >>          created 1/01/70 10:00:00 expires 1/01/70 10:00:00
> >> scanner: adding sdlt320 tape SU0026 to pool
> >> scanner: RAP error: There is no pool named `'.
> >> scanner: create pool manually after scanner; continuing...
> >> scanner: Rewinding...
> >> scanner: Rewinding done
> >> scanner: setting position from fn 0, rn 0 to fn 2, rn 0
> >> scanner: Opened /dev/rmt/0cbn for read
> >> scanner: unexpected file number, wanted 2 got 112
> >> scanner: adjusting file number from 2 to 112
> >> scanner: scanning file 112, record 0
> >> scanner: unexpected volume id, wanted 2434907393 got 2434907393
> >> scanner: Opened /dev/rmt/0cbn for read
> >> scanner: done with sdlt320 tape SU0026
> >> scanner: Rewinding...
> >> scanner: Rewinding done
> >> =================================================
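The scanner log above can be summarised mechanically. This sketch uses a hypothetical helper whose line formats are copied from the log above and have not been checked against other NetWorker releases. It pulls out the first valid record scanner found and any "unexpected file number" jumps.

```python
import re
from typing import Dict, List, Tuple

# Line formats copied from the scanner log in this thread; other releases may differ.
FIELD_RE = re.compile(r"scanner:\s+(volume id|file number|record number)\s+(\d+)$")
JUMP_RE = re.compile(r"unexpected file number, wanted (\d+) got (\d+)")


def summarise_scanner_log(text: str) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Return (first value seen for each record field, list of (wanted, got) jumps)."""
    fields: Dict[str, int] = {}
    jumps: List[Tuple[int, int]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # The trailing $ skips lines such as "volume id N record size M".
        field = FIELD_RE.search(line)
        if field and field.group(1) not in fields:
            fields[field.group(1)] = int(field.group(2))
        jump = JUMP_RE.search(line)
        if jump:
            jumps.append((int(jump.group(1)), int(jump.group(2))))
    return fields, jumps
```

On the log above it would report file number 110, record number 5930, and a single jump from file 2 to file 112 straight after a rewind. A jump like that is consistent with Teresa's suggestion that the tape was rewound while a backup was still writing, though the log alone does not prove it.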
> >>
> >> We were able to obtain the header from the tape via the command:
> >> dd if=/dev/rmt/0cbn of=/tmp/tapeheader bs=128k count=1
> >>
> >> ..and then view it with the command:
> >> strings /tmp/tapeheader
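The dd/strings check can also be done in Python against the 128 KB record dumped to /tmp/tapeheader. In this sketch, extract_strings approximates `strings` (runs of at least four printable ASCII characters), and vol1_serial reads the volume serial from columns 5-10. That assumes the label follows the ANSI VOL1 layout; the outputs below (volume name immediately after VOL1, and a trailing 3 where ANSI puts the label-standard version) are consistent with it, but the rest of NetWorker's label format is not documented here. All function names are hypothetical.

```python
from typing import List, Optional


def read_first_record(path: str, size: int = 128 * 1024) -> bytes:
    """Read the record dumped by e.g. `dd ... of=/tmp/tapeheader bs=128k count=1`."""
    with open(path, "rb") as f:
        return f.read(size)


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Rough equivalent of `strings`: runs of printable ASCII (plus tab) >= min_len."""
    found: List[str] = []
    run = bytearray()
    for byte in data:
        if 0x20 <= byte <= 0x7E or byte == 0x09:
            run.append(byte)
            continue
        # A non-printable byte ends the current run.
        if len(run) >= min_len:
            found.append(run.decode("ascii"))
        run.clear()
    if len(run) >= min_len:
        found.append(run.decode("ascii"))
    return found


def vol1_serial(record: bytes) -> Optional[str]:
    """Volume serial from columns 5-10 of an ANSI-style VOL1 label (ASSUMED layout)."""
    if not record.startswith(b"VOL1"):
        return None
    return record[4:10].decode("ascii", errors="replace").rstrip()
```

Typical use would be `extract_strings(read_first_record("/tmp/tapeheader"))`, which gives a list that can be compared line by line with the good and bad headers shown next.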
> >>
> >> Here is the output from 2 problem tapes:
> >> =================================================
> >> For volume SU0026:
> >> VOL1SU0026NETWORKER                                        3
> >> setting position from fn %lu, rn %lu to fn %lu,
> >>
> >> For volume SU0116:
> >> VOL1SU0116NETWORKER                                        3
> >> setting position from fn %lu, rn %lu to fn %lu,
> >>
> >> =================================================
> >>
> >> This is what the header of a good tape looks like:
> >> =================================================
> >> VOL1SU0295NETWORKER                                        3
> >> setting position from fn %lu, rn %lu to fn %lu,
> >> C%D2
> >> SU0295
> >> volume pool
> >> SCRATCH
> >> =================================================
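Comparing the good and bad headers suggests a simple heuristic: the readable tape's strings continue with "volume pool" and a pool name, while the problem tapes stop after the position-message string. The missing pool name may be why scanner reports that there is no pool named `'. A sketch of that check, based only on these three samples (the function is hypothetical, not part of NetWorker):

```python
from typing import List, Optional


def pool_from_header(header_strings: List[str]) -> Optional[str]:
    """Return the string following 'volume pool', or None if the label stops short.

    Heuristic drawn only from the three samples in this thread: SU0295 (readable)
    shows 'volume pool' then the pool name; SU0026 and SU0116 end before it.
    """
    if not header_strings or not header_strings[0].startswith("VOL1"):
        return None
    for i, s in enumerate(header_strings[:-1]):
        if s.strip() == "volume pool":
            return header_strings[i + 1].strip()
    return None


good = ["VOL1SU0295NETWORKER 3", "setting position from fn %lu, rn %lu to fn %lu,",
        "C%D2", "SU0295", "volume pool", "SCRATCH"]
bad = ["VOL1SU0026NETWORKER 3", "setting position from fn %lu, rn %lu to fn %lu,"]
print(pool_from_header(good))  # SCRATCH
print(pool_from_header(bad))   # None
```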
> >>
> >> ALSO - I'm seeing that an abnormal amount of data is being written
> >> to these tapes, according to the "mminfo -m" output. I don't know
> >> about SU0026, as it's already been deleted from the media db, but
> >> SU0116 has 1202GB on it!! I've looked through the mminfo output and
> >> found other tapes which have between 500GB and 1848GB!!!
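Those figures are far beyond what an SDLT320 cartridge can hold: 160 GB native, or 320 GB at the 2:1 compression assumed in the drive's rating. As Davina notes, repeated writes restarting from the start of the tape would be one way to inflate the recorded amount. The sketch below flags such volumes; the input is a plain dict you would fill in from the mminfo -m output (its column layout is not parsed here), and the function name is hypothetical.

```python
from typing import Dict, List

# ASSUMED capacities: SDLT320 is 160 GB native, 320 GB at a 2:1 compression ratio.
# Real compression depends on the data, so treat the ceiling as a rough bound.
SDLT320_NATIVE_GB = 160
SDLT320_COMPRESSED_GB = 320


def implausible_volumes(written_gb: Dict[str, float],
                        ceiling_gb: float = SDLT320_COMPRESSED_GB) -> List[str]:
    """Return volume names whose recorded 'written' amount exceeds ceiling_gb."""
    return sorted(vol for vol, gb in written_gb.items() if gb > ceiling_gb)
```

For example, SU0116's 1202GB is about 7.5 times the native capacity (`round(1202 / SDLT320_NATIVE_GB, 1)` gives 7.5), so it is flagged even at the compressed ceiling.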
> >>
> >> I've checked four of these tapes, which contained 1848GB, 921GB,
> >> 671GB & 1700GB respectively, and only the 671GB tape could be read.
> >>
> >> I used "tcopy" to get a listing of the tapes' structures, and the one
> >> that worked had 2 x 32KB header files before changing to 128KB data
> >> blocks, while the other 3 only had 128KB blocks.
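John's tcopy comparison can be written down as a check on per-file block sizes. The input is assumed to be transcribed by hand from the tcopy listing (its output format differs between platforms, so it is not parsed here), and the 32 KB / 128 KB figures come from John's observation rather than from NetWorker documentation. The function name is hypothetical.

```python
from typing import List

KIB = 1024


def has_label_files(file_block_sizes: List[int],
                    label_block: int = 32 * KIB,
                    count: int = 2) -> bool:
    """True if the first `count` tape files use the smaller label block size.

    file_block_sizes[i] is the block size (bytes) of tape file i, as read from tcopy.
    """
    head = file_block_sizes[:count]
    return len(head) == count and all(size == label_block for size in head)
```

On the four tapes described above, the readable one would pass (two 32 KB files, then 128 KB) and the other three would fail (128 KB blocks from the start).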
> >>
> >> I'm investigating driver versions at the moment, but can anyone think
> >> of what could be causing this? There doesn't appear to be any common
> >> trigger (Windows & Unix systems are affected across multiple drives...
> >> firmware has been upgraded, etc).
> >>
> >
> >
> >
>
> --
> Note: To sign off this list, send a "signoff networker" command via email
> to listserv AT listmail.temple DOT edu or visit the list's Web site at
> http://listmail.temple.edu/archives/networker.html where you can
> also view and post messages to the list.
> =*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=