[Networker] How to get around block size errors?

Is there any recommended way to recover data when you get a block error during
nwrecover?

nsrd: media notice: Volume "volname" on device "rd=storagenode:/dev/nst2":
Block size is 512 bytes not 65536 bytes. Verify the device configuration. Tape positioning by
record is disabled.

At that point, the recovery just hangs as the tape sits idle. We see this a
lot. Sometimes, the error occurs during backups and sometimes during
recoveries, but not always. It seems random, but it occurs about 40% of the
time we need to recover data. Yikes!!!
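
To put a firmer number on how often this happens and on which drives, the
notices can be tallied from the daemon log. Below is a minimal Python sketch,
not anything supplied by Legato. It assumes the notice text matches the
message quoted above (possibly wrapped across lines); the function names are
made up for illustration.

```python
import re
from collections import Counter

# Pattern built from the notice quoted above; \s+ tolerates a line wrap
# between the device name and "Block size".
NOTICE = re.compile(
    r'Volume "(?P<volume>[^"]+)" on device "(?P<device>[^"]+)":\s+'
    r'Block size is (?P<found>\d+) bytes not (?P<expected>\d+) bytes'
)

def block_size_notices(text):
    """Return (volume, device, found, expected) for each notice in text."""
    return [
        (m["volume"], m["device"], int(m["found"]), int(m["expected"]))
        for m in NOTICE.finditer(text)
    ]

def count_by_device(text):
    """Count block size notices per device name."""
    return Counter(device for _, device, _, _ in block_size_notices(text))
```

Feeding the contents of the daemon log (wherever it lives on your server) to
count_by_device() gives a per-device count, which should show whether one
drive or one library is responsible for most of the errors.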

Whenever I've scanned one of these tapes, to see the actual block size, the
reported block size is always what NetWorker *wants* or is expecting, not what
it complained about. The LTO 1 tapes are always 64 KB and the SDLT 1 tapes are
always 128 KB. Also, checking the devices under the GUI shows these respective
sizes. Not sure why NetWorker is getting confused.
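
For anyone who wants to cross-check the block size outside NetWorker, here is
a rough Python sketch. It assumes a Linux storage node with the st driver in
variable-block mode (blocksize=0, as in the stinit.def below), where a single
read() with a large enough buffer returns exactly one record. The function
names are hypothetical. The drive must be idle and the tape not mounted by
NetWorker, because the read moves the tape position.

```python
import os

# Read buffer larger than the 64 KB and 128 KB block sizes mentioned above.
MAX_RECORD = 256 * 1024

def first_record_size(fd, max_record=MAX_RECORD):
    """Read once from fd and return the number of bytes returned.

    On a variable-block tape device, one read() returns at most one
    record, so this length is the size of the record at the current
    tape position.
    """
    return len(os.read(fd, max_record))

def tape_block_size(device):
    """Open a tape device read-only and measure one record."""
    fd = os.open(device, os.O_RDONLY)
    try:
        return first_record_size(fd)
    finally:
        os.close(fd)
```

For example, tape_block_size("/dev/nst2") on the storage node would report
the size of the record under the head. Records near the start of a volume
belong to the label and may not match the data block size, so position the
tape into a save set first.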

Is there anything that can be done to at least temporarily allow you to get
around this problem? Anything to provide relief other than to pack up and move
away to a far off land with no supper?

We were told by Legato tech support (we sent them our log files) that
engineering said there were issues that were fixed in later releases, but no
clearer answers beyond that -- nothing specific regarding the actual block
problems/errors themselves and how to resolve them. We have received a new
media kit (7.2.1) and plan to migrate to a new server to run this, but we can't
do it for a few more weeks. Has anyone seen the same problems go away after
upgrading to a more recent release? I know there have been a lot of reported
block problems with Windows OS.

Currently, we're running an old release (6.1.1) on Solaris 2.8, but the devices
we're using are managed by a single RedHat Linux storage node (6.1.1). The
devices are SDLT 1 and LTO 1 drives. The storage node has 2 libraries, one with
LTO drives, one with SDLT. We've seen these problems on all the devices,
though, and I think this problem has been persistent for a long time. We use
LSI Logic SCSI cards. We're using Fuji and Maxell tapes.

We have what we think is a properly configured stinit.def file (see below), but
if there's anything we can do in the meantime to provide some relief, with
minimal impact, or anything we can try, that would be nice. I hope these
problems go away with the new release. I wonder, though, whether we'll have
trouble accessing tapes that were labeled and written with our current release.

# Seagate Ultrium LTO
manufacturer=SEAGATE model = "ULTRIUM06242-XXX" {
scsi2logical=1 can-bsr auto-lock
mode1 blocksize=0
}

# SDLT220
manufacturer="QUANTUM" model = "SuperDLT1" {
scsi2logical=1
can-bsr=1
auto-lock=0
two-fms=0
drive-buffering=1
buffer-writes
read-ahead=1
async-writes=1
can-partitions=0
fast-mteom=1
#
# If your stinit supports the timeouts:
timeout=3600 # 1 hour
long-timeout=14400 # 4 hours
#
mode1 blocksize=0 density=0x48 compression=1 # 110 GB + compression
mode2 blocksize=0 density=0x48 compression=0 # 110 GB, no compression
}

Thanks in advance.

George
