
[Networker] How to get around block size errors?

2005-11-17 19:46:40
Subject: [Networker] How to get around block size errors?
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 17 Nov 2005 19:43:42 -0500
Is there any recommended way to recover data when you get a block error during 
nwrecover?

nsrd: media notice: Volume "volname" on device "rd=storagenode:/dev/nst2": 
Block size is 512 bytes not 65536 bytes. Verify the device configuration. Tape positioning by 
record is disabled.

At that point, the recovery just hangs as the tape sits idle. We see this a 
lot. The error shows up sometimes during backups and sometimes during 
recoveries, but not always. It seems random, but it occurs about 40% of the 
time we need to recover data. Yikes!!!
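
One way to put a firmer number on "about 40%" would be to count these notices in a saved copy of the daemon log. The following is only a sketch: the pattern is built from the notice text quoted above, the function name find_block_size_notices is made up for illustration, and real log lines may be laid out differently.

```python
import re

# Pattern built from the notice text quoted in this post (an assumption
# about the log format; adjust it if the real log lines differ).
NOTICE = re.compile(
    r'Volume "(?P<volume>[^"]+)" on device "(?P<device>[^"]+)":\s+'
    r'Block size is (?P<found>\d+) bytes not (?P<expected>\d+) bytes'
)


def find_block_size_notices(log_text):
    """Return one dict per block size notice found in log_text."""
    # Collapse all whitespace so a notice wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [
        {
            "volume": m.group("volume"),
            "device": m.group("device"),
            "found": int(m.group("found")),
            "expected": int(m.group("expected")),
        }
        for m in NOTICE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        'nsrd: media notice: Volume "volname" on device '
        '"rd=storagenode:/dev/nst2":\n'
        "Block size is 512 bytes not 65536 bytes. "
        "Verify the device configuration."
    )
    for notice in find_block_size_notices(sample):
        print(notice["device"], notice["volume"],
              notice["found"], "vs", notice["expected"])
```

Grouping the results by device or by volume might show whether the errors really are random or cluster on particular drives or tapes.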

Whenever I've scanned one of these tapes, to see the actual block size, the 
reported block size is always what NetWorker *wants* or is expecting, not what 
it complained about. The LTO 1 tapes are always 64 KB and the SDLT 1 tapes are 
always 128 KB. Also, checking the devices under the GUI shows these respective 
sizes. Not sure why NetWorker is getting confused.
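
For a check that does not go through NetWorker at all, the size of the first record on a tape can be read directly. This is a minimal sketch under assumptions: it relies on the Linux st driver being in variable-block mode (blocksize=0), where a single read() with a large enough buffer is expected to return exactly one tape block, and the function name first_record_size and the 256 KB buffer are illustrative choices.

```python
import os


def first_record_size(path, max_bytes=256 * 1024):
    """Issue a single read() on path and return how many bytes came back.

    On a tape device in variable-block mode this should be the size of the
    record at the current tape position, provided max_bytes is at least as
    large as that record. On a regular file it is simply the number of bytes
    read, capped at max_bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return len(os.read(fd, max_bytes))
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys
    # Example: python first_record.py /dev/nst2
    print(first_record_size(sys.argv[1]))
```

Note that /dev/nst2 is a non-rewinding device, so the tape position moves after the read; the tape should be idle and positioned where you want it before trying this.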

Is there anything that can be done to at least temporarily allow you to get 
around this problem? Anything to provide relief other than to pack up and move 
away to a far off land with no supper?

We were told by Legato tech support (we sent them our log files) that 
engineering said there were issues that were fixed in later releases, but no 
clearer answers beyond that -- nothing specific regarding the actual block 
problems/errors themselves and how to resolve them. We have received a new 
media kit (7.2.1) and plan to migrate to a new server to run this, but we can't 
do it for a few more weeks. Has anyone seen the same problems go away after 
upgrading to a more recent release? I know there have been a lot of reported 
block problems with Windows OS.

Currently, we're running an old release, 6.1.1, on Solaris 2.8, but the devices 
we're using are managed by a single RedHat Linux storage node (6.1.1). The 
devices are SDLT 1 and LTO 1 drives. The storage node has 2 libraries, one with 
LTO drives, one with SDLT. We've seen these problems on all the devices, 
though, and I think this problem has been persistent for a long time. We use 
LSI Logic SCSI cards. We're using Fuji and Maxell tapes.

We have what we think is a properly configured stinit.def file (see below), but 
if there's anything we can do in the meantime to provide some relief, with 
minimal impact, or anything we can try, that would be nice. I hope these 
problems go away with the new release. I wonder, though, about accessing tapes 
that were labeled and written with our current release?

# Seagate Ultrium LTO
manufacturer=SEAGATE model = "ULTRIUM06242-XXX" {
scsi2logical=1 can-bsr auto-lock
mode1 blocksize=0
}

# SDLT220
manufacturer="QUANTUM" model = "SuperDLT1" {
scsi2logical=1
can-bsr=1
auto-lock=0
two-fms=0
drive-buffering=1
buffer-writes
read-ahead=1
async-writes=1
can-partitions=0
fast-mteom=1
#
# If your stinit supports the timeouts:
timeout=3600 # 1 hour
long-timeout=14400 # 4 hours
#
mode1 blocksize=0 density=0x48 compression=1    # 110 GB + compression
mode2 blocksize=0 density=0x48 compression=0    # 110 GB, no compression
}
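
A quick way to double-check that every mode in a file like the one above asks for variable block size (blocksize=0) is to list the blocksize per drive model and mode. This is a rough sketch, not a full stinit.def parser: it assumes quoted model names, only looks at lines that start with mode1 through mode4, and the function name mode_blocksizes is invented for illustration.

```python
import re


def mode_blocksizes(stinit_text):
    """Map each drive model to {mode name: blocksize} from stinit.def-style text."""
    result = {}
    model = None
    for raw in stinit_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        header = re.search(r'model\s*=\s*"([^"]+)"', line)
        if header:
            # Start of a new drive definition block.
            model = header.group(1)
            result[model] = {}
            continue
        if line.startswith("}"):
            model = None
            continue
        mode = re.match(r"(mode[1-4])\b", line)
        size = re.search(r"\bblocksize\s*=\s*(\d+)", line)
        if model and mode and size:
            result[model][mode.group(1)] = int(size.group(1))
    return result


if __name__ == "__main__":
    with open("/etc/stinit.def") as fh:  # path is an assumption
        for drive, modes in mode_blocksizes(fh.read()).items():
            for name, size in sorted(modes.items()):
                flag = "variable" if size == 0 else "fixed"
                print(f"{drive} {name}: blocksize={size} ({flag})")
```

This only reports what the file requests. Whether the driver is actually using those settings at the moment an error occurs is a separate question.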


Thanks in advance.

George

