[Veritas-bu] Media Position Errors on Restores - SunV880/HP20-700/NB3.4_3/SSO

2002-12-13 16:12:24
Subject: [Veritas-bu] Media Position Errors on Restores - SunV880/HP20-700/NB3.4_3/SSO
From: gbos AT uoguelph DOT ca (Gerrit Bos)
Date: Fri, 13 Dec 2002 16:12:24 -0500
Steven (and list),
I still have this issue.  I had good (but not 100%) success re-running restores 
that failed with code 86 after setting SIZE_DATA_BUFFERS to match the block 
size reported in the bptm log of the failed restore. Veritas support suggested 
other flags, with which I did not have much success:
/usr/openv/netbackup/db/config/NO_POSITION_CHECK
/usr/openv/volmgr/database/NO_LOCATEBLOCK
/usr/openv/netbackup/db/config/FIXED_LENGTH_BLOCK
These flags are system-wide, and I do not know their full impact.  I tested 
them only on servers when no other backups or restores were running.

The problem seems temporarily under control with variable blocks (no 
SIZE_DATA_BUFFERS) and earlier was also less of a problem with 
SIZE_DATA_BUFFERS set to 64K blocks.
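
For anyone wanting to repeat the SIZE_DATA_BUFFERS experiment, here is a 
minimal sketch of setting the value. The function name set_size_data_buffers 
and the CONFIG_DIR override are hypothetical, added only for illustration; the 
file path is the one quoted in this thread. The multiple-of-1024 check is an 
assumed sanity rule, not something confirmed here.

```shell
# Hypothetical helper: write a buffer size (in bytes) into the
# SIZE_DATA_BUFFERS file. CONFIG_DIR can be overridden for a dry run;
# it defaults to the NetBackup config path mentioned in this thread.
set_size_data_buffers() {
    size_bytes=$1
    config_dir=${CONFIG_DIR:-/usr/openv/netbackup/db/config}

    # Accept only a plain decimal number.
    case $size_bytes in
        ''|*[!0-9]*)
            echo "not a number: $size_bytes" >&2
            return 1
            ;;
    esac

    # Assumed sanity rule: positive and a multiple of 1024.
    if [ "$size_bytes" -le 0 ] || [ $((size_bytes % 1024)) -ne 0 ]; then
        echo "size must be a positive multiple of 1024: $size_bytes" >&2
        return 1
    fi

    echo "$size_bytes" > "$config_dir/SIZE_DATA_BUFFERS"
}
```

Calling it with 65536 gives the 64K setting mentioned above; removing the file 
returns to the "no SIZE_DATA_BUFFERS" case. Pointing CONFIG_DIR at a scratch 
directory first lets you check the result without touching a live server.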

I closed the problem with Veritas support once I proved to myself that a dd to 
the drive with a hard-coded block size did not in fact write with that block 
size (256k requested, 64k records on tape):
------------------------------------------------------------------------
# ls -l /tmp/tapetest.tar
-rw-r--r--   1 gbos     other    51138560 Sep  4 11:06 /tmp/tapetest.tar
# dd if=/tmp/tapetest.tar of=/dev/rmt/4cb bs=256k
195+1 records in
195+1 records out
# tcopy /dev/rmt/4cb
file 1: records 1 to 780: size 65536
file 1: record 781: size 20480
file 1: eof after 781 records: 51138560 bytes
eot
total length: 51138560 bytes
#
---------------------------------------------------------------------------
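
The numbers in that transcript can be checked with a little shell arithmetic. 
This sketch uses only the sizes shown above (the variable names are mine) and 
shows that each 256k block requested from dd ended up as four 64k records on 
tape, with the same 20480-byte tail.

```shell
# Sizes taken from the dd/tcopy transcript above.
file_size=51138560      # bytes in /tmp/tapetest.tar
requested_bs=262144     # dd bs=256k
observed_bs=65536       # record size reported by tcopy

full_requested=$((file_size / requested_bs))   # full 256k blocks dd wrote
tail_bytes=$((file_size % requested_bs))       # size of the final short block
full_observed=$((file_size / observed_bs))     # full 64k records on tape
split_factor=$((requested_bs / observed_bs))   # records per requested block

echo "dd:    $full_requested full blocks of $requested_bs + $tail_bytes byte tail"
echo "tcopy: $full_observed full records of $observed_bs"
echo "each requested block became $split_factor records"
```

That gives 195 full blocks plus a 20480-byte tail for dd, and 780 full records 
for tcopy, matching "195+1 records" and "records 1 to 780" above.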

My environment is direct attach rather than SSO:

- Netbackup 3.4 patched to NB_34_3 on a Sun V880 running Solaris 8 (108528-13), 
st patch 108725-11
st.conf at the time:
"HP      Ultrium", "HP Ultrium from st", "ULTRIUM";
ULTRIUM =       1,0x36,0,0x19639,4,0x00,0x00,0x00,0x00,3;
The latest patch bundle has once again overwritten it, and there currently is 
no entry.  The "from st" was added to prove to Veritas support that the entry 
was being called, since that pretty print field appears in dmesg after a 
reinitialization.

- Library is an HP 20/700, Model number A5597A and is at firmware revision 
3.01.02
- Drives are HP LTO/Ultrium 1 SCSI, Model A6323A, just updated to a prerelease 
firmware (at least E258)
- Bridges are HP Model A4688A with firmware 2008r (prerelease, just updated 
Tuesday)
- HBAs in the Sun server are two single PCI FC adapters, Sun option #X6799A
(Top Level # 595-5830; Mfg # 375-3019; with the ISP 2200 chip)
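
Since the st.conf entry matters here, a quick way to check whether its options 
word (0x19639 in the entry above) has a particular flag bit set may be useful. 
The value 0x8000 for ST_NO_RECSIZE_LIMIT is an assumption taken from memory of 
the Solaris st(7D) man page, which as I recall describes it as lifting the 64k 
record-size limit for variable-length devices. Please verify it against the 
man page on your release before relying on it.

```shell
# Options word from the st.conf entry quoted in this message.
options=0x19639

# ASSUMPTION: bit value recalled from st(7D); verify on your system.
ST_NO_RECSIZE_LIMIT=0x8000

# Bitwise AND is non-zero only if the flag bit is present in the options word.
if [ $(( ${options} & ${ST_NO_RECSIZE_LIMIT} )) -ne 0 ]; then
    recsize_limit_lifted=yes
else
    recsize_limit_lifted=no
fi

echo "ST_NO_RECSIZE_LIMIT set in $options: $recsize_limit_lifted"
```

This only inspects the configured value. Whether the driver is actually using 
the entry is a separate question, which is why the "from st" marker in dmesg 
was helpful.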

Given that we have different HBAs and bridges, the cause probably lies at one 
of the two ends we share: the Solaris tape driver or the HP drive firmware.  A 
dd followed by a tcopy would be an interesting test on your system; my result 
above probably points to Solaris as the best place to look for a solution.
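
To make the dd/tcopy comparison easier to read on a long tape, the tcopy 
output can be summarized by record size. This is a rough sketch that assumes 
the output looks like the tcopy lines earlier in this message; the function 
name summarize_tcopy is hypothetical.

```shell
# Hypothetical helper: read tcopy output on stdin and print one line per
# distinct record size, as "<size> <number of records>", smallest size first.
summarize_tcopy() {
    awk '
        # e.g. "file 1: records 1 to 780: size 65536"
        $3 == "records" { count[$8 + 0] += ($6 + 0) - ($4 + 0) + 1 }
        # e.g. "file 1: record 781: size 20480"
        $3 == "record"  { count[$6 + 0] += 1 }
        END { for (s in count) print s, count[s] }
    ' | sort -n
}
```

Usage would be along the lines of "tcopy /dev/rmt/4cb | summarize_tcopy" after 
the dd. If the largest size listed is smaller than the bs given to dd, the 
writes are being split somewhere below dd.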

The problem right now is on my back burner while I am resolving tape 
reassignment problems with this library, which HP has just agreed (after quite 
a few discussions) would be more thoroughly solved by persistent binding rather 
than leaving them to autodetect.

I am very interested in the resolution, since it will return to my front-burner 
when my current problem is resolved. We were at 256k blocks before, and I'd 
like to go back to that.  But it sure is good to see I am not the only one 
with the problem!
I hope this helps.....Gerrit

"Zunker, Steven (STP)" wrote:

> All,
>
> I was hoping someone could provide some insight into a big problem we are 
> having. We recently migrated to an HP20-700 tape library with HP Ultrium LTO 
> drives in a SAN environment. We are currently seeing media position errors 
> (86) on restores (or duplications), the backups complete with no errors. If 
> we do see an error, it will be at the beginning of the restore - with no data 
> being restored. We see media position errors on ~1 out of 15 images written. 
> We cannot trace the error down to a single drive, tape, hba, switch, bridge, 
> or time of day. We also do not SEEM to see this with our W2000 media server 
> that shares these same tape drives. We did pretty substantial testing 
> (backups,restores,duplications) and did not see any errors prior to cutover, 
> but we could not test a full production hit. I have cases open with Veritas, 
> Sun, and HP, but no resolution yet. So far we have mostly worked with the Sun 
> tape driver (st.conf) and OS patches. Here is a little on our environment:
>
> Master Server
> Sun V880 Solaris 8(108528-15)
> st patch 108725-11
> st.conf entry   "HP Ultrium",   "HP Ultrium",   "Ultrium";
>                 Ultrium = 1,0x36,0,0x19639,4,0x00,0x00,0x00,0x00,3;
> 2 x Emulex LP9002L HBA's Firmware v3.90a7
> Netbackup 3.4_3
> SSO
> /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS = 16
> /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS = 131072
>
> Tape Library
> HP 20/700 Firmware v3.01.02
> 10 x HP Ultrium LTO drive Firmware v258
>
> Bridge
> HP Interface Manager Firmware v3010
> 5 slots (2x drives per slot)
>
> Switch
> Brocade 3800
> OS v3.0.2f
> Has anyone out there seen or resolved this type of error?? Any information 
> would be helpful.
>
> Thanks,
>
> Steve Zunker
> UNIX Administrator
> Guidant Corporation


