Re: [Networker] Need further assistance in LTO errors

2003-10-01 05:14:58
Subject: Re: [Networker] Need further assistance in LTO errors
From: Riaan Louwrens <riaanl AT SOURCECONSULTING.CO DOT ZA>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Wed, 1 Oct 2003 11:23:50 +0200
Hi,

This is just a shot in the dark - BUT - I have seen similar problems
and error messages where the SCSI IDs of the drives had changed for some
reason (SAN-connected library, 2 HBAs for failover, etc.).

It arose every time either the library or the backup server was booted. The
HBAs would then come up and "assign" a different ID to the drive (path to
drive).

The quickest way I know of to test for this is as follows (this is where I
shudder, as I have done this countless times):

Copy your nsrjb.res somewhere safe.
I then delete the library and use only the "devices".
I then manually put a tape in each drive.
From Legato I then try to mount the tape - mount drive "1" and then watch
to see which physical drive's light starts flashing.

Normally the drives on which I was experiencing the read open errors would
be the culprits...

Then, using "inquire", you take note of which ID is assigned to which drive
(and the physical location thereof) and then run a new jbconfig (shudder
again) to set both the drive order (must correlate with what the library
thinks is drive 0, 1, 2 ...) and the associated "OS name" (e.g. 0cbn or
\\.\Tape0).
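
The comparison step above can be sketched in Python. This is only an
illustration, assuming you have already noted by hand which OS device name
each physical drive answered to before and after a reboot. The dictionaries,
the drive labels, and the function name changed_paths are all hypothetical,
and nothing here parses real "inquire" output.

```python
def changed_paths(before, after):
    """Return drives whose OS device name differs between two observations.

    Both arguments map a physical drive label (hypothetical, e.g. the
    library's own drive number) to the OS device name noted for it.
    """
    changes = {}
    for drive in sorted(set(before) | set(after)):
        old = before.get(drive)
        new = after.get(drive)
        if old != new:
            changes[drive] = (old, new)
    return changes


# Hypothetical notes taken by hand, not real inquire output.
before_reboot = {
    "drive 0": "/dev/nst0",
    "drive 1": "/dev/nst1",
    "drive 2": "/dev/nst2",
}
after_reboot = {
    "drive 0": "/dev/nst0",
    "drive 1": "/dev/nst2",
    "drive 2": "/dev/nst1",
}

for drive, (old, new) in changed_paths(before_reboot, after_reboot).items():
    print(f"{drive}: was {old}, now {new}")
```

Any drive this reports is one whose entry would need correcting when
jbconfig is rerun.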

RISK / PROBLEM: if you have a large library with tonnes of slots, this could
take a while to re-inventory, etc.

Best of luck,

Riaan

-----Original Message-----
From: Frank Altpeter [mailto:f.altpeter AT BROADNET-MEDIASCAPE DOT DE]
Sent: Wednesday, October 01, 2003 11:04 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: [Networker] Need further assistance in LTO errors


Hi again,


I'm sorry to disturb the list again with this, but I have not yet
found any solution to my problem.


I get (occasionally) the following scenario:


Wed 10:43:10 media info: suggest mounting 000110 on r2d2 for writing to pool 'Offsite Clone'
Wed 10:43:11 media waiting event: Waiting for 1 writable volumes to backup pool 'Offsite Clone' tape(s) or disk(s) on r2d2
Wed 10:43:12 media waiting event: waiting for LTO Ultrium tape 000160 on r2d2
Wed 10:43:40 media alert: setting (LTO Ultrium) tape file size to (100000 blocks)
Wed 10:43:41 write completion notice: Writing to volume 000160 complete
Wed 10:44:42 media warning: /dev/nst1 opening: Device or resource busy
Wed 10:44:42 /dev/nst1 read open error, Device or resource busy
Wed 10:45:13 media alert: setting (LTO Ultrium) tape file size to (100000 blocks)
Wed 10:45:16 /dev/nst4 Eject operation in progress
Wed 10:46:48 /dev/nst4 ejected
Wed 10:47:05 media info: loading volume 000110 into /dev/nst4
Wed 10:47:16 /dev/nst4 Verify label operation in progress
Wed 10:48:16 media warning: /dev/nst4 opening: Device or resource busy
Wed 10:48:16 media warning: /dev/nst4 reading: read open error, Device or resource busy


Then, for a very long time, twice every minute, the following:

Wed 11:00:07 media info: unload retry for jukebox `L700e' failed - will retry again.
Wed 11:00:07 media info: unload retry for jukebox `L700e': sleeping 30 seconds
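
To see at a glance which drives are affected, the "Device or resource busy"
entries can be counted per device. The following is a rough Python sketch,
assuming each log entry sits on a single line (wrapped lines rejoined) and
that devices appear as /dev/nstN as in the messages above. The function name
busy_counts is made up for this example, and the sample list simply repeats
a few of the entries quoted above.

```python
import re
from collections import Counter

BUSY = "Device or resource busy"
DEVICE = re.compile(r"/dev/nst\d+")


def busy_counts(lines):
    """Count 'Device or resource busy' entries per /dev/nstN device."""
    counts = Counter()
    for line in lines:
        if BUSY not in line:
            continue
        match = DEVICE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


# A few of the entries quoted in this message, one per line.
sample = [
    "Wed 10:44:42 media warning: /dev/nst1 opening: Device or resource busy",
    "Wed 10:44:42 /dev/nst1 read open error, Device or resource busy",
    "Wed 10:46:48 /dev/nst4 ejected",
    "Wed 10:48:16 media warning: /dev/nst4 opening: Device or resource busy",
    "Wed 10:48:16 media warning: /dev/nst4 reading: read open error, "
    "Device or resource busy",
]

for device, count in sorted(busy_counts(sample).items()):
    print(device, count)
```

If the same devices keep showing up, they are the first candidates for the
drive-by-drive mount test.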



After that, it takes a long time for the system to calm down and
work properly again. I am completely clueless as to what could cause this.


I have tried to tune the timeout values of the jukebox, but without
any effect. I changed from jukebox internal cleaning to NetWorker
based cleaning to be sure that it doesn't have anything to do with
the cleaning, but that wasn't the cause either.

The first 3 tape drives (nst0 - nst2) are daisy-chained to an Adaptec
29160, the two others (nst3 + nst4) the same way to a second Adaptec
card. The jukebox control is on a third Adaptec card.

The system is running Red Hat Linux on a 2.4.22 kernel.



With kind regards,

        Frank Altpeter

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
