Hello,
If you are having problems even when ejecting tapes directly from the tape
library, that would strongly suggest a problem with the drives or hardware in
the EML library.
Have you checked with HP support?
A couple of questions:
Is the block size the same on both servers? How often do you clean your
drives?
Are you using the same media type and interchanging it between the tape
libraries? I believe the ADIC uses IBM drives and the HP uses its own tape
drives. That by itself is not a problem, but if you keep mixing media, the
drives could start acting up.
Lastly, have you checked the EML's error logs and the driver/firmware
versions?
HTH
Alex Alexiou <AAlexiou AT TARGETSITE DOT COM>
Sent by: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>
06/09/2008 01:06 PM
Please respond to: EMC NetWorker discussion <NETWORKER AT LISTSERV.TEMPLE DOT EDU>;
Alex Alexiou <AAlexiou AT TARGETSITE DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
cc:
Subject: [Networker] tape drives getting "stuck" on storage node
Here's the background: We have NetWorker 7.3.3 running on a backup
server and a storage node server, both Red Hat AS 4. The backup server
is fibre-attached to an ADIC library, and the storage node to an HP EML
library, via QLogic HBAs.
Over the past month or so, during weekend backup jobs, random tape
drives in the EML get "stuck". They load a tape and keep trying to
inventory it indefinitely, never finishing. This never happens on the
ADIC. Only power-cycling the EML seems to fix it. It's not the same
drive every time, and it only happens during weekends. I tried
disabling CDI on the drives, which seemed to help only slightly. Just
this past weekend, a drive got stuck while trying to eject a tape and
kept timing out, with the following in /nsr/logs/daemon.log on the
storage node:
06/09/08 00:50:07 nsrd: media warning: rd=tgtbackupnode01.targetsite.com:/dev/nst4 moving: eject: Input/output error
I even tried opening up the jukebox and ejecting the tape manually, and
nothing happened. Again, I had to reboot the EML, after which everything
was fine.
It's possible the EML is at fault and not Legato, but it's impossible
to tell right now. Has anyone seen anything like this where a change in
Legato fixed things? One thing I noticed is that the kernel versions on
the backup server and storage node are slightly different; I was never
told to keep them the same, but I saw a posting that referred to this as
a possible problem.
I also noticed several entries like the following in the storage node's
log (I removed the server name):
06/07/08 01:00:29 nsrexecd: GSS Legato authentication user session entry (warning): "User authentication session timed out and is now invalid.". Session number = 4a8:1008, domain = NT AUTHORITY, user name = SYSTEM, NetWorker Instance Name = server
06/07/08 01:00:29 nsrexecd: GSS Legato authentication user session entry (warning): "User authentication session timed out and is now invalid.". Session number = 4a9:1009, domain = NT AUTHORITY, user name = SYSTEM, NetWorker Instance Name = server
06/07/08 01:01:05 nsrexecd: SYSTEM error: An error occured when a client attempted to acquire credentials: error: "A daemon requested the information for a user session, but the user session was not found in the list of valid sessions" session number: 465:224c, client ip address: 127.0.0.1, port number: 0, user id: (NONE).
06/07/08 01:01:05 nsrmmd #11: GSS Legato authentication from server failed...
06/07/08 01:01:05 nsrmmd #11: RPC error: Authentication error
06/07/08 01:01:05 nsrexecd: SYSTEM error: An error occured when a client attempted to acquire credentials: error: "A daemon requested the information for a user session, but the user session was not found in the list of valid sessions" session number: 466:2250, client ip address: 127.0.0.1, port number: 0, user id: (NONE).
We also frequently see errors like these in dmesg on the storage node:
st6: Failed to read 131072 byte block with 32768 byte transfer.
st4: Error 20000 (sugg. bt 0x0, driver bt 0x0, host bt 0x2).
st2: Failed to read 131072 byte block with 32768 byte transfer.
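The "Failed to read ... byte block with ... byte transfer" messages from the Linux st driver suggest that the block on tape was larger than the read request (here 128 KB blocks read with a 32 KB transfer), which ties in with the block-size question. A minimal sketch, assuming the dmesg lines have already been captured as plain text in exactly the format quoted above, that pulls out the device, on-tape block size and requested transfer size so they can be compared across drives (the function name block_size_mismatches is hypothetical, not part of NetWorker or any tool):

```python
import re

# Matches the st driver message format quoted above, e.g.
# "st6: Failed to read 131072 byte block with 32768 byte transfer."
PATTERN = re.compile(
    r"^(st\d+): Failed to read (\d+) byte block with (\d+) byte transfer\."
)


def block_size_mismatches(lines):
    """Return (device, block_bytes, transfer_bytes) for each matching line.

    Lines in any other format (e.g. the "Error 20000" message) are skipped.
    """
    results = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:
            results.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return results


# The three dmesg lines from this message, used as sample input.
sample = [
    "st6: Failed to read 131072 byte block with 32768 byte transfer.",
    "st4: Error 20000 (sugg. bt 0x0, driver bt 0x0, host bt 0x2).",
    "st2: Failed to read 131072 byte block with 32768 byte transfer.",
]

for dev, block, xfer in block_size_mismatches(sample):
    print(f"{dev}: tape block {block // 1024} KB, read request {xfer // 1024} KB")
```

If every affected drive reports the same pair of sizes, that points toward a consistent block-size setting difference rather than a single failing drive, though it would not by itself explain the stuck ejects.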
Let me know if any more information would help.
To sign off this list, send email to listserv AT listserv.temple DOT edu and
type
"signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER