
[Networker] Possible Storage Node disconnect - or Auth Issue?

2010-11-29 01:44:11
From: tkimball <networker-forum AT BACKUPCENTRAL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 29 Nov 2010 01:41:53 -0500
Morning all,

Sun EBS (NetWorker) 7.4.4 server/storage node/NMC, 7.4.x clients.  All of the 
affected hosts run Solaris 10 on SPARC.

Last Friday we had a problem with certain backups to one of our tape pools.  
Most of them were recovered OK (thankfully, these were non-critical).

I'm not seeing any specific log entries that would show whether a tape was at 
fault (the normal case in our experience).  Instead, I have these errors in the 
storage node's daemon.raw file:

-----
MsgID TimeStamp Severity Category ErrorNo ProcessID HostName ProgramName 
RenderedMessage
39078 11/26/10 19:42:06  0 0 2 2316 host1 nsrexecd SYSTEM error: There is 
already a machine using the name: "syb". Either choose a different name for 
your machine, or delete the "NSR peer information" entry for "syb" on host: 
"host1"
39078 11/26/10 20:10:20  0 0 2 2316 host1 nsrexecd SYSTEM error: There is 
already a machine using the name: "syb". Either choose a different name for 
your machine, or delete the "NSR peer information" entry for "syb" on host: 
"host1"

67209 11/26/10 23:13:26  2 0 0 2316 host1 nsrexecd An internal tracking event: 
GSS Legato authentication user session entry (warning): "User authentication 
session timed out and is now invalid.". Session number = 1:1, domain = 165607, 
user name = 590, NetWorker Instance Name =
67209 11/26/10 23:43:26  2 0 0 2316 host1 nsrexecd An internal tracking event: 
GSS Legato authentication user session entry (warning): "User authentication 
session timed out and is now invalid.". Session number = 1:1, domain = 165616, 
user name = 1725213, NetWorker Instance Name =
-----

The first group of errors has shown up several times in the past year, but has 
not affected anything that I know about (clarifying what it's saying would be 
appreciated, though).

I'm more concerned with the second set of errors; we got a LOT of those, clear 
through to late the next morning.
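
To put a number on "a LOT", a rough sketch like the one below could tally 
entries per MsgID and hour.  It assumes the rendered layout shown in the 
excerpt above (whitespace-separated fields, MM/DD/YY HH:MM:SS timestamps) and 
that each entry starts on its own line; the regex and the function name 
tally_by_hour are my own and not part of NetWorker.

```python
import re
from collections import Counter

# Assumed layout of a rendered entry, taken from the header line in the excerpt:
# MsgID TimeStamp Severity Category ErrorNo ProcessID HostName ProgramName Message
ENTRY = re.compile(
    r"^(?P<msgid>\d+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>\d+)\s+(?P<category>\d+)\s+(?P<errno>\d+)\s+"
    r"(?P<pid>\d+)\s+(?P<host>\S+)\s+(?P<program>\S+)\s+(?P<message>.*)$"
)


def tally_by_hour(lines):
    """Count entries per (MsgID, date, hour).

    Lines that do not begin a new entry (wrapped continuation text,
    separators, blank lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:
            counts[(m["msgid"], m["date"], m["time"][:2])] += 1
    return counts


# First lines of the four entries quoted above, plus one continuation line.
sample = [
    "39078 11/26/10 19:42:06  0 0 2 2316 host1 nsrexecd SYSTEM error: There is",
    'already a machine using the name: "syb". Either choose a different name for',
    "39078 11/26/10 20:10:20  0 0 2 2316 host1 nsrexecd SYSTEM error: There is",
    "67209 11/26/10 23:13:26  2 0 0 2316 host1 nsrexecd An internal tracking event:",
    "67209 11/26/10 23:43:26  2 0 0 2316 host1 nsrexecd An internal tracking event:",
]

if __name__ == "__main__":
    for key, n in sorted(tally_by_hour(sample).items()):
        print(key, n)
```

Fed the whole rendered log instead of the sample, the same tally would show 
when the session timeouts started and whether they line up with the failed 
backups.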

And the only other hints I have are:
- a tape in the affected pool left in a drive, not ejected (had to force that 
from the GUI, but it did come out OK)
- all the affected clients are pointed to the Node in question
- all but one of the clients are on a different subnet than the Node in question

There is nothing telling in the main server's daemon.log, unfortunately.  
Usually it's very clear that a tape has gone south, and I just take it out of 
rotation; no luck here, though I did take the tape left in the drive out of 
rotation (for safety).

Suggestions?

--TSK

-----=====-----
Tim Kimball   --   http://sungak.net
-----=====-----

