[Networker] Odd nsrd crash after savegrp cancel - reading empty tape drive

2006-03-23 04:05:27
Subject: [Networker] Odd nsrd crash after savegrp cancel - reading empty tape drive
From: T S Kimball <t.s.kimball AT GMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 23 Mar 2006 03:47:39 -0500
We had a rather unusual experience here tonight. Searching the archive turned
up nothing specific to this, and calls to both Sun and Legato are pending.

A pool ran out of tapes during a backup. It paged me (a bit late) while I was
at home, so I logged in to see which group it was. It was not an important
one, so I cancelled the group, planning to fix the pool and restart it later.
The tape drive (on a storage node) had already ejected the now-full tape; this
was confirmed in daemon.log.

Ten minutes later, nsrd dumped core. The last message in the log is nsrd
failing to read from that same tape drive on the storage node (huh?). This
is an AlphaStor library, so I checked that log as well: no drives on the node
were requesting a load or unload at that time, and all three were empty.

Anyway, a restart of NetWorker was needed, and things have been relatively
happy since. (Three DLT drives that were spinning at the time have gone
"sour" and are being rotated out; we have enough spares.)

However, I'm very concerned about this. It's the first time I've
experienced it, and though it's likely just a fluke, I'm looking for any input
on what the general cause may be (so we don't repeat it). I've run across
other odd situations that make me suspect it's related to the nsrd process not
getting enough CPU, but I can't readily prove it.

Specs:
  Server - Sun E450 (4 x 480 MHz), Solaris 8, Sun EBS 7.1.3, 4 x DLT7000
    (only three enabled right now), library control (SCSI), gigabit (fiber).
  SN1 - Sun V240 (2 x 1.2 GHz), Solaris 8, Sun EBS 7.1.3 SN, AlphaStor Server,
    2 x LTO-2 (fiber) and 1 x DLT7000, 1 network port enabled in gigabit mode.
  SN2 - Sun V880 (8 x 900 MHz), Solaris 8, Sun EBS 7.1.3 SN, 3 x LTO-2
    (LVD-SCSI).

We also have some large adv_file disk devices on the Server and SN1, but they
were not in use at the time, and this config has been stable for a while.

Thanks in advance,
--TSK

