[Networker] corrupt media databases

2002-10-06 00:28:26
Subject: [Networker] corrupt media databases
From: Kevin Maguire <kmaguire AT ESO DOT ORG>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Sun, 6 Oct 2002 06:28:15 +0200
Hi,

I am running Networker 6.0.3, stk9710 + m1500 jukeboxes, on a Solaris
2.6 server.

My Solaris system crashed badly: no log entry at all and no crash dump,
but it died.  Some sort of kernel was still running, since I could ping
the system and it showed as connected, but no network or console access
was possible - it was just hanging.  A hard power-cycle seemed the only
option.  The reason for the crash is unclear, but in any case on reboot
I could not start Networker.  The messages were:

10/05/02 16:53:57 nsrd: server notice: started
10/05/02 16:54:03 nsrmmdbd: WISS error: Unable to mount /nsr/mm/mmvolume6: bad database header
10/05/02 16:54:03 nsrmmdbd: media db must be scavenged
10/05/02 16:54:12 nsrmmdbd: media db scavenge successful
10/05/02 16:54:12 nsrmmdbd: WARNING: clients file missing from /nsr/mm/mmvolume6
10/05/02 16:54:13 nsrmmdbd: error adding btrees to ss (an invalid slot number)
10/05/02 16:54:13 nsrmmdbd: WISS error: an invalid slot number
10/05/02 16:54:28 nsrd: nsrmmdbd has exited with status 1
10/05/02 16:54:28 nsrd: shutting down

...

mminfo -m showed I had no media!  I stopped Networker, removed
/nsr/tmp and re-created it, and tried again.  That did not help.

Trawling through this list's archives I came to the conclusion that I
needed to recover with mmrecov.  I'm not sure it was the right
conclusion, but it was what I did.

This went OK in some sense, but my last bootstrap was from 48 hours
before the crash.  I *know* that a lot of successful save sets were
written after that bootstrap but before the system crash, because I got
the savegrp completion e-mails.  However, now I can't see them, as my
media database is restored to what it was 48 hours ago!

Is this the best I can expect?

I saw from my logs that about 10 volumes were relabelled in the
time between the bootstrap and the crash, with messages like:

10/04/02 18:24:58 nsrd: deleted media notice: Deleted volume: volid=712010241, volname=000079, location=STK9710

These were indeed recyclable volumes.  If I do

mminfo -q volid=712010241
 or
mminfo -q volname=000079

I see the save sets, all marked as recyclable, from months ago.
However, I know that the volume now contains new save sets, but I don't
know how to tell Legato to read it back in.
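
The volumes relabelled between the bootstrap and the crash can at least
be listed from the daemon log.  What follows is only a minimal sketch: it
assumes the log holds one notice per line in exactly the format quoted
above (other Networker versions may word it differently), and the
function name and the cutoff time are hypothetical.  It does not touch
the media database; it only reads log text.

```python
import re
from datetime import datetime

# Pattern modelled on the single "deleted media notice" line quoted above.
# Assumption: every such notice is on one line and uses this exact wording.
DELETED = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) nsrd: deleted media notice: "
    r"Deleted volume: volid=(?P<volid>\d+),\s*volname=(?P<volname>[^,\s]+),\s*"
    r"location=(?P<location>.*\S)\s*$"
)


def deleted_volumes(lines, since):
    """Return (timestamp, volid, volname, location) tuples for every
    deleted-volume notice logged at or after `since`."""
    found = []
    for line in lines:
        m = DELETED.match(line)
        if m is None:
            continue  # not a deleted-volume notice
        ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
        if ts >= since:
            found.append(
                (ts, m.group("volid"), m.group("volname"), m.group("location"))
            )
    return found


if __name__ == "__main__":
    sample = [
        "10/04/02 18:24:58 nsrd: deleted media notice: Deleted volume: "
        "volid=712010241, volname=000079, location=STK9710",
        "10/05/02 16:53:57 nsrd: server notice: started",
    ]
    # Hypothetical cutoff: the real bootstrap time would go here.
    cutoff = datetime(2002, 10, 3, 17, 0, 0)
    for ts, volid, volname, location in deleted_volumes(sample, cutoff):
        print(ts, volid, volname, location)
```

The resulting list is just the set of tapes whose current contents the
restored media database no longer describes.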

I tried using scanner, but when I load the volume with nsrjb it just
spits it back out, saying the volume is not part of the media database.
I know that; it is the media database I want to fix!!

Anyway, I am concerned that my other databases are now confused, as I
see a message saying

10/04/02 20:30:45 nsrd: ftp-server:/ftphome done saving to pool 'Default' (000079) 500 MB

So the save completed, but the media record of where it went is gone.
All the media database tells me now is that volume 000079 holds data
from months ago, which is what Legato believes!
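
To see which saves the restored media database is missing, the "done
saving" messages in the daemon log can be grouped by volume.  Again this
is only a sketch under assumptions: one message per line in the format
quoted above, client and path free of spaces, and a hypothetical
function name and cutoff time.  It reads the log only and changes
nothing in Networker.

```python
import re
from collections import defaultdict
from datetime import datetime

# Pattern modelled on the "done saving to pool" line quoted above.
# Assumption: client name and save set path contain no whitespace.
SAVED = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) nsrd: "
    r"(?P<client>[^:\s]+):(?P<path>\S+) done saving to pool "
    r"'(?P<pool>[^']*)' \((?P<volume>[^)]+)\)\s*(?P<size>.*?)\s*$"
)


def saves_by_volume(lines, since):
    """Map volume name -> list of (timestamp, client, path, size) for
    every save that completed at or after `since`."""
    by_volume = defaultdict(list)
    for line in lines:
        m = SAVED.match(line)
        if m is None:
            continue  # not a save-completion message
        ts = datetime.strptime(m.group("ts"), "%m/%d/%y %H:%M:%S")
        if ts >= since:
            by_volume[m.group("volume")].append(
                (ts, m.group("client"), m.group("path"), m.group("size"))
            )
    return dict(by_volume)


if __name__ == "__main__":
    sample = [
        "10/04/02 20:30:45 nsrd: ftp-server:/ftphome done saving to pool "
        "'Default' (000079) 500 MB",
    ]
    # Hypothetical cutoff: the real bootstrap time would go here.
    cutoff = datetime(2002, 10, 3, 17, 0, 0)
    for volume, saves in saves_by_volume(sample, cutoff).items():
        for ts, client, path, size in saves:
            print(volume, ts, client, path, size)
```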

Suggestions most welcome!

Cheers
Kevin

