Re: [Networker] Limit on number of savesets in an nsrclone?
2008-03-11 16:01:22
Hi Ian,
Check to see whether the saveset reported as missing actually can
be found under a /_AF_readonly component of the volume. If so, then
you have a known bug whereby nsrd/nsrmm can accidentally generate
data to the "wrong side" of the device. Because it shouldn't go
there, NetWorker can't find it when the saveset access is
attempted.
No, I can't see them there: all the savesets that are missing are
from around the time we had an array crash and restart so I'm
suspecting it's a storage problem. But I'll watch out for the
scenario you describe: it rings bells for other problems.
Failing ~300 savesets because one of them is unreadable is a bit
naughty, too.
It's the nature of nsrmmd, unfortunately. Basically it gets told by
nsrclone to read all those savesets as one atomic activity, so when one
read fails the whole activity is considered a failure. You might also
want to check to see if nsrmmd
is coredumping; check /nsr/cores/nsrmmd to see if you've got core
dumps from around the time the failed reads are occurring.
I've seen a similar problem caused by array crashes/connectivity
losses. One way to check in advance is to do the following:
find /path/to/dbu -type f -print > /tmp/results.txt
for lssid in `mminfo -q "volume=dbu.RO" -r "ssid(60)"`
do
    echo $lssid `grep -c $lssid /tmp/results.txt`
done
That'll check every ssid that NetWorker _thinks_ is on the disk backup
unit. You should see output along the lines of:
e65b3ac2-00000006-1bc0154e-47c0154e-00e60000-c0a86404 2
That is the long ssid followed by the number of times it appears on the DBU.
There should be 2 instances; the actual saveset and the note for the
saveset. If you have an instance that reports a count of 0, then
NetWorker thinks that it is on the disk, but it isn't. You can then
use nsrmm to delete the saveset. To identify the short ssid,cloneid
combo you could then run:
mminfo -q "ssid=e65b3ac2-00000006-1bc0154e-47c0154e-00e60000-c0a8640" -r volume,ssid,cloneid
And delete the instances for the DBU only (nsrmm -d -S ssid/cloneid).
You could make the script smarter, etc., but since I'm on a train with
variable service, I'll leave that as an exercise for the reader :-)
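For anyone taking up that exercise, here is one possible starting point: a rough sketch, not something tested against a live NetWorker server. The function names report_missing and print_delete_cmds are made up for illustration. The first lists only the long ssids that have no file on the DBU; the second turns "volume ssid cloneid" rows into nsrmm commands and only prints them, so nothing is deleted until you have reviewed the list. It assumes the mminfo report is whitespace-separated with one row per line; any header or other non-ssid line in the input is not filtered out by report_missing.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers, not NetWorker tools.

# report_missing LISTING
# Reads long ssids on stdin (one per line) and prints each one that does
# not appear anywhere in LISTING (the saved output of the find command).
report_missing() {
    listing=$1
    while read -r lssid; do
        [ -n "$lssid" ] || continue
        # grep -c prints 0 and exits non-zero when nothing matches
        count=$(grep -c -F -- "$lssid" "$listing" || true)
        if [ "$count" -eq 0 ]; then
            echo "$lssid"
        fi
    done
}

# print_delete_cmds VOLUME
# Reads "volume ssid cloneid" rows on stdin and prints (does not run)
# an nsrmm delete command for each row whose volume is exactly VOLUME.
print_delete_cmds() {
    awk -v vol="$1" '$1 == vol && NF == 3 { printf "nsrmm -d -S %s/%s\n", $2, $3 }'
}

# Possible use, with the same placeholder path and volume name as above:
#   find /path/to/dbu -type f -print > /tmp/results.txt
#   mminfo -q "volume=dbu.RO" -r "ssid(60)" | report_missing /tmp/results.txt
#   mminfo -q "ssid=<long ssid>" -r volume,ssid,cloneid | print_delete_cmds dbu.RO
```

Printing the commands rather than running them is deliberate: after an array crash you probably want to eyeball what is about to be removed from the media database first.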
Cheers,
Preston.
--
Preston de Guise
"Enterprise Systems Backup and Recovery: A Corporate Insurance
Policy", due out August 15 2008:
http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=AU6396&isbn=9781420076394&parent_id=&pc=