Networker

Re: [Networker] Limit on number of savesets in an nsrclone?

2008-03-10 15:55:11
Subject: Re: [Networker] Limit on number of savesets in an nsrclone?
From: Preston de Guise <enterprise.backup AT GMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 11 Mar 2008 06:51:27 +1100
Hi Ian,

A look at /var/nsr/logs/daemon.log would have been a good idea. I can now see what's happening; the question is why it's happening...

I tracked it down. Somehow (and there was some disturbance in the storage a week ago, so it's not impossible) one saveset out of 400 in the volume was missing. This in turn caused the whole clone or stage operation to be aborted with extreme prejudice, without doing any clones. I found it by staging the savesets one at a time, until one of them failed, then (after checking it was a saveset I could live without, but as it was missing anyway that was rather moot) using nsrmm -d to delete the one that failed. Then the whole thing would stage successfully.
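For anyone wanting to repeat the one-at-a-time approach without typing 400 commands, here is a minimal Python sketch of the idea. It is only an illustration under assumptions, not something from the original exchange: the pool name "Offsite Clone" is hypothetical, and I am assuming that nsrstage returns a non-zero exit status when a saveset cannot be staged, which you should confirm on your own release before trusting the result.

```python
import subprocess
from typing import Callable, Iterable, List


def stage_with_nsrstage(ssid: str, pool: str = "Offsite Clone") -> bool:
    """Stage (migrate) a single saveset and report whether it worked.

    Assumptions: the pool name is hypothetical, and a zero exit status
    is taken to mean success. Raises FileNotFoundError if nsrstage is
    not on the PATH.
    """
    result = subprocess.run(
        ["nsrstage", "-b", pool, "-m", "-S", ssid],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def find_failing_savesets(
    ssids: Iterable[str],
    stage_one: Callable[[str], bool],
) -> List[str]:
    """Stage savesets one at a time and collect the ones that fail.

    Staging continues after a failure, so a single missing saveset
    no longer aborts the whole run.
    """
    failed = []
    for ssid in ssids:
        if not stage_one(ssid):
            failed.append(ssid)
    return failed
```

The list of saveset IDs would come from something like an mminfo query against the volume; the failures it reports are then candidates for checking and, if you can live without them, nsrmm -d.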

Interestingly, scanner -i ran successfully. I presume it doesn't purge media database entries for savesets that are recorded as being on the volume but that scanner doesn't find. Another route to the same result would be to delete all savesets that are on the volume, then scan the volume back in.

Check to see whether the saveset reported as missing can actually be found under a /_AF_readonly component of the volume. If so, then you have hit a known bug whereby nsrd/nsrmm can accidentally write data to the "wrong side" of the device. Because the data shouldn't be there, NetWorker can't find it when it tries to access the saveset.

03/10/08 13:38:20 nsrd: media info: nsrmmd #3 on backup-srv.ftel.co.uk started as requested
03/10/08 13:38:41 nsrmmd #20: filesys_retrieve: failed to read: cannot open /var/remotestage/incrementals/offsite-4/85/38/8f2df281-00000006-ddc77fdd-47c77fdd-012e0000-ac100301 file: No such file or directory
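A rough Python sketch of the check, for what it's worth: pull the missing path out of a filesys_retrieve error and look for a file of the same name anywhere below an _AF_readonly directory on the device. The function names are my own, the regular expression is modelled only on the single error line above (with the line unwrapped), and I am assuming _AF_readonly is a real directory rather than a symbolic link, since os.walk does not follow links by default.

```python
import os
import re
from typing import List, Optional

# Pattern modelled on the one error line quoted above; other NetWorker
# releases may word the message differently.
FAILED_READ = re.compile(
    r"filesys_retrieve: failed to read: cannot open (\S+) "
    r"file: No such file or directory"
)


def missing_path(log_line: str) -> Optional[str]:
    """Return the path NetWorker could not open, or None if no match."""
    match = FAILED_READ.search(log_line)
    return match.group(1) if match else None


def find_on_readonly_side(device_root: str, filename: str) -> List[str]:
    """List files named `filename` that sit below an _AF_readonly directory.

    Walks the whole device directory, so it makes no assumption about
    how deep the saveset file is stored.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(device_root):
        if "_AF_readonly" in dirpath.split(os.sep) and filename in filenames:
            hits.append(os.path.join(dirpath, filename))
    return hits
```

Feed missing_path() each line of daemon.log, take os.path.basename() of whatever it returns, and pass that with the device directory to find_on_readonly_side(). Any hit suggests the saveset was written to the wrong side.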

If the data is being written to the wrong side of the DBU, then there is a fix you can get from EMC for nsrmm and nsrd. I've had to apply it several times for Linux customers, but never yet for Solaris customers.

A previous posting I made covers what I could tell about the bug, but I've repeated it below:

----------
In order to resolve this, EMC support had to provide a revised nsrmmd and nsrd. Further, each disk backup unit had to have the following actions performed once the revised nsrmmd and nsrd were supplied:

(1) Stage all data off
(2) Delete the disk backup unit in NetWorker
(3) Clean out the DBU filesystem
(4) Recreate the disk backup unit in NetWorker

This resolved corruption in the DBU resource/mm entry that would still allow the savesets to be created "on the wrong side" even with the nsrmmd/nsrd fixes in place.

I'm afraid I was never given patch IDs etc for this, just the patches.
--------------

Cheers,

Preston.

--
Preston de Guise


"Enterprise Systems Backup and Recovery: A Corporate Insurance Policy", due out August 15 2008:

http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=AU6396&isbn=9781420076394&parent_id=&pc=
