Re: [Networker] Limit on number of savesets in an nsrclone?

2008-03-11 00:08:31
Subject: Re: [Networker] Limit on number of savesets in an nsrclone?
From: Wayne Smith <Wayne.Smith AT CDU.EDU DOT AU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 11 Mar 2008 13:30:48 +0930
I am also experiencing this, and have a manual process I run through
each day (a collection of basic scripts).
I now check this each morning because, until the problem is fixed,
staging/cloning will not work on this device.

I have been advised by EMC that it has been fixed in 7.3.3, but I am not
ready to upgrade from 7.3.2 yet, hence the manual workaround every
couple of days. 

If wanted, I am happy to post my workaround method. It's not pretty,
but it does work.

Regards
Wayne Smith
Systems Administrator
Data Centre Group
ITMS
Charles Darwin University,  Darwin,  NT,  0909

CRICOS Provider No: 00300K
-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Preston de Guise
Sent: Tuesday, 11 March 2008 5:21 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] Limit on number of savesets in an nsrclone?

Hi Ian,

>> A look at /var/nsr/logs/daemon.log would have been a good idea.  I 
>> can now see what's happening, the question is why it's happening...
>
> I tracked it down.  Somehow (and there was some disturbance in the 
> storage a week ago, so it's not impossible) one saveset out of 400
> in the volume was missing.   This in turn caused the whole clone or  
> stage operation to be aborted with extreme prejudice, without doing 
> any clones.  I found it by staging the savesets one at a time, until 
> one of them failed, then (after checking it was a saveset I could live
> without, but as it was missing anyway that was rather moot) using 
> nsrmm -d to delete the one that failed.  Then the whole thing would 
> stage successfully.
>
> Interestingly, scanner -i ran successfully: I presume it doesn't purge
> from the media database savesets which are in the database as being on
> the volume but which scanner doesn't find.  Another route to achieve 
> the same result would be to delete all savesets that are on the 
> volume, then scan the volume in.
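
The one-at-a-time search described above could be scripted roughly as
follows. This is only a sketch: the pool name is a placeholder, the helper
names are made up for illustration, and the nsrstage options shown
(-b, -m, -S) are an assumption about the installed version, so check them
against the local man page before use.

```python
import subprocess
from typing import Callable, Iterable, List


def stage_one(ssid: str, pool: str = "Offsite Clone") -> bool:
    """Stage a single saveset and report whether it succeeded.

    Assumptions: nsrstage is on PATH, accepts -b <pool> -m -S <ssid>,
    and exits non-zero on failure. The pool name is a placeholder.
    """
    result = subprocess.run(["nsrstage", "-b", pool, "-m", "-S", ssid])
    return result.returncode == 0


def find_failing_savesets(
    ssids: Iterable[str],
    stage: Callable[[str], bool] = stage_one,
) -> List[str]:
    """Stage savesets one at a time and collect the ones that fail.

    Staging individually means one missing saveset cannot abort the
    whole batch, which is the behaviour described in the post.
    """
    failed = []
    for ssid in ssids:
        if not stage(ssid):
            failed.append(ssid)
    return failed
```

The list of ssids would come from a media database query for the volume;
any saveset that ends up in the returned list is a candidate for the
nsrmm -d step mentioned above, after checking that it really is expendable.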

Check to see whether the saveset reported as missing can actually be
found under a /_AF_readonly component of the volume. If so, then you
have a known bug whereby nsrd/nsrmmd can accidentally write data to
the "wrong side" of the device. Because it shouldn't go there,
NetWorker can't find it when access to the saveset is attempted.

>> 03/10/08 13:38:20 nsrd: media info: nsrmmd #3 on backup-srv.ftel.co.uk started as requested
>> 03/10/08 13:38:41 nsrmmd #20: filesys_retrieve: failed to read: cannot open /var/remotestage/incrementals/offsite-4/85/38/8f2df281-00000006-ddc77fdd-47c77fdd-012e0000-ac100301 file: No such file or directory
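
A quick way to check for this is to search the whole device directory for
the file name that nsrmmd could not open and see where it turns up. The
sketch below assumes only that the "wrong side" shows up as a path
component named _AF_readonly, as described above; the function names are
made up for illustration.

```python
import os
from typing import List


def locate_saveset_file(dbu_root: str, file_name: str) -> List[str]:
    """Return every path under dbu_root whose file name matches file_name."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(dbu_root):
        if file_name in filenames:
            hits.append(os.path.join(dirpath, file_name))
    return sorted(hits)


def on_readonly_side(path: str) -> bool:
    """True if the path contains a directory component named _AF_readonly."""
    return "_AF_readonly" in path.split(os.sep)
```

Run locate_saveset_file() with the disk backup unit's top-level directory
and the file name from the "cannot open" line. If the only hit has
on_readonly_side() returning True, the saveset was probably written to the
wrong side rather than lost.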

If the data is being written to the wrong side of the DBU, then there is
a fix you can get from EMC for nsrmmd and nsrd - I've had to apply it
several times for Linux customers, but never yet for Solaris
customers.

A previous posting I made covers what I could tell about the bug, but
I've repeated it below:

----------
In order to resolve this, EMC support had to provide a revised nsrmmd
and nsrd. Further, each disk backup unit had to have the following
actions performed once the revised nsrmmd and nsrd were supplied:

(1) Stage all data off
(2) Delete the disk backup unit in NetWorker
(3) Clean out the DBU filesystem
(4) Recreate the disk backup unit in NetWorker

This resolved corruption in the DBU resource/mm entry that would still
allow the savesets to be created "on the wrong side" even with the
nsrmmd/nsrd fixes in place.

I'm afraid I was never given patch IDs etc for this, just the patches.
--------------

Cheers,

Preston.

--
Preston de Guise


"Enterprise Systems Backup and Recovery: A Corporate Insurance Policy",
due out August 15 2008:

http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=AU6396&isbn=9781420076394&parent_id=&pc=

To sign off this list, send email to listserv AT listserv.temple DOT edu and
type "signoff networker" in the body of the email. Please write to
networker-request AT listserv.temple DOT edu if you have any problems with this
list. You can access the archives at
http://listserv.temple.edu/archives/networker.html or via RSS at
http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
