
Re: [Networker] Simple UNIX Clone script - for use by community (UPDATED)

2007-04-06 16:12:43
Subject: Re: [Networker] Simple UNIX Clone script - for use by community (UPDATED)
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 6 Apr 2007 16:10:18 -0400
Reed, please see my responses interspersed below.

Reed, Ted G [IT] wrote:
        Good point in using ssflags as a saveset health check.  How would you code to vet 
the ssflags as kosher before the nsrclone portion?  The '!incomplete' just means not 
incomplete savesets.....so currently running jobs (in-progress) and aborted jobs are not 
duplicated, only finished savesets.  But we don't make any checks on ssflags other than 
the inferred "it can't have an 'i' or 'I' in ssflag" inherent in the 
!incomplete.  So it would clone regardless of pre-set suspect status (as you describe in 
your email), but I would want it to have that behaviour.  If it fails at cloning, I'd 
rather have the failure and see it so I know to check the original saveset/media for 
errors.
You might also be able to employ the '-v' flag as described in the mminfo man page, wherein it says "... The second flag indicates the status of the save set. A b indicates that the save set is in the online index and is browsable via the recover(1m) command. An r indicates that the save set is not in the online index and is recoverable via the scanner(1m) command. An E indicates that the save set has been marked eligible for recycling and may be over-written at any time. An a indicates that the save was aborted before completion. Aborted save sets are removed from the online file index by nsrck(1m). An i indicates that the save is still in progress."

Not saying you have to do this, but you might want to consider it. Isn't it possible that it could have been marked (a)borted somehow? Also, if the save set is somehow aborted or not completed successfully, I think -v will also typically show 'E' for recycle, which in this case is too new to have that value via the normal means. In other words, something went wrong. I would think that you could simply check the ssflags and the -v flags to ensure that they both show a healthy save set. So, for example, -v should not show something like 'ca' and should instead show something like 'cr' or 'cb' or 'hb' or 'mb' or whatever, just as long as the second flag is not 'a' or 'E' or 'i'. Likewise, ssflags should be vrF or vF. I think you'd need to run these separately, though. If it's anything else then maybe have it email you or let you know that the affected save sets were not cloned due to something suspicious in the original save set. However, as you said, you're using the cloning procedure to determine this anyway so maybe it's moot.

I guess my point is that I would be hesitant to attempt a clone on something that looked to be a flagrant problem, because I don't want to waste space on a clone tape only to have it bomb part way through, particularly on a large save set, since I can't get that space back. I might first try cloning it off to some other "test clone" pool instead, before using the real one, to see what it does. Just an idea, but maybe using '-v' and/or also looking at the ssflags could make your script even more foolproof. Then again, maybe just extra work [sigh].
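For what it's worth, the ssflags half of that check could be sketched roughly as follows. This is only a sketch under assumptions: the input is taken to be one save set per line, ssid first and ssflags second (the kind of thing 'mminfo -r ssid,ssflags' would be expected to print), the function name and the reject file are made up for illustration, and the allow-list is just the "vrF or vF" rule from above. The '-v' status flag would still need its own pass.

```shell
#!/bin/sh
# Sketch only: keep ssids whose ssflags look healthy, set the rest aside.
# Assumed input on stdin: "<ssid> <ssflags>" per line (an assumption about
# the report layout, not something verified against a live server).

filter_healthy() {
    # $1 = file that receives the rejected "ssid ssflags" lines (hypothetical)
    rejects=$1
    : > "$rejects"
    while read -r ssid flags; do
        # Skip blank lines and the report header (anything non-numeric).
        case "$ssid" in
            ''|*[!0-9]*) continue ;;
        esac
        case "$flags" in
            vF|vrF) printf '%s\n' "$ssid" ;;                    # looks healthy
            *)      printf '%s %s\n' "$ssid" "$flags" >> "$rejects" ;;
        esac
    done
}

# Example run with made-up ssids: prints the two healthy ones.
printf '%s\n' 'ssid ssflags' '1001 vF' '1002 vrEF' '1003 vrF' |
    filter_healthy "${TMPDIR:-/tmp}/suspect_ssids.$$"
```

The healthy ssids on stdout could be redirected to the file that nsrclone reads, and the reject file mailed to whoever looks after the backups.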

        We have a total of ~120 otherwise-valid-but-listed-as-suspect savesets 
(vF, vrEF, etc), of which 26 are on Clone media, the other 95 or so are in the 
BR Pool.....and many are spanned volumes so a single suspect saveset may mark 
suspect as many as a dozen volume references.  So that implies that during the 
read back of saveset for clone creation, there was a read-error on the original 
tape.  But that'd usually never come to light if we didn't run the clone.  And 
even so, the copy is not necessarily suspect.  For example, the below is a case 
where the original is marked suspect (due to some read error on one of the 
tapes) but the clone is not....and I'd believe it's fully recoverable, if not 
both original and clone recoverable.
        volume         client     pool            ssid       ssflags clflg
        M20760         pdahlb01z  BR Pool         1023713736 vF      s
        M20824         pdahlb01z  BR Pool         1023713736 vF      s
        M21834         pdahlb01z  Clone Vault TX  1023713736 vF
        M21874         pdahlb01z  Clone Vault TX  1023713736 vF
        M21935         pdahlb01z  BR Pool         1023713736 vF      s

Also, are you sure a failed clone will result in a copy count increase? That should only be true if it was successful; otherwise there aren't multiple copies.
Well, you may be right, but I ran a command like this:

mminfo -s orion -q 'copies=2' -ar client,name,type,savetime,volume,copies,ssid,ssflags,clflags | egrep 's$'

I get several lines of output. All the affected save sets are from back in 2003 and 2004, wherein the clflags value is 's' and the ssflags value is 'vrEF', but in each case only the save set from the original volume is shown, not the clone save set. However, each of these does have a clone save set. The clone save sets have the same values for the ssflags but nothing for clflags. Is it the case that the clflags value is only for the original save set and does not apply to clones?
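As an aside, a slightly stricter alternative to egrep 's$' might look like the sketch below. It assumes clflags is the last column of the report and ssflags the one before it, so the ssid sits two fields before the end on a suspect line; the function name is made up, and the sample rows are simply the ones Ted posted. Counting from the end of the line avoids trouble with pool names that contain spaces.

```shell
#!/bin/sh
# Sketch only: count suspect instances per ssid in an mminfo-style report.
# Assumption: on a suspect line the last three fields are ssid, ssflags, clflags.

suspect_counts() {
    awk '$NF == "s" { n[$(NF-2)]++ }
         END { for (id in n) print id, n[id] }'
}

# Sample rows taken from the table quoted earlier in this thread.
cat <<'EOF' | suspect_counts
volume         client     pool            ssid       ssflags clflg
M20760         pdahlb01z  BR Pool         1023713736 vF      s
M20824         pdahlb01z  BR Pool         1023713736 vF      s
M21834         pdahlb01z  Clone Vault TX  1023713736 vF
M21874         pdahlb01z  Clone Vault TX  1023713736 vF
M21935         pdahlb01z  BR Pool         1023713736 vF      s
EOF
```

For the sample rows this prints the one ssid with a count of 3, matching the three BR Pool volumes flagged 's'.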

George

 Thanks much for the feedback.  I've seen your threads and commentary for a 
long time on the list and very much welcome your input.  Thanks again.
--Ted


-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On 
Behalf Of George Sinclair
Sent: Friday, April 06, 2007 10:57 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] Simple UNIX Clone script - for use by community 
(UPDATED)

Shouldn't you also be checking the ssflags and clflags values for the original 
save sets before cloning them? I've seen cases where an original save set was 
marked 'suspect' even though it had never been read during a clone operation. 
I've seen cases, too, where a save set completed (no abort) but whose ssflags 
suggested something was awry. I would want to ensure that the ssflags were 
golden before I started a clone on it. Is this what the '!incomplete' flag is 
accomplishing? Also, what if the save set was previously cloned but not 
successfully? Its value for 'copies' would be >= 2, but it would still need to 
be cloned again, obviously to a different volume unless 
'nsrmm -d -S ssid/cloneid' was run.

George

Reed, Ted G [IT] wrote:
Is it Monday again? Arghhhh.....here's the final, really right version.

mminfo -omo -r ssid -q "!incomplete,pool=BR Pool,copies=1,level=full,location='$JB'" -t "$TIME" > $LOG_DIR.$JB

-----Original Message-----
From: Reed, Ted G [IT]
Sent: Friday, April 06, 2007 10:26 AM
To: 'EMC NetWorker discussion'; Reed, Ted G [IT]
Subject: RE: [Networker] Simple UNIX Clone script - for use by community (UPDATED)

Sorry folks, I cut/pasted an old, nonworking version of the mminfo (bad 
quotation management for variable interpretation).  Here's the RIGHT version.  
So the script below, plus this fix, equals working UNIX script again.

mminfo -omo -r ssid -q "!incomplete,pool=BR Pool,copies=1,level=full,location='$SN'" -t "$TIME" > $LOG_DIR.$SN
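The quoting difference can be seen without NetWorker at all. The snippet below is just an illustration; the jukebox name is made up. Inside single quotes the shell leaves $JB alone, while inside double quotes it substitutes the value and keeps the inner single quotes as literal characters.

```shell
#!/bin/sh
# Illustration only: how the shell treats $JB in each quoting style.
JB="jukebox1"                 # hypothetical jukebox name

single='location=$JB'         # single quotes: no variable expansion
double="location='$JB'"       # double quotes: $JB expanded, inner quotes kept

printf '%s\n' "$single"       # prints: location=$JB
printf '%s\n' "$double"       # prints: location='jukebox1'
```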



-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of Reed, Ted G [IT]
Sent: Thursday, April 05, 2007 11:29 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] Simple UNIX Clone script - for use by community (UPDATED)

NOTE:  Outlook likes to remove "extra" line breaks sometimes.....which is really bad for scripting. 
 Be sure you check to see if it says "Extra line breaks in this message were removed", and if it 
does, click on the message and "Restore line breaks".  Otherwise, some of the code gets mangled.  
Thanks.
--Ted

I'd like to thank Conrad Macina for providing an elegant piece of code to replace the ugly 'echo "JUKEBOXNAME" >> FILE' hard code repeats we had in place to list all defined jukeboxes. Why edit when you can look it up and maintain dynamic functionality? PS. Reminder that the 'pool=BR Pool' in the mminfo call and the '-b "Clone Vault TX"' in the nsrclone are both specific to my environment and should be edited to meet your environment's needs. See previous thread entries for greater detail.
----------BEGIN CODE----------


#!/bin/ksh
#
# Name: clone_script.sh

#*******************
# Script Variables *
#*******************
LOG_DIR="./ssid"
JB_NAMES="./JB_NAMES"
TIME="one week ago"

#**********************************************************************************************
# Change directory used because ps -ef output was too long and was breaking the if statement. *
# By using the change directory we shorten the ps -ef output to manageable length             *
#**********************************************************************************************
cd /usr/local/Legato/clone

#**************************************************
# This writes the jukebox names into a text file. *
#**************************************************
echo "show name \n print type:nsr jukebox" | nsradmin -i - | grep "name:" | awk '{print $NF}' | tr -d '";' | sort > $JB_NAMES

#**********************************************************************************************************
# Verifies the last nsrhost has finished. Want clone to include most recent master backup and bootstrap. *
#**********************************************************************************************************
while ps -ef | grep nsrhost | grep -v grep
do
  sleep 300
done

#**************************************************************************************************************
# Section to check for running clones, if not generate list of save sets to be cloned and executes the clone. *
#**************************************************************************************************************
for JB in $(cat $JB_NAMES)
do
  if ! ps -ef | grep nsrclone | grep $JB | grep -v grep > /dev/null
  then
        # Double quotes so that $JB and $TIME are expanded by the shell
        mminfo -omo -r ssid -q "!incomplete,pool=BR Pool,copies=1,level=full,location='$JB'" -t "$TIME" > $LOG_DIR.$JB
        nsrclone -b "Clone Vault TX" -S -f $LOG_DIR.$JB &
  fi
done

----------END CODE----------
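
To try the jukebox-name pipeline from the script without touching a live server, something like the following may help. It is a sketch only: the sample text is a guess at what nsradmin prints for 'show name' plus 'print type:nsr jukebox', the jukebox names are invented, and both function names are made up for the example.

```shell
#!/bin/sh
# Sketch only: run the script's name-extraction pipeline on canned input.

sample_output() {
    # Invented nsradmin-style lines; real resource names will differ.
    cat <<'EOF'
                        name: QUANTUM_B;
                        name: ADIC_A;
EOF
}

jukebox_names() {
    # Same filter chain as in the clone script above.
    grep "name:" | awk '{print $NF}' | tr -d '";' | sort
}

sample_output | jukebox_names
```

One caveat worth checking in your own environment: because awk keeps only the last field, a jukebox name containing a space would be cut down to its final word.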

To sign off this list, send email to listserv AT listserv.temple DOT edu and type "signoff networker" in the body of the email. Please write to networker-request AT listserv.temple DOT edu if you have any problems with this list. You can access the archives at http://listserv.temple.edu/archives/networker.html or via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER




--
George Sinclair - NOAA/NESDIS/National Oceanographic Data Center
SSMC3 4th Floor Rm 4145       | Voice: (301) 713-3284 x210
1315 East West Highway        | Fax:   (301) 713-3301
Silver Spring, MD 20910-3282  | Web Site:  http://www.nodc.noaa.gov/
- Any opinions expressed in this message are NOT those of the US Govt. -