Re: [Networker] How to find a piece of a save set that spans?

2010-10-06 22:04:35
Subject: Re: [Networker] How to find a piece of a save set that spans?
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 6 Oct 2010 22:02:24 -0400
George Sinclair wrote:
Werth, Dave wrote:
Yes, I was wondering what the 1000.0 meant as well. Seems like useless information since it's the same on all of them.

Mysterious, certainly.

As far as determining what is spanning volumes below the save set level, I'm not sure that NetWorker even has that information to give you. The index probably just stores what save set the file is in but not what volume. The media DB stores what volume(s) a save set is on but doesn't break it down to file/directory level. That is just my guess but it makes sense to me.

It must have it somewhere, or some values that it can use to determine that, because how can it tell you that a directory that you've selected to recover will require, say, volumes ABC123 and ABC124, when you select the 'Show required volumes' under nwrecover or type 'volumes' under the CLI tool?

It's always been my understanding that the index only gets read during recover whereas the media database is checked during backups to determine the date of the last database entry for the save set so it can then decide what has changed since that date when it does the next incremental or numeric backup for the save set. Clearly, as you noted, the media database does not go further in granularity than the save set name itself, but since the index gets read when doing recover, it must store that information in the index somewhere or at least some type of metadata that it can then use to determine which volumes it's on. Also, how does it know what the next tape is to load when doing a recover that spans? There's no information on the tape that's going to tell it that since it couldn't have known in advance when it was writing the EOF mark on the preceding tape during the backup. It must be in the index.

I'm looking for a way to ferret that out somehow, minimally for a given directory. Hmm ...

I've looked into this a little further. I first tried the earlier method that I theorized on using 'nsrinfo' with the '-v' option and adding up the 'NSR size' values until I got something close to the sumsize reported from mminfo, and that allowed me to find a directory that spanned tapes 1 and 2 and also another one that spanned tapes 2 and 3 but not after that. The directory that I thought would force it to use tapes 3 and 4 instead wants to use tape 4 only. I kept backing it off a directory, using previous ones listed in the 'nsrinfo' output, but each time that I thought I had a winner, it kept requesting tape 4. Maybe I just got lucky on the others.

Next, I looked at the '-V' option for nsrinfo that shows the offset. By using that value and comparing it with the 'first', 'last' values from mminfo, I think I might be able to make it work.

So, I want to find a directory that spans tapes 3 and 4 (with 4 being the last volume):

mminfo -s server -q 'volume=tape3,name=saveset_name' -ot -r first,last | tail

nsrinfo -s server -V -t nsavetime client > filename

I then look through the entries in filename until I find the first one whose 'off=value' is as close as possible to the 'last' value reported in the mminfo output. In my case, any entries that were even close were all under the same directory. I then ran recover and selected that directory and sure enough, it shows it being on volumes 3 and 4.

When I go down to the file level, if I pick the file shown in nsrinfo whose 'off=value' is just after that, it requests volume 4 only. If I pick the one before that, it requests just volume 3.

I'll have to play around with this a little more, but I'm thinking the solution involves using the nsrinfo command to report what was actually backed up, and what the offset value is for each such entry, and then cross checking this against what mminfo shows for the 'first' and 'last' values for the save set for the given volume. Or something like that. Further testing should demonstrate if this logic will work.

*If* it works, I could probably script it all, and have it just spit out the most likely candidate that spans each set of volumes.
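The cross-check described above might be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes each entry captured from 'nsrinfo -V' can be reduced to one line that begins with the path, followed by a comma, and contains 'off=<integer>'; the real output layout may differ, so the pattern would need adjusting. The names parse_entries, neighbours and SAMPLE are hypothetical, the sample lines are made up, and 'last' stands for the value reported by mminfo for the volume in question.

```python
import bisect
import posixpath
import re

# ASSUMPTION: one entry per line, path first, then a comma, with
# "off=<integer>" somewhere on the line. Not the verified nsrinfo layout.
OFF_RE = re.compile(r"off=(\d+)")

def parse_entries(lines):
    """Return (offset, path) pairs sorted by offset; skip lines without off=."""
    entries = []
    for line in lines:
        match = OFF_RE.search(line)
        if match is None:
            continue
        path = line.split(",", 1)[0].strip()
        entries.append((int(match.group(1)), path))
    entries.sort()
    return entries

def neighbours(entries, last):
    """Return the entries that start just at/before and just after 'last'."""
    offsets = [off for off, _ in entries]
    i = bisect.bisect_right(offsets, last)
    before = entries[i - 1] if i > 0 else None
    after = entries[i] if i < len(entries) else None
    return before, after

# Hypothetical sample data, for illustration only.
SAMPLE = [
    "/export/dir1/data/0007/a.dat, off=0",
    "/export/dir1/data/0008/b.dat, off=4000",
    "/export/dir1/data/0008/c.dat, off=9000",
    "a line without an offset",
]

before, after = neighbours(parse_entries(SAMPLE), last=5000)
for label, entry in (("before", before), ("after", after)):
    if entry is not None:
        print(label, entry[0], posixpath.dirname(entry[1]))
```

If both neighbours sit under the same directory, that directory is the most likely candidate to span the two volumes; it would still need confirming with 'Show required volumes' in recover.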

I'm not sure if more than one file in the given save set could span two tapes, but I'm thinking not. There's going to be one and only one file in that save set that spans two tapes. Other save sets that were multiplexed during the backups could also have a save set that spans those same two tapes, but not more than one. That sound right? Maybe I have it all wrong?

George



Dave Werth
Garmin AT, Inc.
Salem, Oregon

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of George Sinclair
Sent: Wednesday, October 06, 2010 4:54 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] How to find a piece of a save set that spans?

Werth, Dave wrote:
George,

I was playing around a bit with this in the NMC GUI and found if you go to "Media" and select "Save Sets" you can set a query for the save sets you are interested in. Then in the "Save Set List" tab it gives you a list of all of the volumes that save set is on.

For instance a save set from my weekend full backup displays in the volume name column:

150021(1000.0,h),150022(1000.0,m),150023(1000.0,m),150024(1000.0,t)

That's the sort of information you're looking for, right?

Well, that tells me which volumes the save set is split across and the size of each such piece (on each volume). That's the same as running an 'mminfo' command and reporting the various fields, e.g. sumflags,sumsize,totalsize,volume ...

What I actually need is a way to find a file or directory or something that's contained within the save set itself but is split across tapes 1 and 2. Clearly, once the first tape (h) is full, it then moves onto the next tape (m), but what file, from that save set, is actually split between those two tapes? I'd like to be able to determine that also for volumes 2 and 3 and 3 and 4. In my case, I don't really need to go down to the actual file name itself, just the top level parent directory that's directly beneath the save set.

For example, let's say the save set name is '/export/dir1/data', and the save set spans four tapes. Let's assume that 'data' contains hundreds of sub-directories named: 0001, 0002, 0003, 0004 ... Which one of those directories spans both tapes 1 and 2? Which one spans both tapes 2 and 3, tapes 3 and 4? That's what I need to find out. In my case, none of these directories is large enough to span more than two tapes.
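To roll a file-level path up to the directory directly beneath the save set, a small helper along these lines might do. It is only a sketch: top_level_dir is a hypothetical name, and it assumes UNIX-style paths like the '/export/dir1/data' example above.

```python
import posixpath

def top_level_dir(saveset, path):
    """Return the directory directly beneath 'saveset' that contains 'path'.

    Returns None when 'path' is not underneath the save set.
    """
    root = saveset.rstrip("/") or "/"
    prefix = root if root.endswith("/") else root + "/"
    if not path.startswith(prefix):
        return None
    first = path[len(prefix):].split("/", 1)[0]
    if not first:
        return None
    return posixpath.join(root, first)

# Hypothetical paths, for illustration only.
for p in ("/export/dir1/data/0002/sub/file.dat", "/export/dir1/other/x"):
    print(p, "->", top_level_dir("/export/dir1/data", p))
```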

BTW: What does the '1000.0' value refer to? I see that on my end, too. In fact, it looks to always be that same value. The volume names and the 'h', 'm' and 't' values make perfect sense and concur with what 'volume,sumflags' shows from 'mminfo', but the '1000.0' has me confused.

George

Dave Werth
Garmin AT, Inc.
Salem, Oregon

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of George Sinclair
Sent: Wednesday, October 06, 2010 4:01 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] How to find a piece of a save set that spans?

Werth, Dave wrote:
Yes, I like slick too but sometimes when it's something you're only doing once or twice for testing purposes and don't need it on an ongoing basis then brute force methods are adequate.

I agree, but in this case, there are so many top level directories for some of these save sets that it could take me an unacceptably long time.

Would the following scenario possibly work???:

Obviously, I can run 'mminfo' and have it tell me the size (sumsize) of each piece of the save set that's on each volume. Let's say there are four pieces, and I specify something like 'sumsize(20)' to have it list it out in actual bytes. So what if I then run 'nsrinfo -s server -t nsavetime -v client' for the given save set and capture that to an output file? That will list all the pieces/parts in the save set, file by file, with the 'NSR size' and 'file size' of each. Next, I write a script to parse that output file and add up the sizes until it hits something close to the 'sumsize(20)' for the first volume. Once it hits that, it then prints out the pathname of that file. I then manually check to see if that directory spans. If not, it's probably one of the directories just before or after that? I could then do this for the second piece and third piece. In this example, the fourth piece would be the last so that would be moot.
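The adding-up step could look something like the sketch below. It assumes the 'NSR size' values have already been extracted from the nsrinfo output into (path, size) pairs, that those pairs are in the order the data was written, and that the sizes add up to roughly what mminfo reports as sumsize; none of that is confirmed. The function name and the numbers are hypothetical.

```python
def first_entry_reaching(entries, sumsize):
    """Walk (path, nsr_size) pairs in order and return (path, running_total)
    for the entry during which the running total first reaches 'sumsize'.
    Returns None if the total never gets that far.
    """
    total = 0
    for path, size in entries:
        total += size
        if total >= sumsize:
            return path, total
    return None

# Hypothetical values, for illustration only.
entries = [
    ("/export/dir1/data/0001/a.dat", 100),
    ("/export/dir1/data/0002/b.dat", 250),
    ("/export/dir1/data/0003/c.dat", 50),
]
print(first_entry_reaching(entries, sumsize=300))
```

Since the sizes are only approximate, the result is a starting point for the manual check rather than an answer; the directories just before and after it are worth trying too.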

Assuming this harebrained scheme would even work, I'm not sure what the difference between 'NSR size' and 'file size' is, but 'NSR size' is always a little bigger. Maybe I would want to use 'NSR size' for this? Also, is the order that 'nsrinfo' lists everything in the same as the order that the data was actually backed up? If not, this goofy method won't work.

Maybe there's a better way (sigh ...).

George

Dave Werth
Garmin AT, Inc.
Salem, Oregon

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of George Sinclair
Sent: Wednesday, October 06, 2010 3:31 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: Re: [Networker] How to find a piece of a save set that spans?

Werth, Dave wrote:
George,

I don't know about how you can determine ahead of time what directories will span multiple volumes but you can certainly go into Recover and select a directory to recover then check the "Show required volumes" display to see if in fact it does span volumes. I don't imagine it would take too long to find one that did (but then what do I know?).

Yes, that's exactly what I'm looking for: a way to determine which directories did in fact span tapes, NOT which ones will span tapes. So, this is an "after the fact" question. I can certainly do as you mentioned but was looking for a slick way to determine this without trial and error?

Some of the save sets have a small number of top level sub-directories so it won't take too long to find one that spans, but most of the save sets have a lot of top level sub-directories, so that will take much longer. Obviously, at least one of them must span two tapes.

George

Dave Werth
Garmin AT, Inc.
Salem, Oregon
-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On Behalf Of George Sinclair
Sent: Wednesday, October 06, 2010 3:08 PM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] How to find a piece of a save set that spans?

Hi,

We have 19 save sets that we've just backed up. These have also been
cloned. Each of these save sets is 1+ TB and consists of a number of
smaller sized sub-directories, e.g. 300 KB, 2.4 GB, 19 GB, etc. The tape
pool is indexed. All of these save sets span multiple tapes (minimally
3-4 tapes) as they were multiplexed together (parallelism=4) during backup.

I'd like to run a browseable recover test (nwrecover or CLI recover) on
a couple of random sub-directories from each save set, but I'd like
to also pick a few that span at least two tapes. I don't want to recover
the whole save set, however, as these are all very large.

Is there a way I can determine which directories span two tapes?

Will I have to just select random directories, using nwrecover or CLI
recover, until I find one that shows two volumes required?

Thanks.

George

--
George Sinclair
Voice: (301) 713-3284 x210
- The preceding message is personal and does not reflect any official or
unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -

To sign off this list, send email to listserv AT listserv.temple DOT edu and type "signoff networker" in the body of the email. Please write to networker-request AT listserv.temple DOT edu if you have any problems with this list. You can access the archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER

