[Networker] nsrim mega-questions

2012-08-30 18:59:18
Subject: [Networker] nsrim mega-questions
From: George Sinclair <george.sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 30 Aug 2012 18:58:42 -0400
Hi,

I'm trying to confirm something here with regard to nsrim, and I have a bunch of questions (yeah, what's new):

The nsrim man page mentions:

nsrim uses policies to determine how to manage online entries. (See nsr_policy(5), nsr_client(5), and the NetWorker Administrator's Guide for an explanation of index policies). Entries that have been in an online file index longer than the period specified by the respective client's browse policy are removed. Save sets that have existed longer than the period specified by a client's retention policy are marked as recyclable in the media index.

Now, assuming that nobody is running 'nsrck' or 'nsrmm' then is the 'nsrim' command the only NetWorker process that's actually responsible for removing entries from the CFI (i.e. entries that are beyond their browse policy and have no dependencies)? Also, I assume it's likewise the boss for changing save sets in the media database from browsable to recoverable or recoverable to recyclable, right?

Assuming this is the case, I'm a little confused about what actually happens. The man page for nsrim mentions that it gets run by savegrp after savegrp completes, but takes no action unless /nsr/mm/nsrim.prv >= 23 hours old, so it should only get run once a day. When it runs, I see something like this as far as the processes that appear related:

4170  1  /usr/sbin/nsrexecd
1009 4170 /usr/sbin/nsrim -MXq
1103 1009 /usr/sbin/nsrck -MXv -q client1 client2 ... blah-blah-blah ... client149 client150

Based on the PID and PPID values, it looks like nsrim launches the nsrck command with the 'X' option, which the nsrck man page shows equivalent to '-L 3':

Level 3 does a level 2 check and reconciles the online file index with the online media index. Records that have no corresponding media save sets are discarded. Also all empty subdirectories under db6 directory are deleted.


Questions:
1. This 'nsrck -MXv' check (wherein 'X' = '-L 3') doesn't seem to have anything to do with actually removing entries that are no longer browsable. And nsrck certainly isn't going to do anything with the media database, so is this nsrck check just a 'nice' extra that nsrim spawns because it loves us and wants us to be happy?

2. I assume that if you run 'nsrim -X' manually that this would likewise run that same nsrck check, too, or does this only happen when nsrim is being run in master mode?

3. Why does the 'nsrck' check run on every darn client that's ever existed? I see it run through old defunct clients that no longer exist; there are no NSR client resources and/or directories under /nsr/index for most of them. The whole nsrim/nsrck check only takes a few minutes to complete, and never has any issues, but I'm just curious. Maybe it looks in the media database since those clients still have old entries?

4. Why does the man page mention that nsrim is not normally run manually? But it then goes on to say:

If save sets need to be monitored for their browse and retention policy more frequently (for example, if savegrp(8) is run more frequently than every 23 hours), nsrim -X should be set up as a cron(1m) entry, or should be run manually.

5. I don't need them to be monitored more than once a day, but we do have a number of groups that run throughout the night, so savegrp is certainly run more than once a day. Should I set up the 'nsrim -X' command to run out of cron?

6. What happens if 'nsrim -X' is running when groups are backing up?

This seems hard to avoid when the nsrim.prv file reaches the 23-hour mark, no groups run for several hours after that time, and then multiple groups start at the same time. Only one is going to invoke it, but the others are still running or may overlap before the nsrim command completes.

7. The man page for nsrim mentions:

/nsr/tmp/.nsrim
nsrim locks this file to prevent more than one copy of itself from thrashing the media database.

"Thrashing" is not *necessarily* "trashing", so what's the worst case here?

8. If I'm going to run 'nsrim -X' out of cron, or via some cron script, then is there any need to worry about locking the /nsr/tmp/.nsrim file, just in case, and then removing the lock once done? Will flock work okay?

I don't know how nsrim is locking this file, but its mtime and ctime never change (it has an old timestamp) like they would with flock, so I'm unclear how or if nsrim would even know that I had a lock on it. Alternatively, suppose I first manually set the time on the nsrim.prv file to be sometime the next day (touch -d 'YYYY-MM-DD HH:MM:SS' nsrim.prv) at a quiet time when no backups are usually scheduled and then set the cron job to start maybe two hours before that time. I could first check the timestamp on this file and only run 'nsrim -X' if its age is, say, >= 22 hours but < 22.5, to be safe, so as to avoid any possible collision with NetWorker and/or some unexpected group that might launch nsrim.
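
The timestamp check described above could be sketched roughly as follows. This is only an illustration in Python, not anything NetWorker provides: the function names age_hours, in_window, and run_if_due are made up, the 22 to 22.5 hour window is just the example figure from the paragraph above, and the paths are the ones shown in the man page and the process listing.

```python
import os
import subprocess
import time

PRV_PATH = "/nsr/mm/nsrim.prv"        # path as given in the nsrim man page
NSRIM_CMD = ["/usr/sbin/nsrim", "-X"]  # command discussed above


def age_hours(path, now=None):
    """Return how many hours have passed since the file's mtime."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_mtime) / 3600.0


def in_window(path, low=22.0, high=22.5, now=None):
    """True if the file's age satisfies low <= age < high (in hours)."""
    return low <= age_hours(path, now) < high


def run_if_due(path=PRV_PATH, cmd=NSRIM_CMD, low=22.0, high=22.5):
    """Run cmd only when the file's age falls inside the window.

    Returns True if the command was run, False if it was skipped.
    """
    if not in_window(path, low, high):
        return False
    subprocess.run(cmd, check=True)
    return True
```

A cron-driven script would simply call run_if_due() and exit. A timestamp set in the future gives a negative age, so the check skips the run in that case.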

Then again, maybe it's completely moot if nsrim really does lock itself, so I would be fine to run it any time without worrying that the first group that runs (after nsrim.prv is >= 23 hours old) is somehow gonna launch nsrim in parallel with my job, or that there's some kind of race condition?
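
If a cron wrapper does need its own guard against overlapping runs, one option is an advisory lock on a separate file. Below is a minimal sketch in Python, assuming Linux-style flock(2) semantics. The lock file name /nsr/tmp/nsrim-cron.lock and the helper names are hypothetical. It says nothing about how nsrim itself locks /nsr/tmp/.nsrim, so it would only protect against two copies of the wrapper, not against savegrp launching nsrim.

```python
import fcntl
import os

# Hypothetical lock file for the wrapper itself; NOT nsrim's own /nsr/tmp/.nsrim.
LOCK_PATH = "/nsr/tmp/nsrim-cron.lock"


def try_lock(path=LOCK_PATH):
    """Try to take an exclusive advisory lock without blocking.

    Returns an open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def unlock(fd):
    """Release the advisory lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

The wrapper would call try_lock() first, exit quietly if it returns None, and otherwise run 'nsrim -X' and then call unlock(). Since flock locks are advisory, they only matter to processes that also ask for the same lock.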

Thanks.

George

--
George Sinclair
Voice: (301) 713-3284 x210
- The preceding message is personal and does not reflect any official or 
unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -
