Subject: Re: [Networker] How best to determine time value for this scenario?
From: George Sinclair <George.Sinclair AT NOAA DOT GOV>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 17 Mar 2009 11:42:53 -0400
Allan Nelson wrote:
> Hi George,
> I may be totally missing the point here, but wouldn't it be much simpler
> to add a field to your non-Legato database indicating whether a path had
> been saved or not?
> I.e., you're already extracting a list of 'stuff to do' from the database
> and then presumably issuing a save command?
> After the save command completes, can you not then update the database
> flag to say 'done'?
> I'd certainly try that sort of tack; then you've no problems with
> dates/times, whether it was successful, etc.
> Of course, if you haven't got write access to that local database, then
> forget what I just said ;-)

Excellent point! I'll suggest we look into that, as it would be the most tenable method. And thanks for taking the time to read my lengthy post. :)

I should note that my plan is to continue running normal (indexed, file-system-level) backups by device in parallel during this test phase. These devices contain only this special data, nothing else, and those backups write and clone to a separate pool. Running them in parallel will allow me to corroborate that the test script (which backs up and clones the data to a test pool) is working correctly: I would run weekly recovers from both backups and compare. In most cases, entirely new paths (directories together with their constituent files) are archived and their pathnames added to the database; only rarely are existing ones modified (e.g. new files added or old ones removed), so the script should be backing up the same data, or maybe an occasional superset of it, as explained below. Here's what I'm thinking of doing for this script in the interim (for the test phase):

1. Before running the script for the first time, manually touch a file (sentinel).

2. When the script runs, it first touches a new file (sentinel2), checks the mod time on sentinel (original), subtracts 5 minutes and stores this value.

3. The script then queries the non-Legato database for all paths archived at or after that time and backs them up, even if they were already backed up in the past. This is because a given path could be modified and re-archived later. We don't care which files under a path changed; we simply run a full on the whole path, always. The 5-minute delta helps guard against any possible time-sync problems. Again, we could end up with some overlap from time to time, but no biggie.

4. The very last thing the script does before exiting is to move sentinel2 to sentinel - an atomic operation (see the sketch below).
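For concreteness, here's a minimal bash sketch of steps 1-4. It assumes GNU stat, and query_archive_db is a hypothetical stand-in for whatever command queries the non-Legato database; the sentinel locations, server name, and pool name are placeholders:

  #!/bin/bash
  SENTINEL=/nsr/res/backup.sentinel        # touched manually before run 1
  SENTINEL2=/nsr/res/backup.sentinel2
  SERVER=nsrserver                         # placeholder server name
  POOL="Test Pool"                         # placeholder test pool

  touch "$SENTINEL2"                       # step 2: stamp this run's start
  last=$(stat -c %Y "$SENTINEL")           # original sentinel mod time (epoch)
  since=$((last - 300))                    # back off 5 minutes for clock skew

  # step 3: level full on every path archived at or after $since
  ok=1
  while read -r path; do
      save -s "$SERVER" -b "$POOL" -l full -N "$path" "$path" || ok=0
  done < <(query_archive_db --since "$since")

  # step 4: promote sentinel2 only if everything succeeded, so a failed
  # run leaves the old timestamp alone and the next run simply overlaps
  [ "$ok" -eq 1 ] && mv "$SENTINEL2" "$SENTINEL"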

If the script fails before it reaches that point, the original sentinel file's mod time is unaffected, and the next run will start from the same time. Worst case, you re-back up some of the same data, but better to have overlap than underlap, and I don't expect this to happen often. Ditto if the script fails on the move command itself. But if everything goes correctly, sentinel2 (now renamed to sentinel) carries the time that will be used on the next run.

Now, if anything gets archived before the script runs, it will be identified as needing a backup, since its archive time will be at or later than the sentinel time. If, however, it gets archived after the script is invoked (maybe even while it's still running), then its archive time will be at or after sentinel2's time, but again, the 5-minute delta seals the cracks.

In addition, the script will query the Legato database for all save sets in this pool that were backed up in the last month. It then walks through them, building a hash of any save sets whose ssflags, clflags or sumflags indicate a problem, keyed by save set name, with each entry replaced by the most recent problem instance of that save set. In the end, we have a unique list of bad save sets. The script then removes any that are already slated to be backed up (i.e. their archive time reported by the non-Legato database is later than sentinel's) or that have had a more recent, successful backup; the rest are added to the backup list, along with any paths archived since sentinel was last modified. So we have our cake, and we can eat it, too.
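The mminfo side of that might look roughly like this sketch (reusing the SERVER and POOL placeholders from the sketch above). Which flag letters count as a "problem" varies by NetWorker version; 'a' (aborted) and 'i' (in progress) in sumflags are my assumption here, so verify against your man page:

  start=$(date -d '30 days ago' +%m/%d/%Y)     # GNU date assumed
  mminfo -avot -s "$SERVER" \
         -q "pool=$POOL,savetime>=$start" \
         -r "name,nsavetime,sumflags,ssflags,clflags" |
  awk 'NR > 1 && $3 ~ /[ai]/ { bad[$1] = $2 }  # -ot sorts oldest first, so
       END { for (n in bad) print n, bad[n] }' # the newest bad instance wins

Cross-checking that list against the non-Legato database and against newer successful saves would then happen in the script logic, as described above.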

Cloning is then automatically enabled for the group. The script is listed under the backup command field in the NSR client resource (there will be two such clients, both in this group), and the savegrp command is run from the server's cron by a wrapper script that checks that cloning succeeded and e-mails a status report for the group, regardless of what happened. There should be an e-mail every day. Yes, NW will send a group completion notification, but there are some other things I'd like to report. I could simply list a script for the save set completion notification, but this is more specialized, so I might make it a separate e-mail report unto itself rather than trying to fit it into the general one that all groups get filtered through.
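The cron wrapper might be as simple as this sketch. The group name and address are placeholders, and using a copies=1 query as the "not yet cloned" test is an assumption worth verifying against your NetWorker version:

  #!/bin/bash
  GROUP=SpecialArchive                     # hypothetical group name
  start=$(date +%m/%d/%Y)

  savegrp "$GROUP"                         # run the group (save, then clone)
  rc=$?

  # any save set from today that still has only one copy presumably
  # failed to clone
  uncloned=$(mminfo -avot -q "group=$GROUP,savetime>=$start,copies=1" \
                    -r "client,name,ssid" 2>/dev/null)

  { echo "savegrp exit status: $rc"
    if [ -n "$uncloned" ]; then
        echo "Save sets without a clone:"
        echo "$uncloned"
    else
        echo "All save sets cloned."
    fi
  } | mail -s "$GROUP backup/clone report" backup-admin@example.gov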

I should also note that there will be at least two clients using the above script. The nice thing about the backup command field is that it spares me from having to clone manually within the script, where I'd have to worry that one client is trying to read from a tape that the other might still be writing to, or vice versa, and from playing NFS file-locking games. I can let Legato handle the cloning.

Whatcha think?

George

> Hope this helps... Allan.

>>> George.Sinclair AT NOAA DOT GOV 16/03/09 22:50 >>>

This may be outside the purview of this list, but I thought I'd run it past the gang just to get some advice. Sorry to make this so long, but I thought it necessary to provide the details.

We have a special archive data set wherein we want to use NW to perform the backups on certain directories but not to determine what needs to be backed up among that data. We want each directory to be its own save set instead of having it all get backed up under the parent device name, e.g. /0/data, /1/data, etc. There's a non-Legato database that I can query that will report the directories/pathnames that have been archived since a specific date/time, but I need to pass it a date/time value. In some cases, the same directory might be re-archived again later in which case its date will be updated in the database.

We want to use NetWorker to perform the backups on these paths at level full with no indexing turned on for the pool. This is because:

1. We do not want to use the file system to determine what needs to be backed up.

2. We sometimes move this data around between systems and don't want to have to rerun level fulls like we would if we were performing indexed backups by device.

3. This is much faster, as it doesn't have to search through the file system to determine what has changed.

4. We only want to back up a directory if it has been added to that database, not just because it's there on disk.

Also, this makes it clearer what exactly got backed up, since each will be its own save set. We can't hard-code the save set list, though, because it changes every day. We don't mind using save set recover to recover these.

The plan is to specify a script for the Backup command field in the client resource that will determine the appropriate time value, obtain the necessary path names from the non-Legato database, and then run a save command on each at level full, with the -N option for the symbolic save set name. Cloning would be enabled for the group.
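For a single archived path, the save invocation would look roughly like this (server, pool, and path are placeholders):

  save -s nsrserver -b "Archive Pool" -l full \
       -N /0/data/somedir /0/data/somedir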

But how to determine the time value?

Here's what I've considered, and the problems with each:

1. Run an mminfo query on the pool, sort by time, and pass the last time value to the non-Legato database (maybe subtracting 5 minutes just to be safe, as I don't mind an occasional overlap; see the sketch after this list). But a major problem occurs to me: one or more paths may have been added to that database after the *last* group run started but before it completed. I would miss those and end up grabbing only the ones updated after the previous backup's last save set completed.

2. Use the group's last start time. While the start time for the backups probably will not change, it could, and it could always be the case that the backups didn't actually run at that date/time (skipped, other problems, etc.), so the last group start time might be too recent.

3. The backup script could first touch a file, and each subsequent run could use that file's time value before touching it again. But suppose the script runs and fails several days in a row, then succeeds the next day: it would be using the time value from its last (failed) run, and any path archived before that run would be missed.
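For reference, the query in option 1 might be as simple as the following (nsavetime reports seconds since the epoch, which keeps the arithmetic easy; the pool name is a placeholder):

  # newest savetime in the pool (-ot sorts oldest first), minus 5 minutes
  last=$(mminfo -avot -q "pool=ArchivePool" -r nsavetime |
         awk 'NR > 1 { t = $1 } END { print t }')
  since=$((last - 300))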

So I guess I need to pass it the time that the group actually last started and really did something. But let's say the last time the group ran, there wasn't anything to back up, or maybe it skipped. Maybe the backups were shut off for a couple of days, too. It will still report a last start time, but that might be too recent. I'm thinking I need to determine the date and time that the backups last ran and actually backed something up, and use that time instead?

If I walk back through the savetimes in reverse, looking at all of them from the same day, I could pick the first such one, maybe subtract 5 minutes to be safe, and make that my time value. But how do I know the group didn't actually start the night before and continue over into that day? If it did, and something was archived just after it started, I would miss that one, too.

Anyone have any ideas about how best to determine what time value to use in this case?

George


--
George Sinclair
Voice: (301) 713-3284 x210
- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
