Networker

Re: [Networker] Are you sure your savepnpc backups are good?

2004-04-16 11:30:28
Subject: Re: [Networker] Are you sure your savepnpc backups are good?
From: "David E. Nelson" <david.nelson AT NI DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 16 Apr 2004 10:30:02 -0500
Hi Davina,

Snips and comments below...

On Fri, 16 Apr 2004, Davina Treiber wrote:

> David E. Nelson wrote:
> >
> > Notice that even though a savepnpc timeout condition occurred, the data was
> > still being backed up; it didn't finish until 1 hour 13 minutes later.  For
> > Oracle backups, this is now trash, since the DB has either been started or
> > come out of hot backup mode.  Are you aware that this occurred?  I'd be
> > willing to bet you weren't.
>
> This part is not a bug, it's the way it was designed. When the timeout
> occurs, the post-processing runs. It's left completely to the user what
> to do at this point. It's not desperately difficult to script something
> at this point that stops the overrunning sessions (although there are
> some gotchas), but that's your decision since not everyone would want to
> do this, some users might want the backup to continue but be notified
> about the timeout.

Well, I don't think it's a bug either.  IMHO, it's just a poor way of doing
things, especially since nothing reports what has occurred.  My thinking is that
since NW is an _enterprise_ product, it should act like one and take measures so
that surprises are not encountered later on.
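
Since nothing reports a timeout when it happens, one after-the-fact check is to scan savepnpc.log for 'Time out' entries. Below is a minimal Python sketch, assuming the log lives at /nsr/log/savepnpc.log and that a timed-out run leaves a line containing the literal text "Time out"; the function names are hypothetical and the marker text may differ between releases.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: a timed-out savepnpc run leaves a line containing this text.
TIMEOUT_MARKER = "Time out"


def find_timeouts(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every log line that mentions a timeout."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if TIMEOUT_MARKER in line
    ]


def audit_log(path: str = "/nsr/log/savepnpc.log") -> List[Tuple[int, str]]:
    """Print a short timeout report for one savepnpc log file."""
    with Path(path).open(errors="replace") as handle:
        hits = find_timeouts(handle)
    print(f"{path}: {len(hits)} timeout line(s)")
    for number, text in hits:
        print(f"  line {number}: {text}")
    return hits
```

Because log entries may be buffered in memory until post-processing completes, a check like audit_log() is better run later (for example from a daily report) than from the post-processing script itself.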

> I agree that doing nothing can cause problems; an Oracle backup will be
> useless if the database comes back up part way through. On NT it can be
> worse since the DB may fail to start if the files are open because of
> the backup.

Hmmm, interesting.

> > I was quite surprised by the number of 'Time out' entries in our
> > savepnpc.log files.  I'd suggest you check yours too if you're using
> > savepnpc with a timeout.
> >
> > - No environment variable flag is passed into savepnpc scripts to indicate
> > whether or not a timeout occurred.  My research and testing have shown that
> > the environment seen by the savepnpc post-processing script is identical for
> > successful and timed-out backups.
> >
> > - Yes, you can script a 'grep' to look in /nsr/log/savepnpc.log,
>
> I wouldn't count on that. The log entries often seem to be cached in
> memory and not written until the post-processing is complete.

Yes, I saw this behaviour now that you mention it.  The info in the file is
sparse to begin with, i.e. there is no mention of which group ran, for example.
I've also noticed that several things take time, one of them being the drive
status changing from 'writing, idle' to 'writing,done' when the backups finish.
It appears that 'sscomp' (from mminfo) isn't updated until that happens.  I've
had to add a sleep to give this time to complete so that my verification
reports the proper fields.
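
Instead of a fixed sleep, a verification step could poll until sscomp is filled in, with an upper bound on the wait. A minimal Python sketch follows; the helper names, the default timeout and interval, and the exact mminfo invocation (including its output layout) are assumptions to verify against your NetWorker release.

```python
import subprocess
import time
from typing import Callable, Optional


def wait_for_value(
    query: Callable[[], Optional[str]],
    timeout_s: float = 300.0,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll query() until it yields a defined value or the deadline passes.

    None, an empty string or "undefined" all count as "not there yet".
    Returns the stripped value, or None if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        value = query()
        if value is not None and value.strip() not in ("", "undefined"):
            return value.strip()
        if clock() >= deadline:
            return None
        sleep(interval_s)


def mminfo_sscomp(ssid: str) -> Optional[str]:
    """Hypothetical helper: read the completion time of one save set.

    Assumes 'mminfo -q ssid=<ssid> -r sscomp(17)' prints a header line
    followed by one value line; check this against your release.
    """
    result = subprocess.run(
        ["mminfo", "-q", f"ssid={ssid}", "-r", "sscomp(17)"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return lines[-1] if len(lines) > 1 else None
```

A call such as wait_for_value(lambda: mminfo_sscomp(ssid)), with ssid a hypothetical save set id, returns None if sscomp is still undefined when the deadline passes, which the caller can treat as a failed verification rather than silently reporting stale fields.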

> > you can
> > construct a query for 'mminfo', report 'sscomp(17)' and look for
> > 'undefined', check the status of a saveset via mminfo for 'in-progress',
> > etc., etc., etc.  The problem with all these approaches is that things may
> > have changed before you perform the query.
>
> I think these approaches are likely to be quite unreliable. The best
> plan is for the post-processing script to check for savepnpc processes
> still running with the relevant "-g" parameter for your group, or to
> query the server (nsradmin) for outstanding save sets for this client in
> the work list; this is exactly how pstclntsave does it.

Agreed, the methods above are not reliable.  Killing any overrunning backups
that use savepnpc is an interesting idea... I might have to give that one some
serious consideration.
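
The process check Davina describes can be sketched in Python: list full command lines with the POSIX form `ps -e -o args=` and keep the savepnpc processes started with `-g <group>`. This is a minimal sketch with hypothetical function names; it only detects overrunning sessions, and whether to notify or stop them is left to the caller.

```python
import os
import shlex
import subprocess
from typing import List


def matching_savepnpc(ps_lines: List[str], group: str) -> List[str]:
    """Return the command lines of savepnpc processes run with '-g <group>'."""
    matches = []
    for line in ps_lines:
        try:
            argv = shlex.split(line)
        except ValueError:  # unbalanced quotes: fall back to plain splitting
            argv = line.split()
        if not argv or os.path.basename(argv[0]) != "savepnpc":
            continue
        # Look for the two-token form "-g <group>" anywhere in the arguments.
        if any(a == "-g" and b == group for a, b in zip(argv[1:], argv[2:])):
            matches.append(line)
    return matches


def running_savepnpc(group: str) -> List[str]:
    """List savepnpc processes still running for a group on this client."""
    # 'ps -e -o args=' is the POSIX form: full command lines, no header row.
    out = subprocess.run(
        ["ps", "-e", "-o", "args="],
        capture_output=True,
        text=True,
        check=False,
    ).stdout
    return matching_savepnpc(out.splitlines(), group)
```

A post-processing script might call running_savepnpc("Oracle") (group name hypothetical) and, if the list is non-empty, send a notification or stop the sessions, bearing in mind that stopping them has some gotchas.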

Regards,
        /\/elson


--
~~ ** ~~  If you didn't learn anything when you broke it the 1st ~~ ** ~~
                        time, then break it again.

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.