Networker

Re: [Networker] Are you sure your savepnpc backups are good?

2004-04-16 16:41:07
Subject: Re: [Networker] Are you sure your savepnpc backups are good?
From: "David E. Nelson" <david.nelson AT NI DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 16 Apr 2004 15:39:02 -0500
Hi Davina,

On Fri, 16 Apr 2004, Davina Treiber wrote:

> David E. Nelson wrote:
>
> > Well, I don't think it's a bug .... IMHO, just a poor way of doing things -
> > especially not reporting what has occurred.  My thoughts are that since NW
> > is an _enterprise_ product, it should act as one and take measures so that
> > surprises aren't encountered later on.
>
> Agreed. savepnpc in general is a poor way of doing things.
>
> I've worked with savepnpc for a long time, I implemented it for the
> customer it was designed for. Even then it needed a lot of work to make
> it do what it was supposed to do. It was even more buggy then than it is
> now, and we had several workarounds in place.

So you're the one that we can blame this all on! ;) ;) Seriously, you have some
good stuff below that shows you've given it some serious thought.  I've added
some comments.

> Given a requirement to implement a pre and post processing function for
> a backup, why would you choose to give the client the whole task of
> handling this? I don't think I would. Would you design it so that you
> had to edit files on the client to configure it? Neither would I. The
> server knows where the backup is up to and what is completed, why should
> the client have to continuously query the server to obtain this
> information? The whole design is flaky to say the least.
>
> Here is my idea of how it SHOULD have been designed:
>
> * There would be two new attributes in the client resource on the
> server, called precmd and pstcmd. The values in these would take the
> same format as the backup command attribute, and would have the same
> restrictions on naming and file location so that the commands specified
> could be run using the nsrexec mechanism and with no NetWorker changes
> necessary on the client. These fields would be incorporated into the GUI.

Ok, I know about the must-begin-with-'nsr'-or-'save' restriction.  How strict
is the location restriction? Must it be in /usr/[s]bin?  If so, I think that's
a little too strict.  We like to keep OS stuff separate from "custom" stuff on
our filesystems.  Makes it easy to upgrade and move servers around as needed.

I would also add the ability to pass any number of arguments to these scripts.
If we want to keep all configs on the server, then the server must be able to
pass any options as necessary.
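
A rough sketch of what such a server-side check might look like, assuming the
only rule enforced is the 'nsr'/'save' name prefix mentioned above and that
arguments are written shell-style in the attribute value. The function name
parse_client_command is made up for illustration and is not part of NetWorker;
the location restriction is deliberately not modelled.

```python
import os
import shlex

# The naming restriction discussed above; nothing else is checked here.
ALLOWED_PREFIXES = ("nsr", "save")


def parse_client_command(value):
    """Split a precmd/pstcmd attribute value into [program, arg, ...].

    Hypothetical helper: raises ValueError if the value is empty or the
    program name does not start with an allowed prefix.
    """
    argv = shlex.split(value)  # honours quotes, so arguments may contain spaces
    if not argv:
        raise ValueError("empty command")
    name = os.path.basename(argv[0])
    if not name.startswith(ALLOWED_PREFIXES):
        raise ValueError("command must begin with 'nsr' or 'save': %r" % name)
    return argv
```

With this, a value such as "nsr_db_down.sh --sid PROD" would be handed to the
client as a program plus two arguments, while "db_down.sh" would be rejected.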

> * When the group runs, these two commands (if present) are included as
> save sets in the work list. There would need to be dependencies as
> follows between save sets for any one client:
>
> successful pre-command --> filesystems --> post command --> index,
> however I would think this is possible since the functionality already
> exists to handle special save sets that need to be done in order, e.g.
> index and bootstrap.
>
> * If the pre command fails, the rest of the save sets for that client
> would be removed from the work list. This is easy to do from the server.
> There is only one shot at the pre-processing, unlike with savepnpc where
> if it fails it will be attempted as many times as the number of save
> sets multiplied by client retries.


> * The post command output would be sent to the normal group completion,
> unlike with savepnpc where it is detached and goes down a black hole.

Yup, a big gripe on my list, too.  I imagine that if pre/post are on a
work list like you suggest, this would be taken care of for us.
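
To make the proposed ordering concrete, here is a minimal sketch of a
per-client work list: pre command, then filesystems, then post command, then
index, with a failed pre command dropping everything else after a single
attempt. All names are hypothetical, and `run` stands in for whatever savegrp
would actually use to execute a save set. Running the post command even when a
filesystem save fails is an assumption; the proposal above does not say.

```python
def run_client_work_list(precmd, filesystems, pstcmd, run):
    """Run one client's save sets in dependency order.

    `run(name)` is a caller-supplied function returning (ok, output).
    Returns a list of (name, status, output) tuples, i.e. the material
    for the group completion report.
    """
    report = []

    def record(name):
        ok, output = run(name)
        report.append((name, "succeeded" if ok else "failed", output))
        return ok

    if precmd is not None and not record(precmd):
        # One shot only: remove the rest of this client's work list.
        remaining = list(filesystems)
        if pstcmd is not None:
            remaining.append(pstcmd)
        remaining.append("index")
        for name in remaining:
            report.append((name, "skipped", ""))
        return report

    for fs in filesystems:
        record(fs)
    if pstcmd is not None:
        record(pstcmd)  # output is kept in the report, not discarded
    record("index")
    return report
```

Because the post command is just another entry in the list, its output ends up
in the same report as every other save set.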

> * A timeout could be incorporated as with savepnpc (another new field in
> the client resource?). Personally I don't think it is essential but
> customers like it. On timeout, savegrp could optionally kill running
> saves and remove others for this client from the work list, this is
> something that customers would like, and is difficult to do from the
> client with savepnpc - much easier if controlled from the server.

The timeout can be a good thing.  In our situation, we rely on it so that our
oracle cold backups are guaranteed to be up and running before manufacturing is
scheduled to start work.  It'd kinda suck if a drive failed and BU's got
extended for a significant amt of time.  We still have the option of backing up
archive logs to protect us - just more work for the DBA's.

Just need a reliable, timely method of reporting it that's easy to access.  I
think an env var would work great here.
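
As an illustration of the env var idea, the sketch below runs a save command
under a timeout and then tells the post command, through its environment,
whether the timeout fired. The variable name PNPC_TIMED_OUT and the function
name are invented for this example; nothing here reflects how savepnpc or
savegrp actually behave.

```python
import os
import subprocess


def run_saves_with_timeout(save_cmd, post_cmd, timeout_s):
    """Run save_cmd, kill it after timeout_s seconds, then run post_cmd.

    The post command sees PNPC_TIMED_OUT=1 or 0 in its environment
    (hypothetical variable name). Returns (timed_out, post_stdout).
    """
    try:
        subprocess.run(save_cmd, timeout=timeout_s)
        timed_out = False
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        timed_out = True

    env = dict(os.environ, PNPC_TIMED_OUT="1" if timed_out else "0")
    post = subprocess.run(post_cmd, env=env, capture_output=True, text=True)
    return timed_out, post.stdout
```

A post script could then test the variable and, for example, page the DBAs
only when the cold backup was cut short.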

> There is obviously a strong requirement for pre and post processing with
> backups, the amount of discussion on this list is evidence of this (or
> is it just because savepnpc is so troublesome?). So I wonder why such a
> poor job has been done of implementing this. What do others think?

BTW, I gave some thought to grepping for the 'savepnpc' process.  Not good as
savepnpc is still running to take care of the post functions.  Looks like a
nsradmin work list query might be the best option.  So I have in order of
preference from good to bad:

- Work list query - A little coding but the most accurate and timely of the
bunch

- mminfo query for 'undefined' in sscomp - Easy to code, must wait for drive
status to go from 'writing,idle' to 'writing,done'.  This takes about 30
seconds and things may/may not have happened during that time that we should
know about.

- grepping /nsr/logs/savepnpc.log - Very easy but not enough info contained in
the file for accurate parsing.  Might also need to handle special cases where
day boundaries are crossed.

- Do nothing and have a resume ready
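
On the day-boundary problem in the log-grepping option, one possible approach
is sketched below. It assumes a purely hypothetical line format of
"HH:MM:SS message" with no date (the real savepnpc.log layout may differ),
lines in chronological order, and gaps between lines of less than 24 hours.
Whenever the clock time goes backwards, the entry is taken to belong to the
next day.

```python
from datetime import date, datetime, timedelta


def attach_dates(lines, first_day):
    """Turn 'HH:MM:SS message' lines into (datetime, message) pairs.

    `first_day` is the calendar date of the first line. The line format
    is an assumption made for this sketch.
    """
    day = first_day
    previous = None
    result = []
    for line in lines:
        stamp, _, message = line.partition(" ")
        clock = datetime.strptime(stamp, "%H:%M:%S").time()
        if previous is not None and clock < previous:
            day += timedelta(days=1)  # clock went backwards: crossed midnight
        previous = clock
        result.append((datetime.combine(day, clock), message))
    return result


entries = attach_dates(
    ["23:58:10 pre-command started", "00:03:45 pre-command finished"],
    date(2004, 4, 16),
)
```

Here the second entry would be dated the 17th, so the elapsed time comes out
as a few minutes rather than a negative number.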

Comments?

Regards,
        /\/elson



--
~~ ** ~~  If you didn't learn anything when you broke it the 1st ~~ ** ~~
                        time, then break it again.
