Re: [Networker] Immediate reporting of client-level backup failures

2003-08-15 06:26:20
Subject: Re: [Networker] Immediate reporting of client-level backup failures
From: Davina Treiber <Treiber AT HOTPOP DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Fri, 15 Aug 2003 09:00:10 +0100
John Hope-Bailie wrote:
> Good day all,
>
> We have a requirement for rapid response to failed client backups.
>
> The current approach, using email/pager notifications linked to savegroup
> completion events, is not working well enough.
>
> There are hundreds of clients in about 10 groups, so the groups often run
> for 12 hours (or more).
>
> The savegroup notification is only issued when the savegroup finally ends.
>
> If a client fails early on, this would typically only be reported, say, 10
> hours later.
>
> This is too late to investigate and fix the client problem during the backup
> window.
>
> We have been advised to try the "owner notification" feature, which is
> configurable per client.
>
> Functionally, this is better, but I seem to recall that this, too, is only
> issued when the entire savegroup completes.
>
> Question
>
> Has anyone been faced with this requirement to take immediate corrective
> action on failed client backups during the backup window?
>
> If so, how did you arrange per-client failure notifications in real time?
>
> I suppose it might be possible to hook into the save processes on the
> client itself and work from there?
>
> Has anyone tried anything like this?

I had a customer who had this requirement. Their solution was quite
complex. They scheduled client backups from an external scheduler, and
for each client they ran a script that allocated the client to a group
from a pool of groups, then ran the group with that single client in
it. After the group completed, the client was taken out of the group to
free the group up for the next client.
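
To make the pool-of-groups idea concrete, here is a minimal Python
sketch. It is not the customer's script (which I can't supply); every
name in it is hypothetical. In particular, run_group stands in for
whatever site-specific steps add the client to the group, run the
group, and take the client out again, and notify stands in for the
mail or pager hook.

```python
import queue
from typing import Callable, Iterable


def make_pool(group_names: Iterable[str]) -> "queue.Queue[str]":
    """Build a pool holding the names of the currently free groups."""
    pool: "queue.Queue[str]" = queue.Queue()
    for name in group_names:
        pool.put(name)
    return pool


def backup_client(
    client: str,
    pool: "queue.Queue[str]",
    run_group: Callable[[str, str], bool],
    notify: Callable[[str, str], None],
) -> bool:
    """Back up one client in a borrowed group and report failure at once.

    run_group(group, client) is a hypothetical callable that returns
    True on success; any exception it raises is treated as a failure.
    """
    group = pool.get()  # blocks until some group is free
    try:
        ok = run_group(group, client)
    except Exception:
        ok = False
    finally:
        pool.put(group)  # free the group for the next client
    if not ok:
        notify(client, group)  # per-client alert, no waiting for other clients
    return ok
```

Because each group only ever holds one client at a time, the group's
completion is effectively the client's completion, which is what makes
the per-client notification immediate.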

It worked, but initially the script was problematic and it was several
months before the script was fully (?) debugged. IMHO it was of limited
use, because they didn't have staff working 24x7 to diagnose problems
and re-run any failures. It was probably more trouble than it was worth,
since the script was just something extra to support, and occasionally
the cause of problems itself. I wouldn't really recommend this solution,
and sorry, no, I can't supply the scripts used.

I suppose a simpler variation would be to put all the clients in all
the groups, then, after finding an unused group, run that group with
the -c parameter so that only the one client is backed up.
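
A rough Python sketch of that variation follows. It assumes the group
is started with savegrp, that "-c client" limits the run to the named
client, and that a zero exit status means success; check the savegrp
man page for your release before relying on any of that. How to tell
which groups are unused is left open here, so the busy set is simply
supplied by the caller, and all function names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Set


def pick_free_group(groups: Iterable[str], busy: Set[str]) -> Optional[str]:
    """Return the first group not marked busy, or None if all are in use."""
    for group in groups:
        if group not in busy:
            return group
    return None


def savegrp_command(client: str, group: str) -> List[str]:
    """Command line for running one client of a group (assumed syntax)."""
    return ["savegrp", "-c", client, group]


def run_client(
    client: str,
    groups: Iterable[str],
    busy: Set[str],
    runner: Callable = subprocess.run,
) -> bool:
    """Run a single client in any free group and return True on success."""
    group = pick_free_group(groups, busy)
    if group is None:
        raise RuntimeError("no free group available")
    busy.add(group)  # mark the group as in use for the duration
    try:
        result = runner(savegrp_command(client, group))
        # Assumption: exit status 0 means the client backup succeeded.
        return result.returncode == 0
    finally:
        busy.discard(group)  # release the group again
```

The caller can send its own alert as soon as run_client returns False,
without moving clients in and out of groups.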
