ADSM-L

Re: [ADSM-L] Q about TSM notification in a widely-distributed environment

2012-11-07 11:36:36
From: Skylar Thompson <skylar2 AT U.WASHINGTON DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 7 Nov 2012 08:25:11 -0800
I'll second that. Our policy is that no node should generate a non-zero
exit code at the end of a backup. There are a few things that we do:

* Exclude directories that generate errors and do not need to be
backed up (Firefox cache, all sorts of Apple stuff, etc.)

* Work with users to identify files that don't need SHRDYNAMIC copy
serialization. This reduces the number of retries the client does. For
data in a workflow pipeline, eventually it will be static, and there's
no point spending a lot of time waiting for it to become static.

* Provide directories that users can create that are never backed up (in
our case, these are NoBackup, NOBACKUP, nobackup, nobackups, NoBackups,
and NOBACKUPS). We are an HPC shop, and many of the files created on
filesystems that we back up are transient and only useful for
the life of the job. If the job is running while backups are happening,
then we get lots of errors when these files are removed. After a
disaster, that job would just be resubmitted based on source data that
are elsewhere on the filesystem. Users can create those directories
anywhere, and files in those directories are never backed up.
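
To make the directory rule concrete, here is a minimal Python sketch of the matching logic only. It is not the client configuration itself (the real exclusion would live in the client's exclude options, which are not shown here), and the names NOBACKUP_NAMES and is_excluded are hypothetical. It assumes the match is case-sensitive and applies to any directory component of a path.

```python
from pathlib import PurePosixPath

# The six directory names listed above; matching is assumed case-sensitive.
NOBACKUP_NAMES = frozenset(
    {"NoBackup", "NOBACKUP", "nobackup", "nobackups", "NoBackups", "NOBACKUPS"}
)


def is_excluded(file_path: str) -> bool:
    """Return True if any parent directory of file_path is a no-backup name."""
    parts = PurePosixPath(file_path).parts
    # The last component is the file itself, so only the directories count.
    return any(part in NOBACKUP_NAMES for part in parts[:-1])


if __name__ == "__main__":
    for example in (
        "/net/proj/nobackup/tmp/job1.out",   # under a no-backup directory
        "/net/proj/data/nobackup.txt",       # a file name, not a directory
        "/net/proj/Nobackup/job2.out",       # spelling not in the list
    ):
        print(example, is_excluded(example))
```

Because the directory can sit at any depth, the check looks at every parent component rather than a fixed location, which mirrors "users can create those directories anywhere".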

-- Skylar Thompson (skylar2 AT u.washington DOT edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine

On 11/7/12 08:02 AM, Arbogast, Warren K wrote:
One more thought. In my experience some-to-many client admins assume that 
failed file backups (file in use, file not found, file changed, etc) are the 
cause of failed backups (condition code 12). If the failed files aren't 
important to them they look no further for the cause of the failed backup. I 
believe this misunderstanding exacerbates their habit of ignoring automated 
alerts. We have text in the alert message to correct that perception, but we 
still hear that opinion.

Keith
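
The distinction Keith describes can be sketched in Python as follows. The meanings are paraphrased from the backup-archive client's documented return codes and should be checked against the documentation for the client version in use; the names RETURN_CODE_MEANINGS, describe, and skipped_files_explain are hypothetical.

```python
# Paraphrased client return codes (an assumption to verify per client version).
RETURN_CODE_MEANINGS = {
    0: "completed successfully",
    4: "completed, but some files were skipped",
    8: "completed with at least one warning",
    12: "completed with at least one error other than skipped files",
}


def describe(return_code: int) -> str:
    """Return a short description of a client return code."""
    return RETURN_CODE_MEANINGS.get(return_code, "other or unknown code")


def skipped_files_explain(return_code: int) -> bool:
    """Return True if skipped files alone could account for this code."""
    return return_code == 4


if __name__ == "__main__":
    for code in (0, 4, 8, 12):
        print(code, describe(code), skipped_files_explain(code))
```

Under these assumptions, a code of 12 always points to something beyond files that were in use, missing, or changed, so the alert still needs investigating even when the skipped files are unimportant.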