[ADSM-L] Backup fails with no error message

2014-07-07 16:02:10
Subject: [ADSM-L] Backup fails with no error message
From: Thomas Denier <Thomas.Denier AT JEFFERSON DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 7 Jul 2014 19:59:57 +0000

We have an AIX system on which backups of a specific file system terminate
with exit status 12 but with no error message indicating a reason for this
exit status. If I execute the command

dsmc inc /main/UT -servername=DC1P1_MAIN

as root, I will see typical messages about the number of files processed and
about specific files being backed up, followed by the usual summary messages.
The exit status will be 12. The summary statistics will show a number of files
examined equal to about half the number of files present in the file system.
There will not be any error message explaining the exit status or the failure
to examine the entire file system.
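
One way to document this symptom is to save the client output, record the exit status, and count the objects in the file system independently. What follows is only a sketch. The helper names run_and_capture and count_objects and the log path /tmp/ut.log are made up for illustration. The find count includes directories as well as files, so it is only roughly comparable to the client's summary statistics.

```shell
# Run a command, save stdout and stderr to a log file, and report the exit status.
# $1 is the log file; the remaining arguments are the command to run.
run_and_capture() {
    rc_log=$1
    shift
    "$@" > "$rc_log" 2>&1
    rc_status=$?
    echo "exit status: $rc_status"
    return "$rc_status"
}

# Count the objects (files and directories) under $1 without crossing
# into other mounted file systems.
count_objects() {
    find "$1" -xdev | wc -l | tr -d ' '
}

# Hypothetical usage with the command from this post:
#   run_and_capture /tmp/ut.log dsmc inc /main/UT -servername=DC1P1_MAIN
#   count_objects /main/UT
```

Comparing the second number with the examined count in the saved summary gives a rough measure of how much of the file system was skipped.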

The DC1P1_MAIN stanza in dsm.sys has some unusual features because it is used
to back up one of the resource groups for a clustered environment. The stanza
includes three 'domain' statements listing the file systems in the resource
group.
The stanza includes a 'nodename' option specifying the node name that owns the
backup files from the resource group. The stanza includes an 'asnode' option
specifying the node name used to authenticate sessions from the cluster node
involved (we and the system vendor were not able to agree on an acceptable
arrangement for storing a TSM password within the resource group). This
stanza works fine for the other file systems in the same resource group, and
worked fine for /main/UT up until June 26.

I have found two ways to circumvent the problem. One circumvention is to run
the command

dsmc inc /main/UT/ -subdir=y -servername=DC1P1_MAIN

to back up the top level directory of the file system rather than the file
system as such. An 'lsfs' command shows nothing unusual about the file system;
it is a jfs2 file system, like all the other file systems, and uses the same
mount options as the other file systems. The other circumvention is to add an
'exclude.dir' line for a specific subdirectory of /main/UT to the
include/exclude file. The subdirectory came under suspicion because it was
last updated a few hours after the last fully successful backup.
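
The check that singled out the subdirectory can be sketched with find and a reference file whose timestamp stands in for the last fully successful backup. This is an illustration under assumptions, not the exact procedure used here: the helper name dirs_newer_than, the path /tmp/last_good, and the timestamp in the usage comment are all hypothetical.

```shell
# List directories under $1 whose modification time is later than that of
# the reference file $2, staying within one file system.
dirs_newer_than() {
    find "$1" -xdev -type d -newer "$2"
}

# Hypothetical usage: stamp a reference file with the time of the last
# good backup (the timestamp below is a placeholder), then list candidates.
#   touch -t 201406260000 /tmp/last_good
#   dirs_newer_than /main/UT /tmp/last_good
```

Any directory the command lists is a candidate for a trial 'exclude.dir' line.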

The client code is TSM 6.4.1.0. The client OS is AIX 7.1. The TSM server is TSM
6.2.5.0 running under zSeries Linux.

Does anyone recognize this as a known problem? If not, does anyone have
suggestions for presenting the problem to TSM support? I am having
difficulty imagining any kind of productive interaction if I don't have a
message identifier to report.

Thomas Denier
Thomas Jefferson University Hospital