ADSM-L

Re: NFS MOUNTS

2002-03-15 19:14:37
From: Kent Monthei <Kent.J.Monthei AT GSK DOT COM>
Date: Fri, 15 Mar 2002 17:02:03 -0500
First, did you stop/restart the scheduler on the TSM Clients after the 
downtime and after each configuration change?  If not, try that before 
reading on.  My understanding (consistent with past reading and 
experience) is that 'dsmc' won't pick up changes unless/until it's 
restarted, and won't necessarily stop/restart itself after a broken tcp/ip 
session.  If the scheduler cannot connect to the TSM Server during the 
backup schedule window, or is restarted after the close of the schedule 
window, it will just reschedule itself for the next backup window.  Dig 
into the TSM Clients' 'dsmsched.log' and 'dsmerror.log' files for more 
info on what's going on.

We occasionally experience TSM Client hangs on stale NFS handles on 
Solaris, Digital and SGI clients.  By policy, we don't back up any NFS 
filesystems, just local.  Nevertheless, during the initial client/server 
exchange of data, the client has to walk the OS filesystem just as 'du' 
and 'ls -R' do, and when it encounters a stale NFS mount it will hang there 
just as they do.  This is an OS problem; I don't think TSM can be configured 
to completely avoid it.  We think that two prior recommendations from 
ADSM-L (setting NFSTIMEOUT=120 and renaming/disabling 'dsmstat') helped in 
some cases, but the mountpoint containing the stale NFS handle was still 
skipped.
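
To make the hang easier to spot before the backup window, here is a minimal 
Python sketch of a timeout-guarded probe.  It is not part of TSM; the names 
'is_responsive' and 'unresponsive_mounts', the 5-second default and the use 
of 'os.stat' as the probe are all assumptions for illustration.  The probe 
runs in a daemon thread, so a call stuck on a stale NFS handle blocks only 
that thread and not the caller.

```python
import os
import threading


def is_responsive(path, timeout=5.0, probe=os.stat):
    """Return True if probe(path) succeeds within `timeout` seconds.

    Hypothetical helper, not part of TSM. The probe runs in a daemon thread
    so a call stuck on a stale NFS handle cannot block the caller.
    """
    outcome = {"ok": False}

    def worker():
        try:
            probe(path)
            outcome["ok"] = True
        except OSError:
            # The OS answered, but with an error (ESTALE, ENOENT, ...).
            outcome["ok"] = False

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    # Still alive after the timeout means the probe is hung.
    return outcome["ok"] and not thread.is_alive()


def unresponsive_mounts(mountpoints, timeout=5.0, probe=os.stat):
    """Return the mountpoints whose probe hung or failed."""
    return [m for m in mountpoints if not is_responsive(m, timeout, probe)]
```

Feed it the list of mountpoints from your own mount table.  A hung probe 
thread stays stuck until the stale handle is cleared, so this only reports 
the problem; it does not fix it.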

Since mountpoints are usually at the top level of the filesystem, directly 
under root '/', and since root '/' is always the first filesystem mounted, 
it's virtually guaranteed that with a default configuration - no domain 
statements; using 'dsmc sched' only; no client-initiated 'dsmc incr <f/s>' 
processes - the client will hang on the first filespace '/', and all 
filespaces, including '/', will miss their backup.

When it occurs, we can mitigate the problem: a) by running individual 
'dsmc incr <f/s>' commands on the client for the other unaffected 
filespaces; or b) by adding domain statements in the reverse order of that 
reported by 'dsmc query opt' ('dsmc show opt', depending on your *SM 
release); or c) by adding 'exclude.fs /' to your inclexcl file.  Each of 
these workarounds still either hangs on or skips '/', so it does not get 
backed up - but everything else usually completes.
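
As a rough illustration of options (a) and (b), the Python sketch below 
builds the per-filespace 'dsmc incr' command lines and the reversed domain 
statements from a list of filespaces.  The function names are hypothetical, 
and the filespace list is assumed to be typed in by hand in the order your 
client reports it; nothing here parses 'dsmc' output or runs a backup.

```python
def incr_commands(filespaces, affected):
    """Build one 'dsmc incr <f/s>' command per unaffected filespace.

    `affected` holds the filespaces known to contain a stale NFS handle.
    The commands are returned as argument lists and are not executed.
    """
    skip = set(affected)
    return [["dsmc", "incr", fs] for fs in filespaces if fs not in skip]


def reversed_domain_lines(filespaces):
    """Build domain statements in the reverse of the given order,
    so that '/' (normally listed first) is processed last."""
    return ["domain " + fs for fs in reversed(filespaces)]
```

For example, with the hypothetical list ['/', '/usr', '/home'] and '/' as 
the affected filespace, the first function yields commands for '/usr' and 
'/home' only, and the second yields domain statements for '/home', '/usr' 
and '/' in that order.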

Our Unix sys admins have had to reboot Unix TSM Clients to clear stale NFS 
handles and restore our ability to back up '/' or whatever mountpoint 
contains the stale NFS handle.  You need to identify and eliminate the 
root cause of repeated stale NFS handles, which could just be a user's bad 
habit of exporting and then remotely mounting a CD from a different 
server, then removing the CD from the drive before unmounting/unexporting 
it.

-my $.02
Kent Monthei
GlaxoSmithKline

"Adams, Mark" <Mark_Adams AT CSGSYSTEMS DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
15-Mar-2002 13:48
Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>

        To:     ADSM-L
        cc: 
        Subject:        Re: NFS MOUNTS

All of the TSM code is on a local filesystem.
Just TSM client activity hangs.
Of course df will hang as well.
