ADSM-L

Re: [ADSM-L] Contents of Copy Storage Pool ?

2007-03-29 09:03:44
From: David E Ehresman <deehre01 AT LOUISVILLE DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 29 Mar 2007 09:02:52 -0400
Did you run the storage pool check immediately after backup storagepool 
finished?  That is the only time you can expect your storage pools to be in 
balance, and then only if you do not allow backups to proceed while the 
storage pool backups are running.

If that is not the problem, and no volumes are destroyed, unavailable, or 
readonly, I use the output of

  "select node_name as \"Node Name \",stgpool_name as \"STGPool  \",
  cast\(sum\(num_files\) as decimal\(10,0\)\),
  cast\(sum\(logical_mb\) as decimal\(15,2\)\) as \"Log Occ \(MB\)\"
  from occupancy group by node_name,stgpool_name
  order by node_name,stgpool_name"

to find which TSM node is out of balance, and pursue it from that node's 
volumes.  (The "\" characters in the above select are unix quoting characters, 
needed because the select is run as a unix shell command.  They can be removed 
if you are not running it as a shell command.)
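
For anyone who would rather not eyeball the output, here is a rough sketch of the comparison in Python.  It is only an illustration, not something I run in production.  It assumes the select output has been saved as comma-delimited lines of the form node_name,stgpool_name,num_files,logical_mb, and that the primary/copy pairing is the one described in this thread (TAPEPOOL1/TAPEPOOL2 and TAPEPOOL4/TAPEPOOL5).  The names POOL_PAIRS, parse_occupancy and out_of_balance are made up for the example.

```python
from collections import defaultdict

# Assumed primary -> copy pool pairing, taken from the pools named in this thread.
POOL_PAIRS = {"TAPEPOOL1": "TAPEPOOL2", "TAPEPOOL4": "TAPEPOOL5"}


def parse_occupancy(text):
    """Sum file counts per (node, pool) from comma-delimited select output.

    Expected line format (an assumption): node_name,stgpool_name,num_files[,logical_mb]
    Lines whose third field is not numeric (headers, blanks) are skipped.
    """
    counts = defaultdict(int)
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3 or not fields[2].replace(".", "", 1).isdigit():
            continue
        node, pool = fields[0], fields[1]
        counts[(node, pool)] += int(float(fields[2]))
    return counts


def out_of_balance(counts, pool_pairs=POOL_PAIRS):
    """Return (node, primary, copy, difference) for every node whose primary
    pool file count differs from its copy pool file count."""
    nodes = sorted({node for node, _pool in counts})
    result = []
    for node in nodes:
        for primary, copy in pool_pairs.items():
            diff = counts.get((node, primary), 0) - counts.get((node, copy), 0)
            if diff != 0:
                result.append((node, primary, copy, diff))
    return result
```

Feed parse_occupancy() the saved output and pass the result to out_of_balance().  A positive difference means the primary pool holds more files for that node than the copy pool does, which is where I would start looking at volumes.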

David Ehresman

>>> Larry Peifer <Larry.Peifer AT SCE DOT COM> 3/28/2007 7:46 PM >>>
Richard wrote:
> And I always like to point to the elegant Select which Wanda
> submitted many moons ago, which was well worth immortalizing in
> http://people.bu.edu/rbs/ADSM.QuickFacts in entry "Copy Storage Pool
> up to date?".

Out of curiosity I ran the 'elegant Select' and much to my chagrin found a
Net_Files = 179,755 in one case and 3740 in the other.  So now I'm on a
mission to find and fix whatever the problem is.  Any help or ideas would
be appreciated.

TSM Server 5.3.1 running on AIX 5.3 MR4
Clients running 5.2.x and 5.3.x

We run a 'closed' system, i.e., all tapes stay in all libraries all the time
using electronic vaulting.  The primary tape library has an exact
duplicate copy library at a remote location.

TAPEPOOL1       Primary         MS Windows server data only
                                (179,755 more files here)
TAPEPOOL2       Copy

TAPEPOOL4       Primary         AIX server and Oracle DB / Archive log data only
                                (3740 more files here)
TAPEPOOL5       Copy

Backup stgpool TAPEPOOL1 TAPEPOOL2
and
Backup stgpool TAPEPOOL4 TAPEPOOL5
run daily to successful completion.

100% disk pool migration occurs prior to each backup stgpool.
Expiration and reclamation occur daily.
Incremental backups occur daily or weekly.
CRC checking is on for all sequential stgpools.
All tapes are accounted for in all libraries daily, no unavailable, no
readonly, none missing.
No tape volumes show r/w errors.
All recoveries in the last 6 years have been successful - there has never
been a file or database damaged or missing.
Three LTO1 tapes have gone bad during this time and been replaced in
tapepool1 and tapepool2, with move data used to move the files to another
volume in the same storage pool.
Both tapepool pairs have very similar tape usage counts in them.

We have not been doing any sort of periodic audit volumes.  It looks like
that will change.
I did an audit volume for the last 2 days and it was 100% ok.

Any ideas on what else to check on or how to go about isolating the
problem would be appreciated.  Obviously, we've overlooked something.