ADSM-L

Subject: Re: Archives and management classes for directories
From: Rejean Larivee <rlarivee AT CA.IBM DOT COM>
Date: Thu, 20 May 1999 10:25:44 -0300
Hello,
your situation is addressed by APAR IX89638. For your convenience,
I have included the text of the APAR below:

****************************************************************
* PROBLEM DESCRIPTION:                                         *
server stores duplicate database entries for archive directories
****************************************************************
****************************************************************
* RECOMMENDATION:                                              *
apply fixing ptf when available
****************************************************************
The server enters a complete set of directory levels in the
database each time a client issues an archive request.  This
results in duplicate entries for the same archive directory
levels.  Also, archive directories never expire even though
there are no files that reference them.  This redundancy results
in extra database usage and increased search and expiration
times.
PROBLEM CONCLUSION:
The fix for IX89638 includes a utility to remove duplicate archive
directories, and other changes.

DUPLICATE ARCHIVE DIRECTORY REMOVAL UTILITY:
An archive directory is defined to be unique by:  node, file-
space, directory/level, owner and description.  The duplicate
directory removal utility will retain the oldest unique
directory and remove all younger duplicates.  The utility is
started as a process, may be canceled with the existing CANCEL
PROCESS command, and later resumed.  Archive cleanup jobs may be
queried, and a cancel command exists to remove a job from the
list of resumable cleanup jobs.
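As an illustrative sketch only (not IBM's actual utility), the retention
rule described above can be expressed in a few lines of Python, assuming
each directory entry carries the five uniqueness fields plus the date it
was created:

```python
# Illustrative sketch only -- NOT IBM's actual utility.  Entries are
# assumed to be dicts carrying the five fields that define uniqueness
# (node, filespace, directory, owner, description) plus the date the
# entry was created.
from datetime import date

def clean_archdirs(entries, fix=False):
    """Return the removable duplicates: for each unique key the
    oldest entry is retained, all younger ones are duplicates.
    With fix=False (like FIX=No) nothing is flagged for removal."""
    oldest = {}
    duplicates = []
    # Visit entries oldest-first so the first one seen per key wins.
    for e in sorted(entries, key=lambda e: e["archived"]):
        key = (e["node"], e["filespace"], e["directory"],
               e["owner"], e["description"])
        if key in oldest:
            duplicates.append(e)   # younger duplicate -> removable
        else:
            oldest[key] = e        # oldest unique directory -> kept
    return duplicates if fix else []
```

Running the utility twice on the same data is then naturally harmless:
the second pass finds no younger duplicates and removes nothing.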

Syntax:

CLEAN ARCHDIRectories:  starts duplicate archive directory re-
  moval for all nodes or a list of nodes, or resumes a cleanup
  job that was canceled.

  ---- CLEAN ARCHDIRectories -----------------------------------
                              |             |  |
                              +-nodeList----+  +-FIX=No|Yes-----
                              |             |
                              +-JOBid=jobId-+

  where:
    nodeList is a comma separated list of node names
    jobId is a resumable cleanup job
    if nodeList or JOBid= are not specified, a new job that
      cleans all nodes is assumed
    FIX=no   every archive directory entry for the nodeList or
             all nodes will be displayed.  This is the default.
       =yes  duplicate directories will be removed
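  For illustration only (the node names and job id below are
  hypothetical), invocations of the above might look like:

    clean archdirectories aupoza404,aupoza405          (display only)
    clean archdirectories aupoza404,aupoza405 fix=yes  (remove duplicates)
    clean archdirectories jobid=12 fix=yes             (resume job 12)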

Query ARCHDIRClean:  the standard format of the command lists
  information about a job.  The detailed format additionally
  lists information for each node associated with a job.

  ---- Query ARCHDIRClean --------------------------------------
                          |       |  |                  |
                          +-jobId-+  +-Format=Standard| |
                                              Detailed--+

CANcel ARCHDIRClean:  removes an archive cleanup job from the
  server.

  ---- CANcel ARCHDIRClean ---- jobId --------------------------

OTHER CHANGES:
Changes have also been made that remove archive directories when
they are eligible for expiration and not referenced, and the
same performance enhancement introduced in v3.1 for backup and
archive has been implemented for expiration and filespace
deletion.  In addition, the client will no longer bind archive
directories to the management class with the longest retention,
and it will not continue to archive duplicate directories
(available from client ptf 3.1.0.7).


ARCHIVE USAGE NOTES/TIPS:
 - the cleanup utility may be run multiple times for the same
   node.  If no duplicates are found, nothing is removed.
 - description is one field that defines a unique archive
   directory:  it is a factor that determines the number of
   archive directory entries in the database.  Customers that
   use the archive function extensively should consider
   including the description field with each archive request,
   especially when the default value is not appropriate to
   their needs.
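 - for example, a fixed description can be supplied on the client
   archive command so that repeated archives of the same data reuse
   a single description value (the path and description below are
   only placeholders):

     dsmc archive d:\sysdata\sql\dump\* -description="weekly sqldump"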
****************************************************************
The needed fixes for both the client and server are not yet
officially available.

Have a good day.

-----------------------------------------------------------------
Rejean Larivee
IBM -  ADSM Level 2 Support



Trevor Foley <Trevor.Foley AT BANKERSTRUST.COM DOT AU> on 05/20/99 02:32:06 AM

Please respond to "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>

To:   ADSM-L AT VM.MARIST DOT EDU
cc:    (bcc: Rejean Larivee/Quebec/IBM)
Subject:  Archives and management classes for directories





G'day,

I am having a bit of a problem understanding why archives work the way that
they do. I am hoping that someone might be able to explain it to me. If not,
I'll log a call with IBM.

First, I like the fact that with ADSM V3 the directories are now saved along
with the files. What I am having trouble understanding is why the directories
are bound to the management class with the longest retention period. That makes
sense for backups, but when you perform an archive, all files that are archived
with one command are bound to the same management class. Should directories also
be bound to this management class? The only reason that I can see that this
should not be the case is that it is possible to add files to an archive group
by specifying the same description. But it seems that directories get archived
again when you do this.

We have a situation where the number of directories being saved is getting a
little out of control. An example. If I execute the following select command,
where 'DUMP' is the name of a directory:

        select count(*) from archives where node_name='AUPOZA404' and
filespace_name='\\aupoza404\d$' and hl_name='\SYSDATA\SQL\' and ll_name='DUMP'

I get 13782 returned. The SYSDATA and SQL directories have a similar count (each
13785). If I now look at the files within that directory by doing:

        select count(*) from archives where node_name='AUPOZA404' and
filespace_name='\\aupoza404\d$' and hl_name='\SYSDATA\SQL\DUMP\'

I get 2575 returned. So to have 2575 files archived, I have to wear another
13782+13785+13785=41352 database entries for the directories. And this is just
one of approximately 100 ADSM clients.

Many of the files that are archived from this directory are archived for only
short periods (days or a few weeks) but the directories are getting archived for
10 years because the management class with the longest retention specifies 10
years. What then happens is that the files are expired quite quickly, but the
directories do not.

So in the worst case, if I archive a file from this directory with a 1 day
retention, I get 3 directories ('SYSDATA', 'SQL' and 'DUMP') archived as well
which stay around for 10 years. And if this file is archived every day, the
number of directories grows very quickly.
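To put rough numbers on that growth (all figures illustrative, taken
only from the example above):

```python
# Worst case described above: one file archived every day with a
# 1-day retention, while its 3 parent directories ('SYSDATA', 'SQL'
# and 'DUMP') are bound to a 10-year retention and so do not expire.
dirs_per_archive = 3
archives_per_year = 365

dir_entries_per_year = dirs_per_archive * archives_per_year    # 1095
dir_entries_in_10_years = dir_entries_per_year * 10            # 10950

print(dir_entries_per_year, dir_entries_in_10_years)
```

So while at most a handful of file entries are ever live, the directory
entries pile up by the thousands per client per year.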

Yes, I could specify -FILESONLY on the archive, but I want the directories
archived with the files. It would also require changes to some application code
that I would like to avoid.

So I have two questions. First, can I change this behaviour? My understanding is
no, so I guess that means talking to IBM. And second, how do I clean up the mess
(automatically and regularly)? If we were using the default archive descriptions
(date, time, etc.) I can see a way of doing it manually. But we use the same
description every time we run the archive.

thanks,


Trevor