ADSM-L

Re: URGENT: ADSM Version 3 Reclamation Problem (IX74458)

1998-02-05 10:19:38
Subject: Re: URGENT: ADSM Version 3 Reclamation Problem (IX74458)
From: "Smith, Richard" <smithrr AT MARITZ DOT COM>
Date: Thu, 5 Feb 1998 09:19:38 -0600
ADSMers,

        Sorry to repost this, but I must be sure before we go on with
our conversion this weekend.  Are the two problems below the same??

ADSM development has discovered a potential ADSM version 3 server problem
in media reclamation processing:

The following are NOT affected by this problem:

Customers using version 1 or 2 servers

ADSM client space management data

Data backed up or archived to a version 2 server prior to upgrade to
version 3

Data that has never been reclaimed by a version 3 server

We STRONGLY recommend that customers on version 3 servers set their
RECLAMATION THRESHOLD for all sequential storage pools to 100% to disable
reclamation processing.

IBM ADSM development will provide updates for version 3 servers to correct
this problem early during the week of January 26, 1998. Additional
information on this problem will also be provided.

Mike Kaczmarski
IBM Corporation
ADSM Development
kacz AT us.ibm DOT com

Thanks,
Rick Smith
Maritz, Inc.
Storage & Security Administration
smithrr AT maritz DOT com
(314) 827-1584

> ----------
> From:         Michael Kaczmarski[SMTP:kacz AT US.IBM DOT COM]
> Sent:         Tuesday, January 27, 1998 4:56 PM
> To:   ADSM-L AT VM.MARIST DOT EDU
> Subject:      URGENT: ADSM Version 3 Reclamation Problem (IX74458)
>
> On January 22, 1998, ADSM development discovered an error in the ADSM
> Version 3 server reclamation processing for aggregated client files.
> Under rare circumstances (less than 2% of all aggregates reconstructed
> using default settings) the reclamation function will modify the last
> buffer of data when reconstructing an aggregate file.  As a result, the
> last file (or set of files if they are very small) in an aggregate may
> not contain valid data.
>
> A new construct was added in ADSM version 3 to improve performance and
> lower overhead by automatically packaging small client files into larger
> objects for management on the ADSM server.  The larger objects are
> called "aggregates".  Reconstruction processing occurs during server
> reclamation processing to remove vacant space from aggregates.  Vacant
> space is introduced when one or more client files in an aggregate are
> expired by storage management policy (e.g. versioning and/or retention).
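
To make the aggregate description above concrete, here is a toy Python
model of vacant space and reconstruction. It is only a sketch: the
ClientFile class, the field names and the file sizes are invented for
illustration and say nothing about how the ADSM server actually stores
aggregates.

```python
from dataclasses import dataclass


@dataclass
class ClientFile:
    name: str
    size: int              # bytes the file occupies inside the aggregate
    expired: bool = False  # set once policy has expired the file


def vacant_space(aggregate):
    """Bytes held by expired files, i.e. space reconstruction could remove."""
    return sum(f.size for f in aggregate if f.expired)


def reconstruct(aggregate):
    """Return a new aggregate holding only the unexpired files, in order."""
    return [f for f in aggregate if not f.expired]


# Hypothetical aggregate of three small client files, one already expired.
aggregate = [
    ClientFile("a.txt", 4096),
    ClientFile("b.txt", 1024, expired=True),
    ClientFile("c.txt", 2048),
]

size_before = sum(f.size for f in aggregate)
size_after = sum(f.size for f in reconstruct(aggregate))
print(size_before, vacant_space(aggregate), size_after)
```

With the corrected server described in the message, reclamation would copy
such an aggregate to new media as-is, so the vacant bytes would remain.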
>
>
> THE FOLLOWING ARE *NOT* AFFECTED BY THIS PROBLEM:
>
>    - Customers using Version 1 or Version 2 ADSM Servers
>    - ADSM client space management data on any version of the ADSM server
>    - Data backed up or archived  to a version 2 server
>        prior to upgrade to version 3
>    - Data that has never been reclaimed by a version 3 server
>
>
> Updated servers to correct this problem are available via anonymous ftp
> to index.storsys.ibm.com:
>
>    AIX:        /adsm/fixes/v3r1/aixsrv/IX74458.bff
>         (use SMIT to install the image)
>    Windows NT: /adsm/fixes/v3r1/ntsrv/ix74458.exe
>                /adsm/fixes/v3r1/ntsrv/ix74458.txt
>                /adsm/fixes/v3r1/ntsrv/unpack.bat
>    MVS:        A ZAP that can be applied to the MVS Version 3 server to
>                correct the problem is located in
>                /adsm/fixes/v3r1/mvssrv/IX74458.ZAP
>
>
> The updated server will not reconstruct aggregates during reclamation.
> Reclamation processing WILL copy aggregates to new media to free
> up volumes, but empty space will still be left in the aggregates.
>
> IBM recommends that you obtain and install the corrected version of the
> server as soon as possible.  If this cannot be done for some reason,
> we recommend that you set your reclamation threshold to 100% to prevent
> automatic reclamation, and use the MOVE DATA command to manually reclaim
> media, if needed.
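
For anyone scripting the workaround above, this small Python sketch builds
the administrative commands as text for review before they are entered at
the admin client. UPDATE STGPOOL ... RECLAIM=100 and MOVE DATA are the
commands as I understand them; please check the syntax against the
Administrator's Reference for your server level. The pool and volume names
are hypothetical placeholders.

```python
def disable_reclamation_commands(pool_names):
    """One UPDATE STGPOOL command per sequential pool, threshold set to 100%."""
    return [f"UPDATE STGPOOL {name} RECLAIM=100" for name in pool_names]


def manual_reclaim_commands(volume_names):
    """One MOVE DATA command per volume to be emptied by hand."""
    return [f"MOVE DATA {volume}" for volume in volume_names]


# Hypothetical names; substitute your own sequential pools and volumes.
pools = ["TAPEPOOL", "COPYPOOL"]
volumes = ["VOL001"]

commands = disable_reclamation_commands(pools) + manual_reclaim_commands(volumes)
for command in commands:
    print(command)
```

The script only prints the commands; nothing is sent to the server.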
>
>
> To Assist in Recovery from this Error:
>
> In the next few weeks IBM will deliver an updated server with utility
> commands that can be used to locate and deal with client files that have
> been affected by this problem.  The functions will identify files that
> may have been affected by this error so that they can be examined or
> removed from the ADSM server.  These utility commands will be included
> in all future version 3 server deliverables. Detailed documentation
> will describe the use of the tools.
>
>
> Common Questions:
>
> 1) What are the technical details of the problem ?
>
>    During reclamation, file aggregates may be "reconstructed" so
>    that space, which is vacant due to expiration, is "squeezed" out of
>    the aggregate.  On rare occasions, a sequence of buffer copies leading
>    up to the final buffer incorrectly calculates the source location for
>    the data copy on the last buffer.  As a result, information is lost.
>
> 2) How prevalent is this problem ?
>
>    Detailed calculations have shown that the chance of the error
>    occurring is less than 1 to 2 times in 100 when an aggregate
>    is reconstructed, using default settings.  Customers that are using
>    the USELARGEBUFFERS NO option in their server options file will
>    be affected to a much higher degree.  Since this is NOT a default
>    setting, and can actually decrease performance, we doubt that many
>    have chosen this option.
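
As a rough worked example of the figure above: if each reconstruction
independently carried at most a 2% chance of damage, the chance that an
aggregate was hit at least once grows with the number of reconstructions.
The independence assumption is mine, not IBM's, and 0.02 is simply the
upper end of the "1 to 2 times in 100" quoted for default settings.

```python
def chance_affected(p_per_reconstruction, reconstructions):
    """Chance of at least one bad reconstruction, assuming independent events."""
    return 1.0 - (1.0 - p_per_reconstruction) ** reconstructions


# 0.02 is the upper bound quoted in the message for default settings.
for n in (1, 5, 10):
    print(n, round(chance_affected(0.02, n), 4))
```

This is only an upper-bound estimate under those assumptions; sites running
with USELARGEBUFFERS NO would see a higher per-reconstruction rate, which
the message does not quantify.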
>
> 3) Will reclamation continue to work with the updated/ZAPed server ?
>
>    Yes, media will be emptied when the reclamation threshold is met,
>    but file aggregates will not be reconstructed.  This may leave empty
>    space in the aggregates.
>
> 4) When will full reclamation work again ?
>
>    When the updated server utilities are delivered to identify and deal
>    with files that have been affected, you will be able to resume normal
>    reclamation/reconstruction processing after finishing your analysis
>    and cleaning up the utility entries in the database.
>
> 5) Will the AUDIT VOLUME detect this problem ?
>
>    No.  Very few bytes at the end of an aggregated client file are
>    affected when the error occurs.  The byte range affected is not
>    checked by an AUDIT VOLUME command.
>
> 6) Can you absolutely determine which files have been affected ?
>
>    We can determine with certainty which aggregates have been
>    reconstructed.  We can determine if the aggregate was affected by
>    the last reconstruction that occurred.  We cannot determine how many
>    times an aggregate has been reconstructed.  The utilities that we
>    develop will have options to deal with the possibility that multiple
>    reconstruction operations may have affected aggregates.
>
>
> The ADSM development team apologizes for the inconvenience and effort
> that this problem may cause.  We will make every effort to assist in
> your recovery from this situation.
>
> Mike Kaczmarski
> IBM Corporation
> ADSM Development
> kacz AT us.ibm DOT com
>