ADSM-L

Re: [ADSM-L] Deduplication "number of chunks waiting in queue" continues to rise?

2013-12-20 17:11:34
From: "Colwell, William F." <bcolwell AT DRAPER DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 20 Dec 2013 22:09:34 +0000
Hi Wanda,

Some quick, rambling thoughts about dereferenced chunk cleanup.

Do you know about the 'show banner' command?  If IBM sends you an e-fix, this
will tell you what it is fixing.

tsm: xxxxx>show banner
********************************************************************
* EFIX Cumulative level 6.3.4.207                                  *
* This is a Limited Availability TEMPORARY fix for                 *
* IC94121 - ANR2033E DEFINE ASSOCIATION: Command failed - lock con *
*           when def assoc immediately follows def sched.          *
* IC95890 - Allow numeric volser for zOS Media server volumes.     *
* IC93279 - Redrive failed outbound replication connect requests.  *
* IC93850 - PAM authentication login protocol exchange failure     *
* wi3187  - AUDIT LIBVOLUME new command                            *
* IC96637 - SERVER CAN HANG WHEN USING OPERATION CENTER            *
* IC95938 - ANR9999D_2644193874 BFCHECKENDTOEND DURING RESTORE/RET *
* IC96993 - MOVE NODEDATA OPERATION MIGHT RESULT IN INVALID LINKS  *
* IC91138 - Enable audit volume to mark one more kind invalid link *
*           THE RESTARTED RESTORE OPERATION MAY BE SINGLE-THREADED *
*           Avoid restore stgpool linking to orphaned base chunks  *
* WI3236  - Oracle T10000D tape drive support                      *
* 94297   - Add a parameter DELETEALIASES for DELETE BITFILE utili *
* IC96462 - Mount failure retry for zOS Media server tape volumes. *
* IC96993 - SLOW DELETION OF DEREFERENCED DEDUPLICATED CHUNKS      *
* This cumulative efix server is based on code level               *
* made generally available with FixPack 6.3.4.200                  *
*                                                                  *
********************************************************************


I have 2 servers on 6342.006 and 2 on 6342.007.  I have the .009 efix waiting
to be installed on my biggest, oldest, baddest server to fix the
chunks-in-queue problem.

On 3 servers, the queue is down to 0, and they usually run without a problem.
On the big bad one, here are the stats -

tsm: WIN1>show dedupdeleteinfo
 ****Dedup Deletion General Status****
 Number of worker threads          : 15
 Number of active worker threads   : 1
 Number of chunks waiting in queue : 11326513

    ****Dedup Deletion Worker Info****
    Dedup deletion worker id    : 1
    Total chunks queued         : 0
    Total chunks deleted        : 0
    Deleting AF Entries?        : Yes
    In error state?             : No

    Worker thread 2 is not active

    Worker thread 3 is not active

    Worker thread 4 is not active

    Worker thread 5 is not active

    Worker thread 6 is not active

    Worker thread 7 is not active

    Worker thread 8 is not active

    Worker thread 9 is not active

    Worker thread 10 is not active

    Worker thread 11 is not active

    Worker thread 12 is not active

    Worker thread 13 is not active

    Worker thread 14 is not active

    Worker thread 15 is not active

    ------------------------------------------
    Total worker chunks queued     : 0
    Total worker chunks deleted    : 0


The cleanup of reclaimed volumes is done by the thread which has
' Deleting AF Entries?        : Yes'.  The pending efix is supposed to
get this process to finish.  On this server it never finishes; the cause is
apparently something to do with a bad access plan.
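
To watch the queue depth and see which worker is doing the AF cleanup, the
output above can be scraped with a few lines of Python.  This is only a
minimal sketch: it assumes the SHOW DEDUPDELETEINFO text has already been
captured into a string (how you capture it is up to you), and the function
name parse_dedup_delete_info is made up for illustration.  SHOW commands are
undocumented, so the field labels may differ at other code levels.

```python
# Abbreviated copy of the SHOW DEDUPDELETEINFO output quoted above.
SAMPLE = """\
 ****Dedup Deletion General Status****
 Number of worker threads          : 15
 Number of active worker threads   : 1
 Number of chunks waiting in queue : 11326513

    ****Dedup Deletion Worker Info****
    Dedup deletion worker id    : 1
    Total chunks queued         : 0
    Total chunks deleted        : 0
    Deleting AF Entries?        : Yes
    In error state?             : No

    Worker thread 2 is not active
"""


def parse_dedup_delete_info(text):
    """Return (queue_depth, af_worker_ids) parsed from the captured text.

    queue_depth is None if the queue line is not found.  af_worker_ids lists
    the ids of workers that report 'Deleting AF Entries? : Yes'.
    """
    queue_depth = None
    af_worker_ids = []
    current_worker = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # banner lines and "is not active" lines have no colon
        key, value = key.strip(), value.strip()
        if key == "Number of chunks waiting in queue":
            queue_depth = int(value)
        elif key == "Dedup deletion worker id":
            current_worker = int(value)
        elif key == "Deleting AF Entries?" and value == "Yes":
            af_worker_ids.append(current_worker)
    return queue_depth, af_worker_ids


queue_depth, af_worker_ids = parse_dedup_delete_info(SAMPLE)
print(queue_depth, af_worker_ids)
```

Running this on a schedule and logging queue_depth would show whether the
queue is actually gaining or losing ground over time.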

When I have a lot of volumes which are empty but won't delete, I generate
MOVE DATA commands for them.  A MOVE DATA to the same pool manually does what
the chunk cleanup process is trying to do.
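
Generating those commands is easy to script.  The sketch below is
illustrative only: the volume names and the pool name DEDUPPOOL are made up,
and it assumes you already have the list of stuck volumes from whatever query
you normally use.  It only builds the command strings; feeding them to the
server (for example as a macro) is left out.

```python
def move_data_commands(volumes, pool):
    """Build one MOVE DATA command per volume, targeting the same pool."""
    return [f"move data {vol} stgpool={pool}" for vol in volumes]


# Hypothetical volume names and pool name, for illustration only.
stuck_volumes = [
    r"D:\TSMDATA\0000A1B2.BFS",
    r"D:\TSMDATA\0000A1B3.BFS",
]

for command in move_data_commands(stuck_volumes, "DEDUPPOOL"):
    print(command)
```

Since each MOVE DATA runs as a server process, it is probably wise to issue
them a few at a time rather than all at once.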

Regards,

Bill Colwell
Draper lab

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Prather, Wanda
Sent: Thursday, December 19, 2013 11:36 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Deduplication "number of chunks waiting in queue" continues to rise?

TSM 6.3.4.00 on Win2K8
Perhaps some of you who have dealt with the dedup "chunking" problem can
enlighten me.
TSM/VE backs up to a dedup file pool, about 4 TB of changed blocks per day.

I currently have more than 2 TB (yep, terabytes) of volumes in that file pool
that will not reclaim.
We were told by support that when you run

SHOW DEDUPDELETEINFO

the "number of chunks waiting in queue" has to go to zero for those
volumes to reclaim.

(I know that there is a fix at 6.3.4.200 to improve the chunking process, but
that has been APARed, and we are waiting on 6.3.4.300.)

I have shut down IDENTIFY DUPLICATES and reclamation for this pool.
There are no clients writing into the pool; we have redirected backups to a
non-dedup pool for now to try to get this cleared up.
There is no client-side dedup here, only server-side.
I've also set deduprequiresbackup to NO for now, although I hate doing that,
to make sure that doesn't interfere with the reclaim process.

But SHOW DEDUPDELETEINFO shows that the "number of chunks waiting in queue"
is *still* increasing.
So, WHAT is putting stuff on that dedup delete queue?
And how do I ever gain ground?

W



**Please note new office phone:
Wanda Prather  |  Senior Technical Specialist  |  Wanda.Prather AT icfi DOT com  |  www.icfi.com
ICF International  | 443-718-4900 (o)