Re: [ADSM-L] Stupid question about TSM server-side dedup

Wanda,

Are the identify processes logging any failure messages in the activity log?

You can check whether the id dup processes have found duplicate chunks that
are yet to be reclaimed by running 'show deduppending <stgpoolname>'.
WARNING: this can take a long time to return if the stgpool is large, so
don't panic!
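
If you want to script those two checks, here is a minimal Python sketch that
only builds the dsmadmc command lines. The admin id, the password and the pool
name NDMPFILEPOOL are placeholders I made up, and 'search=identify' is just one
guess at how to narrow the activity log, so adjust to taste.

```python
import shlex


def dsmadmc_cmd(admin_id, password, server_command):
    """Return an argv list that would run one admin command through dsmadmc."""
    return [
        "dsmadmc",
        f"-id={admin_id}",
        f"-password={password}",
        "-dataonly=yes",
        server_command,
    ]


def dedup_check_commands(stgpool, admin_id="admin", password="secret"):
    """Build the two checks: activity log search, then SHOW DEDUPPENDING."""
    checks = [
        # Hypothetical filter: messages from the last day mentioning "identify".
        "query actlog begindate=today-1 search=identify",
        # Can take a long time to return on a large pool.
        f"show deduppending {stgpool}",
    ]
    return [dsmadmc_cmd(admin_id, password, c) for c in checks]


if __name__ == "__main__":
    for argv in dedup_check_commands("NDMPFILEPOOL"):
        print(shlex.join(argv))
```

Nothing is executed here. The script prints the commands so you can review them
before running them by hand or handing them to subprocess.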

I am unfamiliar with NDMP backup, but off the top of my head a couple of
other (simple) things to check would be:

Is the server-side SERVERDEDUPETXNLIMIT option set very low and
preventing dedup identification?

Have these dumps been backed up to a copypool yet? (Perhaps you've
overlooked the DEDUPREQUIRESBACKUP option at the server?)
- IIRC the identify processes run but find nothing if this option is set
and the data has not yet been backed up to a copypool.
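
The two checks above boil down to a small decision rule, sketched here in
Python. Both rules are assumptions from my recollection rather than documented
server logic: that an object larger than SERVERDEDUPETXNLIMIT (which I believe
is expressed in GB) is skipped, and that DEDUPREQUIRESBACKUP holds things up
until a copypool backup exists. The function name and arguments are hypothetical.

```python
def likely_blockers(object_gb, txn_limit_gb, requires_backup, backed_up):
    """List reasons a dump might not be deduplicated yet.

    The inputs would come from 'query option serverdedupetxnlimit',
    'query option deduprequiresbackup', and from knowing whether a
    'backup stgpool' has already copied the data to a copypool.
    The rules below are assumptions, not documented server behaviour.
    """
    reasons = []
    if object_gb > txn_limit_gb:
        reasons.append("object larger than SERVERDEDUPETXNLIMIT")
    if requires_backup and not backed_up:
        reasons.append("DEDUPREQUIRESBACKUP set and no copypool backup yet")
    return reasons


if __name__ == "__main__":
    # Made-up numbers: a 900 GB dump against a 300 GB limit, not yet backed up.
    for reason in likely_blockers(900, 300, True, False):
        print(reason)
```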

Ian Smith


On 22/11/11 15:17, Colwell, William F. wrote:

Wanda,

When id dup finds duplicate chunks in the same storage pool, it raises the
pct_reclaim value for the volume it is working on.  If pct_reclaim isn't
going up, no duplicate chunks are being found.  Id dup is still chunking the
backups up (watch your database grow!), but all the chunks are unique.
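
One way to watch for that is to take two snapshots of pct_reclaim and compare
them. The Python sketch below assumes each snapshot is the output of something
like "select volume_name, pct_reclaim from volumes where
stgpool_name='NDMPFILEPOOL'" run through dsmadmc with -dataonly=yes and
-commadelimited. The pool name, the volume names and the numbers are invented
for illustration.

```python
import csv
import io


def parse_pct_reclaim(text):
    """Parse 'volume_name,pct_reclaim' lines into a dict of floats."""
    rows = csv.reader(io.StringIO(text))
    return {row[0]: float(row[1]) for row in rows if len(row) == 2}


def rising_volumes(before, after):
    """Return volumes whose pct_reclaim went up between two snapshots."""
    return {
        vol: (before[vol], after[vol])
        for vol in after
        if vol in before and after[vol] > before[vol]
    }


if __name__ == "__main__":
    # Invented sample data standing in for two runs of the SELECT.
    snap1 = "/tsmfile/0000A1.BFS,0.0\n/tsmfile/0000A2.BFS,0.0\n"
    snap2 = "/tsmfile/0000A1.BFS,12.5\n/tsmfile/0000A2.BFS,0.0\n"
    rising = rising_volumes(parse_pct_reclaim(snap1), parse_pct_reclaim(snap2))
    for vol, (old, new) in rising.items():
        print(f"{vol}: {old} -> {new}")
```

If nothing shows up as rising while id dup has been running for a while, that
points at unique chunks rather than at reclamation.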

Is it possible that the NDMP agent in the storage appliance is putting
unique metadata in with each file?
This would make every backup appear to be unique in chunk-speak.

I remember from the v6 beta that the standard v6 clients were enhanced so
that id dup could better identify the metadata and skip over it, working only
on the file data and getting better dedup ratios.  If id dup doesn't know how
to skip over the metadata in an NDMP stream, and the metadata always changes,
then you will get very low dedup ratios.
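
To illustrate the effect, here is a toy Python sketch. It uses fixed 64-byte
chunks and SHA-1, which is NOT how TSM chunks data, and the record layout (a
16-byte header carrying a per-backup stamp, followed by 48 bytes of file data)
is invented so that every chunk contains some metadata. It only shows why
changing metadata can make identical file data look unique.

```python
import hashlib

CHUNK = 64  # toy fixed chunk size, not TSM's real chunking


def chunk_hashes(data):
    """Hash each fixed-size chunk of a byte stream."""
    return {
        hashlib.sha1(data[i:i + CHUNK]).hexdigest()
        for i in range(0, len(data), CHUNK)
    }


def make_stream(files, stamp, include_meta=True):
    """Build a fake dump: optional 16-byte header per file, then the data."""
    out = bytearray()
    for name, body in files:
        if include_meta:
            out += f"{name}:{stamp}".encode().ljust(16, b"\0")
        out += body
    return bytes(out)


def shared_fraction(first, second):
    """Fraction of the second stream's chunks already seen in the first."""
    seen, new = chunk_hashes(first), chunk_hashes(second)
    return len(seen & new) / len(new)


# Eight unchanged "files" of 48 bytes each, dumped on two nights.
FILES = [(f"f{i}", bytes([i]) * 48) for i in range(8)]

if __name__ == "__main__":
    with_meta = shared_fraction(make_stream(FILES, 1001), make_stream(FILES, 1002))
    data_only = shared_fraction(
        make_stream(FILES, 1001, include_meta=False),
        make_stream(FILES, 1002, include_meta=False),
    )
    print("shared chunks with changing metadata:", with_meta)
    print("shared chunks with metadata skipped:", data_only)
```

In this contrived layout the second night shares no chunks with the first when
the headers are included, and all of them when the headers are skipped.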

If you do a 'q pr' while the id dup is running, do the processes say
they are finding duplicates?

Bill Colwell
Draper Lab

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Prather, Wanda
Sent: Monday, November 21, 2011 11:41 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Stupid question about TSM server-side dedup

Have a customer who would like to go to all-disk backups using TSM dedup.
This would benefit them in several respects, not least the ability to
replicate to another TSM server using the features in 6.3.

The customer has a requirement to keep their NDMP dumps 6 months.  (I
know that's not desirable, but the backup group has no choice in the
matter right now, it's imposed by a higher level of management.)

The NDMP dumps come via TCP/IP into a regular TSM sequential filepool.
They should dedup like crazy, but client-side dedup is not an option (as
there is no client).

So here's the question.  NDMP backups come into the filepool and
identify duplicates is running.  But because of those long retention
times, all the volumes in the filepool are FULL, but 0% reclaimable, and
they will continue to be that way for 6 months, as no dumps will expire
until then.  Since the dedup occurs as part of reclaim, and the volumes
won't reclaim, how do we "prime the pump" and get this data to dedup?
Should we do a few MOVE DATAs to get the volumes partially empty?


Wanda Prather  |  Senior Technical Specialist  |
wprather AT icfi DOT com   |
www.icf.com
ICF International  | 401 E. Pratt St, Suite 2214, Baltimore, MD 21202 |
410.539.1135 (o)