Subject: Re: [ADSM-L] Backup STG expected throughput from 50G FILE devclass to LTO4 tape
From: "Prather, Wanda" <Wanda.Prather AT ICFI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 17 Sep 2014 15:47:27 +0000
Been there, done that.
BACKUP STGPOOL performance from a sequential dedup pool that hasn't
deduplicated yet is a nightmare.  We have been fighting this for a year.

We ingest and dedup 4-6 TB of data per day, in a 45TB sequential pool.
We had to back off and land the data on a random pool, backup stgpool from 
there, then migrate to the dedup pool, as you suggested.

There are things you can try before giving up:

1.  Most likely culprit:
Do Q OPT and look for NUMOPENVOLSALLOWED.  Suppose it's set to 100.
Now, while your BACKUP STGPOOL is running, do a Q MOUNT.  If you see 100 of
your dedup pool FILE volumes open, you need to increase the value of
NUMOPENVOLSALLOWED.  Rinse and repeat until your BACKUP STGPOOL isn't bumping
up against that limit (presumably it's waiting on file handles for those
volumes).

I haven't found any documentation to explain why the default is set low, or 
what the drawbacks are of setting the number too high, or what "too high" might 
be.  All I know is that I discovered the problem and when I raised the number 
things got better with no apparent bad side effects.  I finally set it to a 
number larger than the number of files in my dedup pool. 
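The check-and-raise loop above looks something like this from an administrative client session; the value 200 is just an example, and SETOPT takes effect without a server restart:

```
q opt numopenvolsallowed         /* current setting                          */
q mount                          /* count open FILE volumes during the copy  */
setopt numopenvolsallowed 200    /* raise it if you're pinned at the limit   */
```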
 
2.  Server code
You don't say what server version you are running; get to 6.3.4.300.
Otherwise things can bog down in the DEDUPDELETE queue, which shouldn't
directly affect BACKUP STGPOOL, but seems to eventually affect everything
related to that pool.

3.  DB I/O
The disk storage we are using delivers 10,000 IOPS; 90% of the I/O is on the
TSM DB + active log.  So even if you are getting good throughput on the DB
backup, what you need to be looking at is the DB I/O response time.  It
should be < 5 ms.
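On AIX you can sanity-check that response time with iostat while BACKUP STGPOOL runs; a sketch, assuming hdisk4 and hdisk5 hold the DB and active log (substitute your own hdisks):

```
# Extended per-disk statistics at 10-second intervals; watch the read/write
# average service times on the DB/log hdisks -- they should stay under ~5 ms.
iostat -D hdisk4 hdisk5 10
```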

And your DB needs to be split across 8 or more filesystems so that DB2 will
start more concurrent I/Os.  We went from 2 LUNs, to 4, to 8, then moved
those 8 to SSD, and got a performance boost at each step.   (You'll find all
sorts of discussion about whether they need to be filesystems or LUNs; that
depends on what kind of disk you have.  The point is, DB2 needs to start
more concurrent I/Os, which means the DB needs to be spread across 8 or more
directories, and the disk behind those directories needs to support as much
I/O as DB2 wants to do, fast enough to deliver 5 ms or better response
time.)  If you can't get a good feel for that, I'd open a performance ticket
and have them interpret a server performance trace for you.
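For reference, adding directories to the DB is done with EXTEND DBSPACE; the paths below are hypothetical examples, and each should sit on a separate filesystem/LUN (on 6.3.4 and later the server also rebalances existing data across the new directories):

```
/* paths are examples only; one filesystem or LUN per directory */
extend dbspace /tsmdb5,/tsmdb6,/tsmdb7,/tsmdb8
```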

That's all I know so far.
Wanda


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Matthew McGeary
Sent: Wednesday, September 17, 2014 9:48 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Backup STG expected throughput from 50G FILE devclass to LTO4 
tape

Hello All,

We've been struggling with somewhat anemic backup performance from our dedup 
storage pools to LTO4 tape over the past year or so.  Data is ingested from 
clients and lands directly into our dedup pool, but doesn't get deduplicated 
until the offsite operation is complete.
(deduprequiresbackup=yes)  Our dedup pool is composed of 50GB volumes
residing on a V7000 pool of NL-SAS drives.  The drives don't appear taxed
(utilization is consistently in the 30-40% range), but average throughput
from the storage pool to tape is only ~100 MB/s.  This is starting to
present challenges for meeting the offsite window, and I am stumped as to
how I might improve performance.

The TSM server is running on AIX and has four 8Gb paths to the storage,
running sddpcm.  The mount points containing the data are mounted rbrw and
are JFS2 filesystems.  Our tape drives run off two dedicated 4Gb HBAs,
and our DB backup throughput is excellent, averaging 350-400 MB/s.

For those of you that are running TSM dedup, how are you managing your 
offsiting process?  Are you using a random devclass pool as a 'landing zone' 
for backup/offsite operations before migration to the dedup stg? Are there 
tuning parameters that you have tweaked that have shown improvement in FILE 
devclass stg pools to LTO devices?

Any and all tips would be appreciated.  I've been through the parameters listed 
in the perf guide and have allocated large memory pages to the TSM server but I 
haven't seen much, if any, improvement.

Thanks!

Matthew McGeary
Technical Specialist
PotashCorp - Saskatoon
306.933.8921