Been there, done that.
BACKUP STGPOOL performance from a sequential dedup pool whose data hasn't been
deduplicated yet is a nightmare.
Have been fighting this for a year.
We ingest and dedup 4-6 TB of data per day, in a 45TB sequential pool.
We had to back off and land the data on a random pool, backup stgpool from
there, then migrate to the dedup pool, as you suggested.
There are things you can try before giving up:
1. Most likely culprit:
Do Q OPT, look for NUMOPENVOLSALLOWED. Suppose it's set to 100.
Now when your BACKUP STGPOOL is running, do a Q MOUNT. If you have 100 of your
dedup pool files open, then you need to increase the value of
NUMOPENVOLSALLOWED. Rinse and repeat until your BACKUP STGPOOL stops bumping
up against that limit. (My guess is that when it hits the limit, the process
has to wait for an open volume, i.e. a file handle, to free up.)
I haven't found any documentation to explain why the default is set low, or
what the drawbacks are of setting the number too high, or what "too high" might
be. All I know is that I discovered the problem and when I raised the number
things got better with no apparent bad side effects. I finally set it to a
number larger than the number of files in my dedup pool.
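From the admin command line, the check-and-raise loop in step 1 looks roughly
like this (the pool names and the 512 value are just placeholders, substitute
your own):

```
q opt numopenvolsallowed              /* current setting, e.g. 100          */
backup stgpool dedup_pool copy_pool   /* kick off the copy ...              */
q mount                               /* ... then count open FILE volumes   */
setopt numopenvolsallowed 512         /* raise it if mounts hit the ceiling */
```

SETOPT changes the value on the fly, so you don't need a server restart
between iterations.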
2. Server code
You don't say what server version you are running; get to 6.3.4.300. Otherwise
you can have things bogged down in the DEDUPDELETE queue. Which shouldn't
affect the backup stgpool I guess, but seems to eventually affect everything
related to that pool.
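A quick way to see whether that deletion queue is backed up (note that SHOW
commands are undocumented diagnostics, so treat the output as support-level
info that can change between server levels):

```
q process               /* look for active "dedup delete" worker processes */
show dedupdeleteinfo    /* rough size of the chunk-deletion backlog        */
```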
3. DB I/O
The disk storage we are using delivers 10,000 IOPS; 90% of the I/O is on the
TSM DB + active log. So even if you are getting good throughput on the DB
Backup, what you need to be looking at is the DB I/O response time. Should be
< 5 ms.
And, your DB needs to be split across 8 or more filesystems so that DB2 will
start more concurrent I/Os. We went from 2 luns, to 4, to 8, then moved those
8 to SSD, got a performance boost each step. (You'll find all sorts of
discussion about whether they need to be filesystems or LUNS; that depends on
what kind of disk you have. Point is, DB2 needs to start more concurrent
I/Os, which means the DB needs to be spread across 8 different directories, or
more. And the disk behind those directories needs to support as much I/O as
DB2 wants to do, as fast as it needs to do it to get 5 ms or less response.)
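To illustrate step 3: on AIX you can watch per-disk service times, and on the
TSM side you can add DB directories so DB2 stripes across more of them. The
hdisk names and /tsmdb paths below are made up for the example, and note that
on older 6.x levels EXTEND DBSPACE doesn't redistribute existing data, it just
gives DB2 more containers to use for new extents:

```
# AIX: extended per-disk stats every 5 seconds; average service time
# on the DB hdisks should stay under ~5 ms
iostat -D hdisk4 hdisk5 5

# TSM admin console: add four more DB directories (eight total)
extend dbspace /tsmdb5,/tsmdb6,/tsmdb7,/tsmdb8
```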
If you can't get a good feel for that, I'd open a performance ticket and have
them interpret a server performance trace for you.
That's all I know so far.
Wanda
-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Matthew McGeary
Sent: Wednesday, September 17, 2014 9:48 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Backup STG expected throughput from 50G FILE devclass to LTO4
tape
Hello All,
We've been struggling with somewhat anemic backup performance from our dedup
storage pools to LTO4 tape over the past year or so. Data is ingested from
clients and lands directly into our dedup pool, but doesn't get deduplicated
until the offsite operation is complete.
(deduprequiresbackup=yes) Our dedup pool is comprised of 50GB volumes,
residing on a V7000 pool of NL-SAS drives. The drives don't appear taxed
(utilization is consistently in the 30-40% range) but average throughput from
the storage pool to tape is only 100-100 MB/s. This is starting to present
challenges for meeting the offsite window and I am stumped as to how I might
improve performance.
The TSM server is running on AIX and has four 8Gb paths to the storage, running
sddpcm. Mountpoints containing the data are mounted rbrw and are
JFS2 volumes. Our tape drives are running off of two dedicated 4Gb HBAs
and our backup DB throughput is excellent, averaging 350-400 MB/s.
For those of you that are running TSM dedup, how are you managing your
offsiting process? Are you using a random devclass pool as a 'landing zone'
for backup/offsite operations before migration to the dedup stg? Are there
tuning parameters that you have tweaked that have shown improvement in FILE
devclass stg pools to LTO devices?
Any and all tips would be appreciated. I've been through the parameters listed
in the perf guide and have allocated large memory pages to the TSM server but I
haven't seen much, if any, improvement.
Thanks!
Matthew McGeary
Technical Specialist
PotashCorp - Saskatoon
306.933.8921