Subject: Re: [ADSM-L] Backup STG expected throughput from 50G FILE devclass to LTO4 tape
From: Matthew McGeary <Matthew.McGeary AT POTASHCORP DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 17 Sep 2014 11:40:36 -0600
Thanks for the quick replies Wanda and Sergio!

We're running TSM server 7.1.0.100 at the moment and I'm not sure if the
fixes contained in 6.3.4.300 are also included in 7.1.0.100.

-The dedup storage pool is backed by 5 LUNs on the V7000, which are
presented to AIX in a volume group.  There are 10 striped volumes in the
VG, with a 256K stripe size to align with the TSM I/O size.  This has had
a remarkable effect in evening out the I/O load on the V7000 and driving
full utilization of all arrays in the extent pool.
-The database is on one LUN presented to AIX and split into 4 volumes.
The LUN resides in the SSD extent pool and consistently sees IOPS in the
15-20K range when taxed.
-The NUMOPENVOLSALLOWED parameter was set to 25, as specified in the perf
guide.  Based on Wanda's recommendation, I raised that value to 500, which
is high enough to cover all the volumes typically mounted during a BACKUP
STGPOOL command.  I'll report back tomorrow on whether that change makes a
difference in throughput.
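For reference, that change can be made from the administrative command
line without a server restart; a sketch with the value from above (verify
SETOPT behavior against your server level):

```
setopt numopenvolsallowed 500
q opt numopenvolsallowed
```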


Matthew McGeary
Technical Specialist
PotashCorp - Saskatoon
306.933.8921



From:   "Prather, Wanda" <Wanda.Prather AT ICFI DOT COM>
To:     ADSM-L AT VM.MARIST DOT EDU
Date:   09/17/2014 09:48 AM
Subject:        Re: [ADSM-L] Backup STG expected throughput from 50G FILE
devclass to LTO4 tape
Sent by:        "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>



Been there done that.
BACKUP STGPOOL for a sequential-dedup-pool-which-hasn't-deduped-yet
performance is a nightmare.
Have been fighting this for a year.

We ingest and dedup 4-6 TB of data per day, in a 45TB sequential pool.
We had to back off and land the data on a random pool, backup stgpool from
there, then migrate to the dedup pool, as you suggested.

There are things you can try before giving up:

1.  Most likely culprit:
Do Q OPT and look for NUMOPENVOLSALLOWED.  Suppose it's set to 100.
Now, while your BACKUP STGPOOL is running, do a Q MOUNT.  If you have 100
of your dedup pool files open, then you need to increase the value of
NUMOPENVOLSALLOWED.  Rinse and repeat until your BACKUP STGPOOL isn't
bumping up against that limit (my guess is that when it hits the limit,
it's waiting to get a file handle).
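That check can be scripted from the shell with dsmadmc; a rough sketch,
assuming an admin ID exists (the -id/-password values and the grep
pattern are placeholders to adapt to your environment):

```
# Current setting
dsmadmc -id=admin -password=secret -dataonly=yes "q opt numopenvolsallowed"

# Count of currently mounted FILE volumes while BACKUP STGPOOL runs;
# if this approaches the option value, raise NUMOPENVOLSALLOWED
dsmadmc -id=admin -password=secret -dataonly=yes "q mount" | grep -ci "file"
```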

I haven't found any documentation to explain why the default is set low,
or what the drawbacks are of setting the number too high, or what "too
high" might be.  All I know is that I discovered the problem and when I
raised the number things got better with no apparent bad side effects.  I
finally set it to a number larger than the number of files in my dedup
pool.

2.  Server code
You don't say what server version you are running; get to 6.3.4.300.
Otherwise you can have things bogged down in the DEDUPDELETE queue, which
shouldn't affect the backup stgpool directly, but seems to eventually
affect everything related to that pool.

3.  DB I/O
The disk storage we are using delivers 10,000 IOPS; 90% of the I/O is on
the TSM DB + active log.  So even if you are getting good throughput on
the DB backup, what you need to be looking at is the DB I/O response
time, which should be < 5 ms.

And your DB needs to be split across 8 or more filesystems so that DB2
will start more concurrent I/Os.  We went from 2 LUNs, to 4, to 8, then
moved those 8 to SSD, and got a performance boost at each step.  (You'll
find all sorts of discussion about whether they need to be filesystems or
LUNs; that depends on what kind of disk you have.  The point is, DB2
needs to start more concurrent I/Os, which means the DB needs to be
spread across 8 or more different directories, and the disk behind those
directories needs to support as much I/O as DB2 wants to do, fast enough
to get 5 ms or better response.)  If you can't get a good feel for that,
I'd open a performance ticket and have support interpret a server
performance trace for you.
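Adding DB directories can be done online with EXTEND DBSPACE; a sketch
under the assumption that the new filesystems are already created and
mounted (the directory names here are illustrative):

```
extend dbspace /tsmdb05,/tsmdb06,/tsmdb07,/tsmdb08
q dbspace
```

On newer server levels EXTEND DBSPACE also redistributes existing data
across the new directories; check your level's documentation before
relying on that.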

That's all I know so far.
Wanda


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Matthew McGeary
Sent: Wednesday, September 17, 2014 9:48 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Backup STG expected throughput from 50G FILE devclass to
LTO4 tape

Hello All,

We've been struggling with somewhat anemic backup performance from our
dedup storage pools to LTO4 tape over the past year or so.  Data is
ingested from clients and lands directly in our dedup pool, but doesn't
get deduplicated until the offsite operation is complete
(deduprequiresbackup=yes).  Our dedup pool is composed of 50GB volumes
residing on a V7000 pool of NL-SAS drives.  The drives don't appear taxed
(utilization is consistently in the 30-40% range), but average throughput
from the storage pool to tape is only around 100 MB/s.  This is starting
to present challenges for meeting the offsite window, and I am stumped as
to how I might improve performance.

The TSM server is running on AIX and has four 8Gb paths to the storage,
running sddpcm.  Mountpoints containing the data are mounted rbrw and are
JFS2 volumes.  Our tape drives are running off of two dedicated 4Gb HBAs,
and our DB backup throughput is excellent, averaging 350-400 MB/s.
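For anyone wanting to reproduce the rbrw setup: it's the JFS2
release-behind read/write mount option on AIX.  A sketch with an
illustrative mountpoint (confirm the chfs syntax on your AIX level):

```
# one-off mount with release-behind read/write
mount -o rbrw /tsmfile01

# make it persistent in /etc/filesystems
chfs -a options=rbrw /tsmfile01
```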

For those of you that are running TSM dedup, how are you managing your
offsite process?  Are you using a random-devclass pool as a 'landing
zone' for backup/offsite operations before migration to the dedup stg
pool?  Are there tuning parameters you have tweaked that improved
throughput from FILE devclass stg pools to LTO devices?

Any and all tips would be appreciated.  I've been through the parameters
listed in the perf guide and have allocated large memory pages to the TSM
server but I haven't seen much, if any, improvement.

Thanks!

Matthew McGeary
Technical Specialist
PotashCorp - Saskatoon
306.933.8921