Slow tape-to-tape performance

dgwhite

Hi Guys,

I'm new here and new to TSM so please be gentle! We seem to be having issues getting our maintenance plan to finish in time. Many times it takes more than 12 hours. Our TSM infrastructure was delivered by consultants and now I am tasked with looking after it.

Here is a rundown of what we have:

Windows Server 2008 R2 (Dell R710)
24 cores
12 GB of RAM
Dell ML6000 library (3 LTO-4 drives attached directly to 4 Gb FC cards in the server)
Disk pools and DB are on EqualLogic iSCSI LUNs
Currently backing up ~65 windows servers (with some TDP agents)
TSM version 6.2.0

The server runs pretty much at idle 24/7, so I don't think it's a resource issue.

Here is the maintenance plan we are running:

/* BACKUP_STORAGE_START */
SERIAL
backup stg FILESDPOOL CPBACKUPTPOOL wait=yes
backup stg AGENTDPOOL CPBACKUPTPOOL wait=yes
backup stg METADPOOL CPBACKUPTPOOL wait=yes
backup stg BACKUPTPOOL CPBACKUPTPOOL wait=yes
SERIAL
/* BACKUP_STORAGE_END */
/* BACKUP_DB_START */
backup db devclass=DEVCML6000LTO4 type=full wait=yes
/* BACKUP_DB_END */
/* CREATE_RPF_START */
/* RPF:SOURCE=dbbackup; */
prepare source=dbbackup wait=YES
/* CREATE_RPF_END */
/* MIG_STG_START */
PARALLEL
migrate stgpool AGENTDPOOL lowmig=0 wait=yes
migrate stgpool FILESDPOOL lowmig=0 wait=yes
SERIAL
/* MIG_STG_END */
/* EXP_INV_START */
/* EXP_INV:SKIPDIRS=NO;DURATION=120; */
expire inventory skipdirs=NO wait=yes duration=120
/* EXP_INV_END */
/* Clean up vol history, backup devconfig and volhist*/
backup devconfig
backup volhist
/* Clean up vol history END */
/* RECL_STG_START */
SERIAL
reclaim stgpool BACKUPTPOOL threshold=70 duration=300 wait=yes
reclaim stgpool CPBACKUPTPOOL threshold=70 duration=300 wait=yes
/* RECL_STG_END */

Here is my dsmserv.opt:
COMMmethod TCPIP
TCPPort 1500
TCPWindowsize 63
TCPNODELAY Yes
TCPADMINPort 1500
ADMINONClientPort Yes
COMMmethod NAMEDPIPE
NAMEdpipename \\.\pipe\Server1
NPBUFFERSIZE 8
SECUREPipes No
ADSMGROUPname adsmserver
NPAUDITSuccess No
NPAUDITFailure No
MSGINTerval 1
MAXSessions 100
COMMTimeout 60
IDLETimeout 15
TXNGroupmax 4096
DATEformat 1
TIMEformat 1
NUMberformat 1
MESsageformat 1
LANGuage AMENG
EXPInterval 24
EXPQUiet Yes
MOVEBatchsize 1000
MOVESizethresh 2048
VOLUMEHistory "volhist.out"
DEVCONFig "devcnfg.out"
RESTOREINTerval 1440
DISABLESCHEDS No
EVENTSERVER Yes
REQSYS Yes
ENABLE3590 Yes
3494SHARED Yes
ASSISTVCRRECovery Yes
QUERYAuth NONE
ADREGISTER No
ADUNREGISTER No
MIRRORLOGDirectory I:\tsm\logmirror
ARCHFAILOVERLOGDirectory H:\tsm\archlogfailover
ACTIVELOGDirectory F:\tsm\log
ARCHLOGDirectory G:\tsm\archlog
DBMTRUSTEDGUIDIGNORE YES
COMMTIMEOUT 14400
MAXS 100
ACTIVELOGSIZE 10240

Does anything jump out at you with my setup? When writing large files, I am getting approximately 1 gig/min writing directly to tape.
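For scale, here is that figure converted to MB/s and set against the drive's rated speed. This is only a rough sketch: it assumes "1 gig" means 1000 MB, and it uses the published LTO-4 native (uncompressed) rate of 120 MB/s rather than anything measured on this server. The function name is made up for illustration.

```python
# Convert a "GB per minute" figure to MB/s and compare it with LTO-4 native speed.
# Assumptions: 1 GB = 1000 MB; LTO4_NATIVE_MBPS is the published native rate
# for LTO-4, not a value measured on this system.

LTO4_NATIVE_MBPS = 120.0

def gb_per_min_to_mb_per_sec(gb_per_min: float) -> float:
    """Return the equivalent rate in MB/s."""
    return gb_per_min * 1000 / 60

observed = gb_per_min_to_mb_per_sec(1.0)
fraction = observed / LTO4_NATIVE_MBPS
print(f"{observed:.1f} MB/s, about {fraction:.0%} of LTO-4 native speed")
```

So 1 gig/min is roughly 17 MB/s, well under what a single LTO-4 drive is rated for.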

Any help would be appreciated.

Thanks,

Dan
 
Run the copies of the disk pools at the same time. There is no reason you couldn't, unless you are using 3 processes for every backup pool. The same goes for the migration. You didn't supply the disk pool sizes, so I am shooting from the hip here, but I would only use wait=yes on the backups of the tape pools to the copy pools.
 
My disk pools are pretty big:

Storage Pool Name  Device Class Name         Est. Capacity  Pct Util  Pct Migr  High Mig Pct  Low Mig Pct  Next Storage Pool
-----------------  ------------------------  -------------  --------  --------  ------------  -----------  -----------------
AGENTDPOOL         DISK                            1,600 G      32.3      32.3            90           70  BACKUPTPOOL
ARCHIVEDPOOL       DISK                              300 G      10.8      10.8            90           70  ARCHIVETPOOL
ARCHIVETPOOL       DEVCML6000LTO4                136,415 G       4.9      12.0           100           70
BACKUPTPOOL        DEVCML6000LTO4                130,389 G      24.2      36.0           100           70
CLIENT_AUTODEPLOY  FILE_CLIENT_DEPLOY_DEV_1          0.0 M       0.0     100.0            90           70
CPBACKUPTPOOL      DEVCML6000LTO4             12,739,154 G       0.2
DISKPOOL           DISK                              4.0 M       0.0       0.0            90           70
FILESDPOOL         DISK                            2,000 G      32.4      32.4            90           70  BACKUPTPOOL
METADPOOL          DISK                               10 G       0.0       0.0            90           70  BACKUPTPOOL
MONTHLYAGENTDPOOL  DISK                              0.0 M       0.0       0.0            90           70  ARCHIVEDPOOL
MONTHLYMETADPOOL   DISK                               10 G       0.0       0.0            90           70  ARCHIVETPOOL
SPACEMGPOOL        DISK                              0.0 M       0.0       0.0            90           70

So you would change the stg backups as follows?:

backup stg FILESDPOOL CPBACKUPTPOOL wait=n
backup stg AGENTDPOOL CPBACKUPTPOOL wait=n
backup stg METADPOOL CPBACKUPTPOOL wait=n
backup stg BACKUPTPOOL CPBACKUPTPOOL wait=yes

I guess that way TSM can use all 3 drives for writing during the disk backup. Would the final script look like this with your suggested changes?

/* BACKUP_STORAGE_START */
SERIAL
backup stg FILESDPOOL CPBACKUPTPOOL wait=no
backup stg AGENTDPOOL CPBACKUPTPOOL wait=no
backup stg METADPOOL CPBACKUPTPOOL wait=no
backup stg BACKUPTPOOL CPBACKUPTPOOL wait=yes
SERIAL
/* BACKUP_STORAGE_END */
/* BACKUP_DB_START */
backup db devclass=DEVCML6000LTO4 type=full wait=yes
/* BACKUP_DB_END */
/* CREATE_RPF_START */
/* RPF:SOURCE=dbbackup; */
prepare source=dbbackup wait=YES
/* CREATE_RPF_END */
/* MIG_STG_START */
PARALLEL
migrate stgpool AGENTDPOOL lowmig=0 wait=no
migrate stgpool FILESDPOOL lowmig=0 wait=no
SERIAL
/* MIG_STG_END */
/* EXP_INV_START */
/* EXP_INV:SKIPDIRS=NO;DURATION=120; */
expire inventory skipdirs=NO wait=yes duration=120
/* EXP_INV_END */
/* Clean up vol history, backup devconfig and volhist*/
backup devconfig
backup volhist
/* Clean up vol history END */
/* RECL_STG_START */
SERIAL
reclaim stgpool BACKUPTPOOL threshold=70 duration=300 wait=no
reclaim stgpool CPBACKUPTPOOL threshold=70 duration=300 wait=no
/* RECL_STG_END */

Thanks again for your help,

Dan
 
So, what kind of throughput are you actually getting? (You can get this information from the summary table.) We need this number to determine if there actually is a problem. If performance is what is expected, then there is really no 'problem'; you just need more drives!

How many HBA ports are you using? Are you using different HBAs for the drives and the disk? (I hope so; sharing one HBA between tape and disk would kill performance!)
 
Disk is iSCSI with dedicated links (iSCSI is on a dedicated network here). We have 2 x 1 Gb network links for storage. I believe this is the problem; I just can't fix that right now. There are individual HBAs for each tape drive. In theory, tape to tape should be pretty fast.
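To put rough numbers on that suspicion, here is a back-of-the-envelope sketch. It assumes both storage links can be used together at full line rate with no protocol overhead (which depends on how multipathing is set up), and that each LTO-4 drive wants its published native rate of 120 MB/s. All names are illustrative.

```python
# Rough ceiling for the iSCSI path versus what the tape drives could absorb.
# Assumptions: 2 links at 1 gigabit/s each, usable together, no overhead
# subtracted; each LTO-4 drive streams at its published native 120 MB/s.

LINKS = 2
GBIT_PER_LINK = 1.0
DRIVES = 3
LTO4_NATIVE_MBPS = 120.0

def link_ceiling_mbps(links: int, gbit_per_link: float) -> float:
    """Theoretical maximum in MB/s (1 Gbit/s = 1000 Mbit/s = 125 MB/s)."""
    return links * gbit_per_link * 1000 / 8

disk_ceiling = link_ceiling_mbps(LINKS, GBIT_PER_LINK)
tape_demand = DRIVES * LTO4_NATIVE_MBPS

print(f"iSCSI ceiling:            {disk_ceiling:.0f} MB/s")
print(f"3 drives at native speed: {tape_demand:.0f} MB/s")
if disk_ceiling < tape_demand:
    print("The disk links cannot keep all drives streaming at full speed.")
```

That gives a 250 MB/s ceiling against 360 MB/s of drive appetite. It only bounds the disk-to-tape steps; the tape-to-tape copy does not read from the disk pools.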

How would I go about getting the information you need out of the summary table? Again sorry for being a Newb.

Thx,

Dan
 
select activity,entity, (bytes/cast(timestampdiff(2,char(end_time-start_time)) as decimal))/1024/1024 as MBpSec from summary where activity in ('MIGRATION','COPY STGPOOL')

Note: I'm not using copy pools, and I can't remember the exact activity string for the copy stgpool activity. Just do "select distinct activity from summary" to get the list of activities you use.
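In case the arithmetic in that select is not obvious, here is the same calculation as a small Python sketch: bytes moved, divided by elapsed seconds, divided by 1024 twice to get MB/s. The function name and the sample numbers are made up for illustration. The guard against a zero-length run is an addition here; the select has no such guard, so very short processes may need to be filtered out.

```python
from datetime import datetime

def mb_per_sec(total_bytes: int, start: datetime, end: datetime) -> float:
    """Throughput in MB/s, mirroring bytes / seconds / 1024 / 1024."""
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        raise ValueError("end time must be later than start time")
    return total_bytes / seconds / 1024 / 1024

# Hypothetical process: 60 GiB moved in exactly one hour.
start = datetime(2011, 1, 5, 15, 0, 0)
end = datetime(2011, 1, 5, 16, 0, 0)
rate = mb_per_sec(60 * 1024**3, start, end)
print(f"{rate:.2f} MB/s")
```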
 
Some of the DB queries produce errors:

tsm: PBNTSM02>select activity,entity, (bytes/cast(timestampdiff(2,char(end_time-start_time)) as decimal))/1024/1024 as MBpSec from summary where activity in ('STGPOOL BACKUP')
ANS1017E Session rejected: TCP/IP connection failure
ANS8001I Return code -50.
ANS8064E Communication timeout. Reissue the command.
Session established with server PBNTSM02: Windows
Server Version 6, Release 2, Level 1.1
Server date/time: 01/13/2011 13:37:49 Last access: 01/13/2011 13:30:13
ANS8001I Return code -50.

One that works is 'FULL_DBBACKUP':

ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 35.622364420782
ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 42.507058812467
ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 11.867281567435
ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 12.889234336619
ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 12.263265228844
ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 12.593924917514

Does that help any?
 
You may get those errors when the admin session has been sitting idle for a while and communication has to be re-established.

Interesting how you go from 35+MB/sec (decent) to <13MB/sec (sucks). Display the start_time as well, and find out what's changed between good and bad days.

You may have to monitor system resources as things are happening: queue depth, memory, paging.
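If it helps with comparing good and bad days, the "KEY: value" output from the admin client can be turned into records with a few lines of Python. This is only a sketch: it assumes every record starts with an ACTIVITY line, as in the output posted above, and the function name is made up. A START_TIME line would be picked up the same way.

```python
def parse_stanzas(text: str) -> list:
    """Split 'KEY: value' lines into one dict per record.

    Assumption: each record begins with an ACTIVITY line.
    """
    records = []
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "ACTIVITY":
            records.append({})
        if records:
            records[-1][key] = value.strip()
    return records

# Two records copied from the output posted earlier in this thread.
sample = """ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 35.622364420782
ACTIVITY: FULL_DBBACKUP
ENTITY:
MBPSEC: 12.593924917514
"""

records = parse_stanzas(sample)
for rec in records:
    print(rec["ACTIVITY"], float(rec["MBPSEC"]))
```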
 
Here are the results from mid-December onward:

ACTIVITY: FULL_DBBACKUP
MBPSEC: 33.260815193927
START_TIME: 2010-12-15 15:06:05.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 35.469247005064
START_TIME: 2010-12-16 13:55:04.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 33.523798174137
START_TIME: 2010-12-17 15:15:30.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 32.461253137178
START_TIME: 2010-12-20 11:06:33.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 29.403603969707
START_TIME: 2010-12-22 18:19:22.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 25.293549252652
START_TIME: 2010-12-24 17:54:32.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 23.988392222127
START_TIME: 2010-12-29 06:15:31.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 47.399579519782
START_TIME: 2010-12-31 11:04:58.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 45.813062486434
START_TIME: 2011-01-01 12:21:50.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 45.755599500633
START_TIME: 2011-01-02 11:00:32.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 47.265341035296
START_TIME: 2011-01-03 12:23:27.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 40.123565264787
START_TIME: 2011-01-05 15:27:03.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 41.166820157631
START_TIME: 2011-01-05 15:41:24.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 23.516867419083
START_TIME: 2011-01-05 16:19:37.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 24.967054524488
START_TIME: 2011-01-05 16:49:41.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 23.675955967666
START_TIME: 2011-01-05 17:19:47.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 24.356788633882
START_TIME: 2011-01-05 17:49:55.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 23.977560502286
START_TIME: 2011-01-05 18:19:58.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 22.100647742812
START_TIME: 2011-01-05 18:50:06.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 21.662591425659
START_TIME: 2011-01-05 19:20:13.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 10.826911213757
START_TIME: 2011-01-06 21:46:09.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 11.814863308160
START_TIME: 2011-01-07 17:39:06.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 11.533638022607
START_TIME: 2011-01-08 22:43:05.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 11.609031586180
START_TIME: 2011-01-11 00:32:39.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 10.977270951045
START_TIME: 2011-01-11 12:17:43.000000
ACTIVITY: FULL_DBBACKUP
MBPSEC: 12.383580543862
 
So, around 2011-01-05 your performance starts to tank. That's about all I can say from this. You'll have to determine what was changed at that time.
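To pin the timing down a bit more, here is a rough Python sketch that flags any run that is much slower than the run before it. The rows are a hand-picked subset of the numbers posted above, not the full list, and the 0.75 cut-off is an arbitrary choice for illustration.

```python
# Flag runs whose throughput fell sharply compared with the previous run.
# RUNS is a selected subset of the FULL_DBBACKUP rows posted in this thread.
# The 0.75 ratio is an arbitrary threshold chosen for illustration.

RUNS = [
    ("2011-01-03 12:23:27", 47.265341035296),
    ("2011-01-05 15:27:03", 40.123565264787),
    ("2011-01-05 15:41:24", 41.166820157631),
    ("2011-01-05 16:19:37", 23.516867419083),
    ("2011-01-05 19:20:13", 21.662591425659),
    ("2011-01-06 21:46:09", 10.826911213757),
    ("2011-01-07 17:39:06", 11.814863308160),
]

def find_drops(runs, ratio=0.75):
    """Return start times of runs slower than ratio * the previous run."""
    drops = []
    for (_, prev_rate), (when, rate) in zip(runs, runs[1:]):
        if rate < prev_rate * ratio:
            drops.append(when)
    return drops

for when in find_drops(RUNS):
    print("sharp drop at", when)
```

On these rows it flags two steps down: one on 2011-01-05 between the 15:41 and 16:19 backups, and another by the 2011-01-06 21:46 backup. Those are the windows to check for changes.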

Look at your (Windoze) error logs around then as well and see if you can correlate system errors.

Good luck.
 