Backup takes so long to complete

arjess

Hi everybody,
I have a problem at one of my customers. They say their backup takes almost 3 to 4 days to complete. They do a full backup on Saturday and incremental backups from Monday to Friday. Their server runs Windows 2003, and they back up every drive from C to T; the total size is between 300 GB and 400 GB. There is no tape library, just a built-in LTO tape drive, and the operator loads the tapes manually. They ask why NT Backup takes 12 hours to complete while TSM takes 4-5 days. On Tuesday I started the backup manually from the command line at 7:00 pm; on Thursday it was still writing to the first tape, backing up drive H, with 12 more drives to go.

Has anyone encountered this before, and/or does anyone have an idea how to fix it?

Thanks
Arjess
 
At the end of the backup, the TSM client reports statistics about the backup, i.e. how many files were backed up, how long the backup took, the transfer rate, etc. Paste this info here so that we can look and see what is happening.
 
I hope it's not too late; this is the info I got from the dsmerror.log file. From the file I also noticed that the operator forgot to put in the second tape when the TSM server requested it. After 5 hours the TSM server asked for the third tape, and the operator managed to put it in. After that the backup kept running until 3 am and then stopped.

02/09/2007 03:12:53 ANS1809W Session is lost; initializing session reopen procedure.
02/09/2007 03:12:54 ANS1809W Session is lost; initializing session reopen procedure.
02/09/2007 03:13:08 ANS1810E TSM session has been reestablished.
02/09/2007 03:34:11 cuGetFSQryResp: Received rc: -50 from sessRecvVerb
02/09/2007 03:34:12 sessSendVerb: Error sending Verb, rc: -50
02/09/2007 03:34:12 cuFSQry: Received rc: -50 from cuBeginTxn
02/09/2007 03:34:12 fsMigrateName(): received error from cuFSQry() RC=-50
02/09/2007 03:34:12 sessSendVerb: Error sending Verb, rc: -50
02/09/2007 03:34:12 cuFSQry: Received rc: -50 from cuBeginTxn
02/09/2007 03:34:12 fsMigrateName(): received error from cuFSQry() RC=-50
02/09/2007 03:34:12 sessSendVerb: Error sending Verb, rc: -50
02/09/2007 03:34:12 cuFSQry: Received rc: -50 from cuBeginTxn
02/09/2007 03:34:12 fsMigrateName(): received error from cuFSQry() RC=-50
02/09/2007 03:34:12 sessSendVerb: Error sending Verb, rc: -50
02/09/2007 03:34:12 cuFSQry: Received rc: -50 from cuBeginTxn
02/09/2007 03:34:12 fsMigrateName(): received error from cuFSQry() RC=-50
02/09/2007 03:34:12 sessSendVerb: Error sending Verb, rc: -50
02/09/2007 03:34:12 cuFSUpd: Received rc: -50 from cuBeginTxn
02/09/2007 03:34:12 fsIncrDateUpdate: received error from cuFSUpd
02/09/2007 03:34:12 ANS1228E Sending of object '\\coldsvr\k$' failed
02/09/2007 03:34:12 ANS1017E Session rejected: TCP/IP connection failure

02/09/2007 03:34:12 ANS1228E Sending of object '\\coldsvr\l$' failed
02/09/2007 03:34:12 ANS1017E Session rejected: TCP/IP connection failure

02/09/2007 03:34:12 ANS1228E Sending of object '\\coldsvr\m$' failed
02/09/2007 03:34:12 ANS1017E Session rejected: TCP/IP connection failure

02/09/2007 03:34:12 ANS1228E Sending of object '\\coldsvr\n$' failed
02/09/2007 03:34:12 ANS1017E Session rejected: TCP/IP connection failure

02/09/2007 03:34:12 ANS1228E Sending of object '\\coldsvr\o$' failed
02/09/2007 03:34:12 ANS1017E Session rejected: TCP/IP connection failure

And this one is from the actlog:


02/09/2007 03:34:29 ANE4952I (Session: 3482, Node: ONDEMAND) Total number of objects inspected: 555,580 (SESSION: 3482)

02/09/2007 03:34:29 ANE4954I (Session: 3482, Node: ONDEMAND) Total number of objects backed up: 555,508 (SESSION: 3482)

02/09/2007 03:34:29 ANE4958I (Session: 3482, Node: ONDEMAND) Total number of objects updated: 0 (SESSION: 3482)

02/09/2007 03:34:29 ANE4960I (Session: 3482, Node: ONDEMAND) Total number of objects rebound: 0 (SESSION: 3482)

02/09/2007 03:34:29 ANE4957I (Session: 3482, Node: ONDEMAND) Total number of objects deleted: 0 (SESSION: 3482)

02/09/2007 03:34:29 ANE4970I (Session: 3482, Node: ONDEMAND) Total number of objects expired: 0 (SESSION: 3482)

02/09/2007 03:34:29 ANE4959I (Session: 3482, Node: ONDEMAND) Total number of objects failed: 24 (SESSION: 3482)

02/09/2007 03:34:29 ANE4961I (Session: 3482, Node: ONDEMAND) Total number of bytes transferred: 229.03 GB (SESSION: 3482)

02/09/2007 03:34:29 ANE4963I (Session: 3482, Node: ONDEMAND) Data transfer time: 31,265.71 sec (SESSION: 3482)

02/09/2007 03:34:29 ANE4966I (Session: 3482, Node: ONDEMAND) Network data transfer rate: 7,681.18 KB/sec (SESSION: 3482)

02/09/2007 03:34:29 ANE4967I (Session: 3482, Node: ONDEMAND) Aggregate data transfer rate: 1,176.97 KB/sec (SESSION: 3482)

02/09/2007 03:34:29 ANE4968I (Session: 3482, Node: ONDEMAND) Objects compressed by: 0% (SESSION: 3482)

02/09/2007 03:34:29 ANE4964I (Session: 3482, Node: ONDEMAND) Elapsed processing time: 56:40:47 (SESSION: 3482)

Last Friday I ran the backup again using the GUI, backing up about 14 hard disks. They insist on backing up the C drive and all the DB2 and OnDemand data/databases on the D drive. There may be so many open files to back up that the backup is slow.
 
A few things to look at.

1. After the weekend backup, look at the output of 'q node nodename f=d' for this node on the TSM server and look for these parameters:

Bytes Received Last Session: 237.23 M
Bytes Sent Last Session: 31.33 M
Duration of Last Session: 820.96
Pct. Idle Wait Last Session: 83.53
Pct. Comm. Wait Last Session: 10.88
Pct. Media Wait Last Session: 0.00

This will give you some idea of where the client is spending its time during backup and then you can take some steps to fix this.

2. I see messages in the log about the session being lost and then reestablished. If this happens too often, you need to fix it, as it adds to the time the backup takes to complete.

3. See if compression on the client side can help you. What kind of network interface does this machine have? Does the outbound network rate seem reasonable?

4. Your aggregate transfer rate is very slow. The aggregate transfer rate is the overall rate the client achieves, including media wait, local processing, etc. Check whether the session is waiting for tape mounts; since this is a manual library, perhaps nobody is mounting tapes on time. Also see whether the idle wait time from 'q node f=d' is too high. If so, the machine may be too busy and cannot process TSM work quickly.

5. Monitor CPU and memory usage during the backup and see if they jump to higher values during backup.
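
To put a number on point 4, the summary figures from the actlog posted earlier can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes TSM reports binary units (1 GB = 1,048,576 KB). It divides the bytes transferred by the data transfer time and by the elapsed time, then compares the two durations.

```python
# Cross-check of the ANE49xx summary figures quoted from the activity log.
# Assumption: TSM reports binary units, so 1 GB = 1024 * 1024 KB.
GB_IN_KB = 1024 * 1024

bytes_transferred_kb = 229.03 * GB_IN_KB   # ANE4961I: 229.03 GB
data_transfer_sec = 31_265.71              # ANE4963I: data transfer time
elapsed_sec = 56 * 3600 + 40 * 60 + 47     # ANE4964I: 56:40:47

# Rate while data was actually moving (compare with ANE4966I).
network_rate = bytes_transferred_kb / data_transfer_sec
# Rate over the whole session, waits included (compare with ANE4967I).
aggregate_rate = bytes_transferred_kb / elapsed_sec
# Share of the session spent transferring data.
busy_fraction = data_transfer_sec / elapsed_sec

print(f"network rate   : {network_rate:,.0f} KB/sec")
print(f"aggregate rate : {aggregate_rate:,.0f} KB/sec")
print(f"time spent transferring: {busy_fraction:.1%}")
```

Both computed rates land on the reported values, and the session spent only about 15% of its 56+ hours moving data. The rest went to something else (tape mount waits, file scanning, session retries), which is what the wait percentages from 'q node f=d' should help narrow down.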

Regards
 
Thanks for the info. The TSM server, client, DB2, OnDemand and the LTO tape drive are all in one box. On Friday I turned on the open file support option before running the backup. This morning I called the operator, who said the backup is running on tape no. 3. I hope it can finish today, because my goal is simply to get the backup to finish. Actually, the TSM server crashed earlier and we managed to recover it. There is no system architecture design or any other document that can help, only the backup manual. This evening I will go there to check everything, and if there are any errors I'll post them here. I read this statement on the net. Can it help?

Don't know if this will help, but I thought I'd share what I found when
I was having problems with our Exchange TDP backup performance. In the
dsm.opt file I added -

TCPWindowsize 255
TCPBuffSize 127

My backups went from 800K/sec to 30MB/sec and backup time went from 3
days to 2.6 hours for a total of 290GB of data on a 1Gb network. There
were a couple of other settings that I changed, but this one is what
made it fly. I still think it should be faster...

You may need to adjust the numbers for optimal performance in your
environment.

- David


thank you
 
These parameters will not help you, as everything is in one box. What can help in this case is using memory as the communication method between the client and the server. Look at the client options file (dsm.opt on Windows, dsm.sys on UNIX) and check the COMMMethod parameter. If it's TCPIP, you can change it to SHAREDMEM, i.e.

COMMMethod SHAREDMEM


You also need to enable the shared memory protocol on the TSM server. So review the SHAREDMEM and SHMPORT options, then enable it and see if this helps.

Are you using the TSM B/A client to back up this machine, or are you using DB2 to back up the DB2 data (i.e. through the TSM API client)?
 
They are using the B/A client to back up all the disks right now. I don't know what the configuration was before the TSM server crashed. They called us to recover their crashed TSM. The person who configured everything left the company a long time ago with no knowledge transfer to his colleagues. My partner and I worked by trial and error, and luckily nothing bad happened to the server.

Thanks for your suggestion, but I'm afraid to change much on this server, because we don't know how the OnDemand and DB2 configuration works and this is a production server. Never mind; this evening I will go to the site and see what happened. If any problem occurs I'll post it here. By the way, thanks again.
 
Yesterday evening I went to the customer site and checked everything. The backup is still running and there are no error messages in the dsmerror.log file. A friend suggested using the TSM volume backup method for this scenario. Can it help?
 
I am not sure what you mean by volume backup. Can you explain, please?
 
I read in windadmin53.pdf that it is called logical volume backup (also called image backup). It allows you to back up an entire file system or raw volume as a single object, but you need to configure the LVSA (Logical Volume Snapshot Agent) to do an online backup.
 
Yes, an image backup will allow you to take a full backup of a Windows partition, and yes, you can take it online or offline. But you will face a problem when you need to restore files or directories: you will not be able to restore individual files from an image.
 
That's the disadvantage of image backup. The customer has a great many files, small and large, that they want to back up. They insist on backing up all files from drive C to T.
 
What are the ways to reduce TSM DB size?

Hi all,
I have a TSM server with DB usage of more than 92%. Can anyone guide me on how to reduce the DB size?

It would be very helpful.
 

Hello ARJESS, I notice you're looking very much at the TSM side of this problem, but be aware that TSM is not really fond of large Windows servers with lots of files in the file system. It looks like you have 18(?) volumes to back up, so I guess you're talking about millions of files as well? Is there any possibility of running multiple backup processes in parallel, each backing up 2 volumes? Make sure your server has enough CPU and memory to cope with this. Are you sure that open file support works OK? Do the files really get backed up, or are you trying to back up open files that are then skipped after 4 retries, which doesn't make your backup any faster?
How are your CPU load and memory usage anyway? Can't you move 1 or 2 of the applications to another box? It looks like somebody is bleeding heavily for combining everything in 1 box.
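
The "2 volumes per process" idea could be sketched like this. It only builds the command lines and does not start anything. The helper names `drive_batches` and `dsmc_commands` are made up for illustration, and the drive range C to T is taken from the first post. One caveat, stated as an assumption rather than a tested fact: with a single tape drive and no disk storage pool in front of it, parallel sessions would likely queue for the one drive, so this probably only pays off once the data lands on disk first.

```python
import string


def drive_batches(first="C", last="T", per_batch=2):
    """Split the drive letters first..last into groups of per_batch drives."""
    letters = string.ascii_uppercase
    drives = [f"{d}:" for d in letters[letters.index(first):letters.index(last) + 1]]
    return [drives[i:i + per_batch] for i in range(0, len(drives), per_batch)]


def dsmc_commands(batches):
    """Build one 'dsmc incremental' command line per group of drives."""
    return ["dsmc incremental " + " ".join(batch) for batch in batches]


if __name__ == "__main__":
    # Print the command lines so they can be reviewed before anything is run.
    for cmd in dsmc_commands(drive_batches()):
        print(cmd)
```

How many of these can run at the same time depends on the CPU and memory headroom on the box, which is why checking the load first matters.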
 

Justien, are you introducing a different problem here? If so, please start another thread, because this doesn't help keep the discussion clear.
But to give you an answer: why do you want to shrink your TSM DB? Isn't what you're backing up important? If it isn't, DELETE it :confused: It looks to me like you're just backing up so much that the TSM DB needs to grow, so give it some more DB volumes to use. How big is your DB anyway?
Remember, the more servers/files you back up and the longer you keep the inactive versions, the bigger your TSM DB will grow. It's the same as at home: the more clothes you buy, the more space you need to store them, unless you throw away what you don't wear anymore, so ......
 
Hi hucha,

Thanks for the advice. Currently the server is an IBM xSeries model running 2 Intel Xeon CPUs at 2.80 GHz each with 2 GB of RAM, using a single LTO tape drive for backup. That's why at first we were also afraid to troubleshoot this server: it's a production server and it doesn't seem stable. I'm still waiting for info from my customer about the backup. After the backup finishes, I can check all the errors.

I don't know how they came up with the design. Right now we are proposing something for them. To everyone, especially alimirza, thanks for all the info and guidance. It helps me a lot. Thanks guys :) :) :)
 