Backup of big file server > 200 TB - client starts to quit

pkoch

Hi,

we are using StorNext (Quantum) as a global file system here. At the moment we have 8 file systems with a total size of ~250 TB.
For backup we use a dedicated StorNext client node (access to the file systems via SAN) and the normal TSM client.
So not only the size of the file systems is growing, but also the number of files.
The OS is Novell SLES 10 (x86_64) and the client is TIVsm-BA-5.3.6-2. We are using a 10 Gbit connection to the server.

After 1 or 2 successful backups the node fails and we have to restart the scheduler. It seems that the
number of files in one file system is getting too big.
Normally the backup completes within 9 hours.


How stable are the TSM 6.x clients, and which one is currently the best
regarding performance AND a huge number of files?

Any other hints to improve the performance?

Thanks and bye,
Peer
 
You can try to break the backup into several jobs and adjust the client's memory usage so it does not impact the server. If it were AIX or Windows you could use the journal, but unfortunately that is not yet possible for Solaris.
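
A minimal sketch of what that could look like in dsm.sys on the Unix client (the stanza name and paths are illustrative assumptions, not your actual setup):

* dsm.sys stanza on the StorNext backup node (sketch)
SErvername TSMSRV
   COMMMethod             TCPip
   TCPServeraddress       tsmserver.example.com
   * Scan one directory at a time instead of holding the whole
   * file-system tree in memory - slower, but far less RAM
   MEMORYEFFICIENTBACKUP  yes
   * Present big subtrees as separate file spaces, so each
   * incremental handles a smaller object count
   VIRTUALMOUNTPOINT      /stornext/fs1/projects
   VIRTUALMOUNTPOINT      /stornext/fs1/users

With VIRTUALMOUNTPOINT in place you can then schedule the subtrees as separate jobs.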

I hope this helps.
 
Break the schedule up to back up a single drive (file system) at a time. That way much less memory will be used.
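
On the server side that could be one client schedule per file system, staggered so only one runs at a time; a dsmadmc sketch (domain, schedule, and node names are illustrative assumptions):

/* one incremental per file system, staggered start times */
define schedule STANDARD FS1_INC action=incremental objects="/stornext/fs1" starttime=18:00
define schedule STANDARD FS2_INC action=incremental objects="/stornext/fs2" starttime=22:00
define association STANDARD FS1_INC SNXNODE
define association STANDARD FS2_INC SNXNODE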
 
All,

Currently I am using TSM server 5.4 on AIX 5.3. We have planned an upgrade to TSM 6.3 next year. Now we have ended up with a problem while trying to reduce tape usage.

The issue I am facing may be a repeated one, but here is what I see.
We have a file server (holding 20-year-old data) with three partitions (270 GB, 140 GB, and 2 GB), each holding several directories and user folders.

TSM 5.5.3.0 is the client version installed on the file server.
I had set TCPWINDOWSIZE to 64, thinking it might allow more data to be sent in one window.

We recently bound the node's directories to the default management class (30 days) for DIRMC (earlier they used the no-limit management class) to stop the node from writing directly to a new tape. (Collocation is not enabled for this node or for the tape pool.)

The problem: before using DIRMC, backup ran fine on this node. Since the change, the daily scheduled backup never completes; it runs for more than 13 hours, and once the node has sent around 1.5 GB of data it starts filling up the recovery log. The only way out is to kill the backup to make way for other backups, or to let TSM take a DB backup via the space trigger.

Checking the number of objects stored, this server holds nearly 2 crore (20 million) files. I am using the VSS snapshot method to take snapshot backups of open files. Since DIRMC is now applied, rebinding all those files and folders to the default management class takes a long time for 2 crore files, and that grows the log.
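
For context, DIRMC itself is a one-line client option; a minimal sketch, assuming the default class is named STANDARD (check the real name with "dsmc query mgmtclass"):

* dsm.opt on the file server; every directory object is rebound
* to this class on the next incremental, which is what makes the
* first run after the change so expensive
DIRMC STANDARD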


I tried performing a manual backup of the large drive first, but it is no faster because of the number of files present.

Is there any way to avoid this by skipping the rebinding of files during backup, letting the backup run faster, and rebinding from the client during non-backup hours?

Your response is highly appreciated.
 
Since you changed your management class, rebinding is more or less unavoidable. Since your client seems to have many folders, I would recommend the following DIRMC settings instead:
1) Set the DIRMC retention to a longer period instead of 30 days.
2) Assign DIRMC to a storage pool that stays onsite on disk (restores get faster because TSM only has to look in this pool for directory objects, while it goes to the data tapes for the file data).

I guess your old directory objects will already have expired after 30 days. You may need to split the backup into smaller incremental parts (say at drive level) so it can complete and build a base in the TSM DB; otherwise, consider the journal agent to speed up the full incremental. A sketch of the server-side setup is below.
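
A dsmadmc sketch of that setup (pool, domain, policy-set, and class names are illustrative assumptions):

/* onsite disk pool just for directory objects */
define stgpool DIRPOOL disk
define volume DIRPOOL /tsm/dirpool/vol01 formatsize=2048
/* long-retention class for directories */
define mgmtclass STANDARD STANDARD DIRCLASS
define copygroup STANDARD STANDARD DIRCLASS type=backup destination=DIRPOOL verexists=nolimit verdeleted=nolimit retextra=365 retonly=365
validate policyset STANDARD STANDARD
activate policyset STANDARD STANDARD

Then point the client at it with DIRMC DIRCLASS in dsm.opt.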
 
Thanks for your suggestions, SniperKing. I did split the backup of the server to go drive by drive, since setting up a new retention for DIRMC doesn't seem to work: it takes longer to rebind to the longer-retention class than to the no-limit class.

I scheduled the backup drive by drive, planning to back up the larger drive at drive level on Sunday. Hope it goes well. Will keep you updated on the progress.
 