Migration to Tape - Performance Problem

tsmchambers

Hi there,

I have an unusual performance issue when migrating from a raw disk pool to tape.

Pool contains lots of small files.
Data is around 160 GB.

At first the performance is absolutely terrific, but towards the end of the run - roughly 25 GB to go - it tails off to a dribble and drags the overall average down to around 44 MB/s.

These 2 are set:
MOVEBatchsize 1000
MOVESizethresh 2048

I wondered if these 2 were somehow causing problems even though they are the recommended settings.
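
Both are in the server options file; I believe the values the server is actually running with can be double-checked from dsmadmc with something like:

q opt movebatchsize
q opt movesizethresh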

Regards,
Darren
 
Not sure if my understanding is correct, but with tons of little files you're batching them up to 1000 files per batch, assuming the batch doesn't exceed the 2048 size threshold. That should be faster than the default, which would batch to 500.

Have you tried dropping back to the defaults - does it make any difference? Does it always slow down at the same point regardless of what else the TSM server is doing? Could it be that the physical spindles on the disk array are busier later in the process, i.e. increased seek times?
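
To rule the disks in or out, something like this on the AIX box while the migration runs should show whether the LUNs under the diskpool get busier towards the end (the hdisk names are just examples - use whichever disks your diskpool LVs actually sit on):

iostat -D hdisk4 hdisk5 30

That prints extended per-disk statistics every 30 seconds.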
 
Hi, I have not tried dropping back to the default - I will try that in the next hour.

The disks become much less busy towards the end of the run, which confuses me somewhat.

I also see in the activity log that the migration process opens and closes the output volume frequently:
10:41:42 ANR0515I Process 10 closed volume etc...
10:41:44 ANR0515I Process 10 opened volume etc...

The disk pool is raw; I am wondering if I should change this to a filesystem to benefit from read-ahead?

Many thanks,
Darren
 
Hi

What kind of tape drives are you using?
How are your diskpool volumes configured? How many and how big?
What disk subsystem? Are the raw disk volumes striped?
What OS is the TSM server?

Does the migration slow down when it starts reading from certain diskpool volumes?
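
For that last question, a rough way to check is to re-run q proc while it is slow and note which diskpool volume the migration is reading from - as far as I remember the process status shows the current input and output volume:

q proc
(or q proc <processnum> to show just the migration process)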

This kind of information is useful when asking disk-related performance questions.


Cheers
 
Start your migration process again and use this select to see how many MB are migrated per second...

select process_num, process,
       substr(char(start_time),1,19) as start_time,
       substr(char(current_timestamp - start_time),1,10) as "elapsed_time",
       cast(float(bytes_processed)/1024/1024 as dec(8,2)) as mb,
       cast((cast(bytes_processed as dec(18,0)) /
             cast((current_timestamp - start_time) seconds as decimal(18,0)))
            / 1024 / 1024 as dec(18,2)) as "mb/s"
  from processes
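
If you want to track it over time you can wrap it in a loop from the AIX shell, roughly like this (the admin id and password are placeholders, and note the double quotes inside the full select above will need escaping or removing in the shell):

while true
do
    # cut-down select: just MB processed per process; swap in the full select above for the MB/s column
    dsmadmc -id=admin -password=xxxxx -dataonly=yes "select process_num, bytes_processed/1024/1024 as mb from processes"
    sleep 60
done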
 
How does TSM use disk storage pools?

If I have two 50GB LUNs attached to an AIX TSM server as a logical volume, and I create ten 10GB disk storage pool volumes, will it create them contiguously? i.e. the first 10GB volume uses the first 10GB of the first 50GB LUN, and so on until the first LUN is full, then it moves on to the next?

How does data then get written to the disk storage pool? Is it written contiguously starting from the beginning, or does it spread the load across all the TSM volumes?
 
In your scenario you would have two 50GB LUNs allocated to the TSM server. You then put these into the same volume group and create logical volumes on those LUNs. If you want the logical volumes spread over both disks evenly you can use an INTER-POLICY of maximum; otherwise, by default, the logical volumes will fill LUN1 first and then move on to LUN2.
You then define these logical volumes as raw disk storage pool volumes in TSM.
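
As a rough sketch (the volume group, LV and pool names are made up, and the number of LPs depends on your PP size):

mklv -y tsmdisk01 -e x tsmvg 800
define volume DISKPOOL /dev/rtsmdisk01

-e x is the inter-physical-volume allocation policy of maximum, which spreads the LV's partitions across both LUNs, and /dev/rtsmdisk01 is the raw device for that LV.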
 
Ok, thanks guys - this is our config:
The diskpool volumes are 50GB; there are 7 of them, each on its own raw logical volume - total pool size is 350GB.
Striping does not appear to increase performance and so is not used.

OS is AIX 6.1 on a P6 with 8GB RAM.

The problem is related to only one particular disk pool.

Subsystem is a Hitachi USP-V.
 
Ok - thanks for your help on this - keeping things in this thread in case it helps the general community, but happy to start a new thread if required.

So we give our AIX admins 2 LUNs. They create one LV using an inter-policy of minimum so that the single LV uses both disks completely - i.e. no other LVs on the LUNs. The PPs of the LV are then on both LUNs contiguously. What I don't know is how the AIX OS then uses the PPs - will it spread its load over the PPs across both disks?

My thinking: LV1 sits on two LUNs with 100 PPs, PPs 1-50 on LUN1 and 51-100 on LUN2. Will the OS use the PPs contiguously, or spread the load randomly?
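
For what it's worth, I can see where the PPs physically sit with something like this (the LV name is just an example):

lslv -m tsmdisk01

which lists the LP-to-PP mapping per hdisk - I just don't know in what order the OS and TSM will actually use them.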
 
I wonder if the problem relates to the data from a certain node. As far as I know, migration works a node at a time. Is the tape storage pool that you are migrating to collocated by node? If so, when the migration slows down, get the label of the tape volume the migration is writing to and then do "q content <volumename>" to see which nodes' data is on it.

The above is a bit of a wild guess, but I liked the idea anyway!
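
Something along these lines should do it when the slowdown hits (the volume name is a placeholder for whatever the migration is currently writing to):

q mount
q content VOL001 count=100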
 
Ok, I like your thinking. How do I find out if the pool has collocation enabled?
Sorry, I'm weak on TSM :(
 
Ok, for the tape pool in question do

q stg <poolname> f=d

and then look for the Collocate? option. If it's set to Node, then each client node's data will be stored on its own set of tapes.
 
Regarding the LV/PP layout question - I think you should start a new thread.
 
Ok, Collocate is set to Group, but I do not know what the upshot of this is.
There appears to be data stored on the tape volume from more than one node.
 
1. Nodes that are not in a collocation group (see q collocgroup) will be collocated per node, i.e. each node will have its own tapes.

2. Migration migrates in a certain order - it does the largest filespace in the stgpool first, then the 2nd largest, and so on.

3. Towards the end of your migration you will therefore be migrating the smallest filespaces/nodes.

4. With collocation on, this means a lot more tape mounts at the end of the migration compared to the start, which means you'll get fewer MB/sec, as a tape mount takes 1-2 minutes to load a tape.
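
If you want to test the theory, turning collocation off on the tape pool is a one-liner (the pool name is a placeholder):

update stgpool TAPEPOOL collocate=no

It only affects how new data is placed - anything already written to tape stays where it is.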
 
Thank you chaps, very useful info.

I turned off collocation and gained approximately 20 MB/s. I need more to satisfy our requirements though.

Are there any benefits to using a filesystem for the disk pool instead of a raw logical volume? I am assuming we would benefit from read-ahead with a filesystem, which may help.

Thank you,
Darren
 
Morning.

So what sort of speed are you getting overall now for an individual migration process? Try the SQL statement posted earlier in the thread.
What speed do you need?

I have always managed to get the speeds I need through raw devices; having said that, where we used to work the diskpools were on jfs2 and did perform well.
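
If you do try jfs2, the TSM side is just pre-formatted file volumes in the disk pool instead of the raw LVs, something like (the path, pool name and size are made up):

define volume DISKPOOL /tsmdisk01/vol01.dsm formatsize=51200

formatsize is in MB, so that would create and format a 50GB volume.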

Cheers
 
Hi,

We get an average of between 56 and 62 MB/s writing to a single tape drive.
We really need to achieve around 80 MB/s, and then everyone can relax :)
 