Number of migration processes drops due to tape drive preemption

alexp36

We have our SP server configured to run 6 migration processes. There are 8 tape drives.

What generally happens is that the migration from disk to tape starts with 6 processes, but then gradually loses processes to preemption by backups or restores that need a tape drive.
After a while it ends up with only about 2 processes, and eventually just 1.

There is a lot of data to migrate (the disk pool is 17TB), and once it's down to 1 or 2 processes the migration takes a very long time to complete (24 hours or more), and it often doesn't keep up with the data coming into the disk pool.

Is there some way of forcing a migration to either re-spawn processes as tape drives become available, or automatically exit and restart, once it gets down to only 1 or 2 processes?
 
I know of no way of having it restart automatically, unless you use external scripting to watch your processes. A rough idea would be a ksh script that runs 'q pr', counts the migration processes along with their process numbers, and cancels and restarts migration once that count drops below some threshold.
The downside to that is that migration will naturally reduce the number of streams it's using as it finishes up moving data, so a low process count isn't necessarily a sign of trouble.
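
Very roughly, and completely untested, the skeleton could look something like this (the admin id/password, pool name and threshold are placeholders you would have to adjust for your environment):

#!/bin/ksh
# Rough, untested sketch: restart migration if it has dwindled to a couple of processes.
# ADMIN, PASS, POOL and MIN_PROCS are placeholders.
ADMIN=scriptadmin
PASS=secret
POOL=DISKPOOL
MIN_PROCS=3    # act when fewer than this many migration processes remain

# Count the running migration processes ('q pr' = query process)
count=$(dsmadmc -id=$ADMIN -password=$PASS -dataonly=yes "query process" | grep -c "Migration")

if [ "$count" -gt 0 ] && [ "$count" -lt "$MIN_PROCS" ]; then
    # Cancel the remaining stragglers by process number...
    dsmadmc -id=$ADMIN -password=$PASS -dataonly=yes "query process" | grep "Migration" |
    while read pnum rest; do
        dsmadmc -id=$ADMIN -password=$PASS "cancel process $pnum"
    done
    # ...then kick migration off again so it can grab whatever drives are free now.
    dsmadmc -id=$ADMIN -password=$PASS "migrate stgpool $POOL lowmig=0"
fi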

Sounds like your best bet is to get more drives, and/or review your schedules and eliminate resource contention.

Just my two cents.
 
Thanks, that's probably what I was expecting from what I've read so far. Good point about the downside: like you say, it's not that easy to know whether it's down to one migration process because it's nearly finished, or because of contention.
 
Without knowing what you are moving from and to, you might be able to look at the free space of the source pool as a rough gauge of whether a migration process is finishing up, and then put some arbitrary watermarks in your script: if the pool is less than 20% free, restart the migration process, else end.
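
Something along these lines, again as a rough, untested sketch (pool name, the 80% threshold and the credentials are placeholders):

#!/bin/ksh
# Rough, untested sketch: only restart migration if the source pool is still mostly full.
ADMIN=scriptadmin
PASS=secret
POOL=DISKPOOL
MAX_UTIL=80    # i.e. act while the pool is less than 20% free

util=$(dsmadmc -id=$ADMIN -password=$PASS -dataonly=yes \
       "select pct_utilized from stgpools where stgpool_name='$POOL'" | tr -d ' ')
util=${util%%.*}    # drop the decimals so the shell can compare it as an integer

if [ -n "$util" ] && [ "$util" -ge "$MAX_UTIL" ]; then
    # Pool is still mostly full, so assume migration was preempted rather than
    # finishing up, and kick it off again.
    dsmadmc -id=$ADMIN -password=$PASS "migrate stgpool $POOL lowmig=0"
fi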

But that could cause unforeseen issues elsewhere. Not saying it couldn't be done, but you might get into a situation where migration never catches up, and you'd have to pause things to get caught up before you ran out of stgpool space.

Guess it comes down to whatever is easier for you:
More landing stgpool space, with scripts that take action on logic you build.
More drives/HBAs/adapters to drive those tape drives.
Tuning your schedules.

For reference, I just took a peek at one of my migration processes: 4 drives, file stgpool on an IBM 5030 going to LTO6 tape. 7.1TB of 7.3TB moved from 0622 to 1418 one day, and then the next run took from 0645 to 0132. It just wildly depended on my server load at the time. The interesting thing is that the first 3 threads on the 0645 run finished in roughly the same time frame as the first run, but the last thread, which moved roughly equal bytes and files, didn't finish until 0132.

I'm not concerned about the processing delay, as I'm set for resources at this moment in time.
 
Migration from random disk to tape is done at the node and filespace level. So if one node has much larger daily backups than the others, you might see the number of drives slowly go down as only a handful of large nodes/filespaces is left to process.

Migration from a file device class, in contrast, is done file volume by file volume, so it's easier to keep more concurrent threads going until the end. File pools also perform better than random disk pools because of the larger block size: file pools are sequential and use a 256K block size, while random disk uses only 4K (I believe, don't quote me on that). If you decide to switch to a file pool, make sure to follow the recommendations here:
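
If you do go that route, the bones of it are just a FILE device class plus a sequential pool on top of it, roughly like the below (directory, names and sizes are only examples; the sizing recommendations matter more than these numbers):

define devclass FILEDEV devtype=file directory=/tsm/filepool mountlimit=32 maxcapacity=50g
define stgpool FILEPOOL FILEDEV maxscratch=200 migprocess=6 highmig=90 lowmig=20 nextstgpool=TAPEPOOL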
 
Thanks. I think it is actually a file-based pool; the device class which the pool uses has a device type of "file", so I assume that's what you mean. Cheers.
 