slow migration for diskpool

Rhuobhe

Hi, I am backing up a large file server using SP together with the 'dsmISI mags' product to improve session count and speed. The backup runs great, but I am having an issue migrating the data from the diskpool to the tapepool (4 Gb FC) in a reasonable amount of time. How can I get multiple processes to migrate the data in parallel?

For example, my diskpool is 2.5 TB (I can't increase it further) and the file server is about 30 TB. I have one single node name for the entire file server. I have no problem after the initial full backup because everything else is incremental, but I am trying to back up a new share that is approximately 12 TB. What I am doing is backing it up and filling the diskpool in a matter of an hour or two over 10 Gb, but then I have to wait the entire day for the migration to empty the diskpool to tape before I can back up another 2.5 TB and start the cycle again, until I finish backing up the entire share.

It seems that there is only one migration process. Is that because it is only one node? Please let me know if anyone has ideas. The tape drive is 4 Gb Fibre Channel, but I honestly don't see it pushing the data as fast as I expect when I look at it with nmon (this is an AIX system).

Thank you, I appreciate any advice.
 
Increase the migration process count on the diskpool so it migrates to the tapepool with more parallel processes. See "help update stgpool".

Caveat:

You need multiple tape drives, and the data will be scattered across multiple tapes depending on file sizes. This will speed things up if the data consists of many individual files. If the data is one big contiguous file, then adding more migration processes is useless.

Also, I am presuming you have individual Fibre Channel connections to the tape drives. If you have a single FC link to multiple tape drives, that is a bottleneck.
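
A minimal sketch of the change from an administrative command line, assuming the pools really are named diskpool and tapepool (adjust the names, process count, and device class to your environment; lto_class is just a placeholder):

update stgpool diskpool migprocess=4
query stgpool diskpool format=detailed
update devclass lto_class mountlimit=drives

MIGPROCESS only helps if the tapepool's device class MOUNTLIMIT and the number of online drives allow that many simultaneous mounts.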
 
Thank you, moon-buddy.

The diskpool MIGPR was already 4. Right now there is one migration process with two tapes mounted, pushing around 100 MB/s to each of the two tape drives per nmon.

I updated MIGPR to 6, but I don't see any additional processes or tapes mounted. When I manually run a migration on the diskpool, it says there is already one migration running on that stgpool. :(
 
Since this is AIX, you should look at: iostat -p 1 10
-p says to watch tape devices, 1 says refresh every second, and 10 says run 10 cycles.
nmon uses the fcstat command, I think (?), to look at all HBAs. So unless you have dedicated HBAs for your tape drives versus your storage, your results might be off.
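
If you want to look at the FC adapters directly, something along these lines works on AIX (fcs0 is just an example; use whatever adapter names lsdev shows on your box):

lsdev -Cc adapter | grep fcs
fcstat fcs0

fcstat reports per-adapter traffic and error counters, which makes it easier to tell a saturated or flaky HBA from a slow drive.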

Other issues that could affect tape performance:
  • Tape drive firmware
  • Dirty tape heads/tapes
  • Old tapes, or tapes with high error rates
  • ISLs, if any, between your tape devices and the server
  • Can your storage device keep up with the concurrent IO requests?
  • Is the data dedup/compressed - if so, there's overhead to re-hydrate that data before it's sent to tape
  • IBM atape.driver version
  • Size of files (thousands of small files going to tape can really cause performance issues)
  • TSM version (8.1.7 or 8.1.8, I think, really has some under-the-hood improvements for tape)
  • How many tape drives are you trying to drive per HBA, and is there any other storage on that HBA, as moon-buddy posted above? (There's a quick way to check the drive-to-adapter mapping below.)
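
To check the drive-to-adapter mapping on AIX, something like this (rmt0 is just an example device name):

lsdev -Cc tape
lsdev -l rmt0 -F parent

The first command lists the tape devices; the second prints the parent instance for a given drive (typically an fscsiN, which sits on an fcsN HBA), so you can see how many drives share each adapter.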

You cannot run a migration while a migration is in progress on the same storage pool, so you'd have to cancel the process, wait for it to terminate, and then kick one off again.
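
For example, from an administrative client (the process number here is made up):

query process
cancel process 123
migrate stgpool diskpool lowmig=0

LOWMIG=0 just tells the new migration to drain the pool completely; the restarted migration should then pick up whatever MIGPROCESS value is currently set on the pool.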
About your source storage pool: you mentioned that it's a diskpool, but is it a FILE device class or a true DISK device class in Spectrum Protect terms? If it's a FILE-based device class, what I've seen is that the file volumes go to a pending state, and the migration needs to be stopped so space reclamation can delete the old file volumes and free up space, at which point you could start a new migration. I'm unsure if a DISK device class behaves the same way, sorry.
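
One way to check which kind of pool it actually is (diskpool is a placeholder for your pool name):

query stgpool diskpool format=detailed
query devclass format=detailed

The first command shows which device class the pool uses; the second shows each device class's device type (DISK, FILE, LTO, and so on).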

I have 21 LTO6 tape drives in two libraries, driven in a really non-optimal way, so I understand the struggle you might be facing. I've struggled for years trying to tune tape drive performance, and in my environment it's come down to choices about what to buy that are outside of my control. I work with what I have and report the findings to management.
 
Hi, I do have a dedicated HBA for every two tape drives (/dev/rmt#). The storage pools also have their own HBAs; they come from SAN (an IBM Storwize V3700) with four FC connections, each one 8 Gb. The drives are relatively new; the library is a Quantum Scalar i500 with 12 drives (LTO6 and LTO7), and the tapes are LTO6.
####
There is an ISL between the storage pool and the server; not sure if the problem is there, though.
####
I will research this more on my end, but how can I validate or monitor the IO requests on the storage? I am relatively new to administering AIX.
####
I don't know the atape.driver version; I will have to look into this.
####
File size and count could be an issue, but I am concerned about why there aren't more drives/tapes mounted for the migration process.
####
This is TSM 7.1.7; what are the tape improvements in 8.1 that we are missing out on?
####
Each HBA is dual port; each port has either a tape device connected or one of the four FC connections to the V3700 SAN.

Thank you!!
 
OK, so right off the bat: the LTO6 drives are 8 Gb but you are only feeding them data at a max of 4 Gb. That can't be helped, but two to two and a half LTO6 drives could saturate a single 8 Gb link (assuming you were able to drive them at full speed).
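
Rough numbers for context (nominal rates; real throughput will be somewhat lower):

4 Gb FC ~ 400 MB/s, 8 Gb FC ~ 800 MB/s
LTO6 native ~ 160 MB/s, with compression up to roughly 400 MB/s

So two LTO6 drives writing compressible data can fill an 8 Gb link, and a 4 Gb HBA tops out at roughly two or three drives at native speed. The ~100 MB/s per drive you're seeing is below even native speed, which suggests the feed (disk reads, file sizes, ISLs) rather than the drives themselves.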

ISLs are just one more hop the frame has to travel. If you don't have enough of them between your switches, then your HBA-to-drive ratio is out the window; at that point you are limited by the number of ISL links. Say you have a SAN switch with 2 HBAs that each support 2 tape drives, but the tape drives are on a different SAN switch, and there's only one ISL between the switch that holds the tape devices and the switch that holds the TSM server. As you can see, that ISL would be a bottleneck. The same applies to your disk storage as well.

So nmon and iostat are great for looking at storage. Take a peek at the blueprints for best practices as far as queue depths and other items. Also get with your SAN team to look at what the storage unit is doing as far as IO.

lslpp -l | grep -i atape will return your version :)
# lslpp -l | grep -i atape
  Atape.driver    13.0.5.0    COMMITTED    IBM AIX Enhanced Tape

As to why more drives/tapes are not in use, you'd have to look at the actlog to see if the processes terminated successfully or failed. It's possible they finished the chunks of data they were working on and the others are taking their time. I'm lazy, and just dump the past 24 hours of actlog to a file so I can search through it at will.
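
Something like this dumps it to a file you can grep (the admin ID, password, and file name are placeholders):

dsmadmc -id=admin -password=xxx -outfile=actlog.txt "query actlog begindate=today-1"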

I know 8.1.5 has improved tape read performance when using specific IBM LTO and Jaguar tape drives. I thought there were some further enhancements to writes in 8.1.6 or higher; I could be mistaken, and perhaps those enhancements were only for the container storage pool types. I thought I watched one of the IBM release note videos. It's been a while... so my apologies if I am mistaken.

As to migration threads with a DISK device class, I do not know. All my stuff is FILE or the new directory-container type, but I would think it would still go by the number of processes specified, unless each process is trying to access the same volume. So again, the actlog would be your best bet to see what is going on.

BTW, welcome to being an AIX admin: in my opinion, one of the most unforgiving and one of the BEST operating systems there is. The ability to rip every device out and bring it back in on the fly, and playing with the LVM, is so nice. And by unforgiving, I mean that AIX will rarely ask you if you are sure you want to do this :)
 
####
File size and count could be an issue, but I am concerned about why there aren't more drives/tapes mounted for the migration process.

https://www.ibm.com/support/knowled...ibm.itsm.srv.doc/t_migrate_seq_processes.html

"If files are collocated by group, each process can migrate only one group at a single time"

Check if you are using collocation groups on your nodes... 👍

select count(*) from nodes where COLLOCGROUP_NAME is not null

select NODE_NAME,COLLOCGROUP_NAME from nodes where COLLOCGROUP_NAME is not null
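
It's also worth checking the collocation setting on the tapepool itself (tapepool is a placeholder name here):

query stgpool tapepool format=detailed

The "Collocate?" field shows whether the pool collocates by group, node, or filespace. Switching it (update stgpool tapepool collocate=no) would let migration spread data more freely across drives, but it also changes how data is grouped for restores, so weigh that trade-off first.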
 