Hi,
Just sharing because I think this is a very important update that everyone should know about.
Container pools use dedup tier levels, and the minimum tier level is 50 KB. See the new UPDATE NODE command for 8.1.12:
MINIMUMExtentsize:
Specifies the extent size that is used during data deduplication operations for cloud-container storage pools and directory-container storage pools on this node. In most system environments, the default value of 50 KB is appropriate. However, if you plan to deduplicate data from an Oracle or SAP database, and the average extent size is less than 100 KB, you can help optimize performance by specifying a larger extent size. Data in Oracle and SAP databases is typically deduplicated with extent sizes that are much smaller than the default average size of 256 KB. Small extent sizes can negatively affect the performance of backup and expiration operations and can result in unnecessary growth of the IBM Spectrum Protect server database.
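Based on the documentation quoted above, changing this per node would look roughly like the following from an administrative client (dsmadmc) session. The node name is hypothetical, and you should check HELP UPDATE NODE on your own server for the exact accepted values:

```
/* Hypothetical node name - adjust to your TDP for VE datacenter node. */
UPDATE NODE VE_DC_NODE1 MINIMUMExtentsize=250KB

/* Verify the setting in the detailed node output. */
QUERY NODE VE_DC_NODE1 F=D
```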
We found out this is a big problem if you are using TDP for VE, because all TDP data ends up at 50 KB extents (the server only sees blocks in 1 MB sizes, and the client developers apparently do not talk to the server developers).
So if you store something like 100 TB of VM backup data, copying/restoring it from tape gets very ugly. Even replication is slow in that case. We noticed this because we use multiple container pools and wondered why the performance differed so much.
This applies of course if you build Blueprint configs with big SATA drives, which deliver around 70 IOPS each. In our case that meant about 12,000 IOPS in total (if all disks on all LUNs are used in parallel), which you never reach in practice because the software is optimized to fill partly used containers first (space vs. performance).
We changed the TDP for VE datacenter (DC) nodes to 250 KB and see big improvements in PROTECT STGPOOL and replication performance. Our VMware storage pool usage is now higher (+12%), and the database usage decreased a lot, which is also nice. Keep in mind that all chunks will be "new" because of the changed size, so the effect depends a lot on the copy group settings and, in the case of VMware, I think on the last FULL backup. I expect the storage pool usage to come down a bit more, but it went up to +25% straight away as we did bulk full backups.
Also keep in mind the case where you have a container-copy pool and want to restore it after a disaster. I have seen that the restore of a container spreads well across all attached LUNs, but that helps very little if it restores in 50 KB chunks. I have done some real and test disaster restores of containers, and they were always slow compared to what the tape drive could deliver; we thought we could just stream that stuff to disk, but no, it does not work that way. I have seen this on file-level (file server) nodes as well and increased the extent size there too.
There is an SQL query that gives the average chunk size for a container pool. With this info plus your backend IOPS you can make a rough guess how long a restore will take, assuming all the other optimizations are in place (DB2 on NVMe, plenty of cores with high frequency).
db2 "select avg(cast(length as bigint)) from sd_chunk_locations where poolid=XXXXX for read only with ur"
(takes some time to finish; the output is in bytes)
poolid = XXX (find it with SHOW SDPOOL)
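The average chunk size from the SQL above plus your backend IOPS can be turned into a back-of-the-envelope restore estimate. This is a sketch with illustrative numbers only (100 TiB of data, 12,000 IOPS as in the SATA example above); it assumes one backend I/O per chunk and ignores caching, read-ahead, and striping effects, so treat it as a rough sanity check, not a prediction:

```python
# Rough restore-time estimate from average chunk size and backend IOPS.
# All input numbers are illustrative assumptions, not measurements.

def estimate_restore_hours(data_bytes: float, avg_chunk_bytes: float,
                           backend_iops: float) -> float:
    """Assume one backend I/O per chunk: hours = chunks / IOPS / 3600."""
    chunks = data_bytes / avg_chunk_bytes
    return chunks / backend_iops / 3600.0

TIB = 1024 ** 4

# Example: 100 TiB of VM backup data on a ~12,000 IOPS SATA backend.
small = estimate_restore_hours(100 * TIB, 50 * 1024, 12_000)   # ~50 KB chunks
large = estimate_restore_hours(100 * TIB, 250 * 1024, 12_000)  # ~250 KB chunks

print(f"50 KB chunks:  ~{small:.0f} h")   # roughly 50 hours
print(f"250 KB chunks: ~{large:.0f} h")   # roughly 10 hours
```

With these made-up numbers the 250 KB case is exactly five times faster, which matches the intuition that restore time scales with the number of chunks when you are IOPS-bound.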
The bottom line is of course that dedup is not as effective anymore, so you need more pool space. But the larger chunks are much easier to process and handle.
In our case, having several different pools, it was good to check each one. I got no info from IBM Support on how to query the average extent size used per node.
This is also a good point if you have an extra-large DB2. Just changing the TDP DC nodes, which hold approximately 30% of all stored data, reduced the database size from around 275,000,000 to 244,906,466 pages. (Handling 3 TB+ is no problem; this is just an example.) The DB2 also grew temporarily to 282,000,000 pages during the transition.
Just imagine five times fewer pointers in DB2 for the TDP data (250 KB vs. 50 KB), and bigger backup data chunks to process. I think this is a must-have for a well-performing ISP installation.
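The five-times factor comes straight from the extent sizes. A tiny sketch of the chunk-count arithmetic (the data amount is hypothetical, chosen only to illustrate the ratio):

```python
# Why 250 KB vs. 50 KB extents means ~5x fewer chunk entries in the database.

def chunk_count(data_bytes: int, extent_bytes: int) -> int:
    """Number of extents needed to cover the data at a given extent size."""
    return data_bytes // extent_bytes

TIB = 1024 ** 4
tdp_data = 100 * TIB  # hypothetical amount of TDP for VE data

before = chunk_count(tdp_data, 50 * 1024)   # 50 KB extents
after = chunk_count(tdp_data, 250 * 1024)   # 250 KB extents

print(f"chunk entries: {before:,} -> {after:,} ({before / after:.0f}x fewer)")
```

Each chunk carries database bookkeeping, so cutting the chunk count by ~5x is what drives the page-count reduction described above.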
Hope this helps others.
Br, Dietmar
D&C IT Consulting