Spectrum Protect Server (for VE instance) Replication

shcart
A little useful advice.

We are using SP (for VE) on a large Blueprint server. Essentially we have one ASNode per vCenter (1800-2000 guests).

If you are going to replicate a server built like this, DO NOT start backing up to the primary until you have the replication target set up and ready to go. Protect storage pool works fine, though it runs for a long time. However, the Replicate Node receiving process appears to pin the DB2 log when it starts on the ASNode, and then takes only a few hours to consume all of the permitted 256 GB of log space (on a 300 GB volume). At that point the SP instance tanks (leaving DB2 active. Hmmm, curiouser and curiouser). Will update when we get more information.
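
For context, the sequence here was roughly the following; the pool and node names below are placeholders, not our real ones:

/* storage pool protection: long-running, but completed fine */
protect stgpool VEPOOL
/* node replication: the step that pinned the DB2 active log */
replicate node VE_ASNODE1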
 
So as I understand it, the replication process pinned DB2 and your active log filled up.
First, could you export the data to temporary media and then import it on the target server, just to reduce the first-replication headache?
Or replicate fewer filespaces first, until you reach the whole node.
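
For example, a first-sync seed over a server-to-server connection could look something like this sketch (SERVER_B and VE_ASNODE1 are hypothetical names, and the server-to-server definitions must already be in place):

/* seed the target directly over the server-to-server link */
export node VE_ASNODE1 filedata=all toserver=SERVER_B mergefilespaces=yes
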
Second, what is the free space on your active log directory? It is highly recommended to leave around 8 GB, or 20% of the whole partition size, free after assigning the ACTIVELOGSIZE space.
So if your ACTIVELOGSIZE is 256 GB, the partition containing it should be at least 264 GB.
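
In dsmserv.opt that cap is specified in megabytes, so a 256 GB active log would look something like this minimal sketch:

* active log capped at 256 GB (262144 MB); keep free headroom on the filesystem
ACTIVELOGSIZE 262144

Note the two rules of thumb differ: 256 GB + 8 GB = 264 GB fits a 299 GB filesystem, but leaving 20% of the partition free would need 256 / 0.8 = 320 GB.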
I hope that helps.
 
Hi Mahmoudkafafi / all

Our ACTIVELOGSIZE is 256 GB and the filesystem is 299 GB.

I think we worked out the problems over the last few days. Essentially, we restarted replication with very limited sessions (20) and then increased the number of sessions per ASNode over a few days. We are now synchronized. It appears the active log cleanup process was simply being swamped by 99 sessions per ASNode (198 replication sessions total).

Overall, our recommendation would be to have your replication server in place before taking your first backup. Start with 20 channels per ASNode, build up your replication channels slowly, and keep it reasonable.
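
In command terms, that ramp-up is just re-running node replication with a larger session cap each pass (VE_ASNODE1 is a placeholder name):

/* conservative start */
replicate node VE_ASNODE1 maxsessions=20
/* a few days later, once the active log stays healthy */
replicate node VE_ASNODE1 maxsessions=40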

We have a P9 server with half of a dedicated direct-attached V7000, including SSD for DB/active log storage, and a dedicated 20 Gb/s replication path. This configuration simply cannot keep up with 99 replication channels for the TSM for VE ASNode.

Overall everything is now much more stable.
 
99 sessions (or channels, as you call them) is too much. In the Blueprint, the recommended session counts are as follows:
20 sessions for a small
40 sessions for a medium
60 sessions for a large

Check Chapter 2 in the Blueprint to see if your workload falls into a small, medium, or large. Then, in Chapter 3, see if your hardware matches the recommendations. If it does, you can use the recommended maxsessions; if it's smaller, you may need to scale back.
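
Those figures map to the MAXSESSIONS parameter on the data movement commands; for a large build it would be something like this (pool and node names are hypothetical):

protect stgpool DEDUPPOOL maxsessions=60
replicate node VE_ASNODE1 maxsessions=60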
 
Thanks Marclant. Hardware matches the recommendations with the exception of the storage size (about 50% of the recommended size). The whole configuration was blessed by IBM before the original purchase.

The TSM for VE servers are getting between 5.75:1 and 8.2:1 data reduction. The larger one has just over 2.2 PB reported on a 1/2 PB dirpool (now 78.5% used).

The TSM servers are at 1.95:1 to 2:1 (we have encrypted Oracle data, which limits dedupe).
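
As a rough sanity check on those figures: 78.5% of a 0.5 PB dirpool is about 0.39 PB physically stored, and 2.2 PB / 0.39 PB ≈ 5.6:1, right around the low end of the range above.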

I think all 5 of us on the team must have missed the importance of Chapter 2.
 
Being over on storage size is not a huge deal: you'll just need more storage and more DB space. With a larger DB, DB disk performance will be important too. CPU/memory would be the same, as that is sized for processing the workload, not the stored data.
 
marclant said: "99 sessions is too much... Check Chapter 2 in the Blueprint... then in Chapter 3, see if your hardware matches the recommendations."

I just reread the Blueprint. The S=20, M=40, L=60 figures are the prepopulated (default) values created by the build and listed in Chapter 5, Table 18. However, I did not see any replication limitations or recommendations mentioned anywhere.

We are definitely a Large Blueprint, with 2+ PB reported and 30-50 TB a night.
As a direct result of your mentioning this, we played with the replication channels even before we looked at the Blueprint.

On the TSM for VE servers, we tweaked the replication numbers. Smaller is definitely better when you have only 2 nodes and 3000-plus filespaces. The problem appears to have gone away after we limited replication throughput by reducing the sessions.

Looking at the above (slower works), it now seems that log pinning is NOT the issue. By replicating with too many channels, we may have been overrunning the DB2 database engine's ability to archive (and clean up) its active log files. Will update as we learn more.
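
For anyone watching for the same symptom, active log consumption is visible from the admin command line while replication runs (exact output fields may vary by server level):

/* watch Used Space climb toward Total Space */
query log format=detailed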

So thanks, MARCLANT.
 