
Spectrum Protect Server (for VE instance) Replication

shcart

ADSM.ORG Member
#1
A little useful advice.

We are using SP (for VE) on a large Blueprint server. Essentially, we have one ASNODE per vCenter (1,800-2,000 guests).

If you are going to replicate a server built like this, DO NOT start backing up to the primary until you have the replication target set up and ready to go. PROTECT STGPOOL works fine, though it runs for a long time. However, the REPLICATE NODE receiving process appears to pin the DB2 log when it starts on the ASNODE, and then takes only a few hours to consume all of the permitted 256 GB of log space (on a 300 GB volume). At that point the SP instance goes down, while DB2 stays active (hmmmm, curiouser and curiouser). Will update when we get more information.
 
#2
So, as I understand it, the replication process pinned the DB2 log and your active log filled up.
First, could you export the data to temporary media and then import it on the target server, just to reduce the pain of the first replication?
Or replicate fewer filespaces first, until you reach the whole node.
Second, how much free space is there in your active log directory? It is highly recommended to leave around 8 GB, or 20% of the whole partition size, as free space after assigning the ACTIVELOGSIZE space.
So if your ACTIVELOGSIZE is 256 GB, the partition that contains it should be at least 264 GB.
I hope that helps.
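The sizing advice above can be read two ways ("8 GB" versus "20% of the whole partition"), and the two readings give different answers. A minimal sketch that checks both, taking the 256 GB active log and 299 GB filesystem from this thread; the function names are hypothetical, and this is plain arithmetic, not an IBM sizing tool:

```python
def min_fs_fixed_headroom(actlog_gb: int, headroom_gb: int = 8) -> int:
    """Smallest filesystem (GB) that leaves headroom_gb free beyond ACTIVELOGSIZE."""
    return actlog_gb + headroom_gb


def min_fs_percent_free(actlog_gb: int, free_pct: int = 20) -> int:
    """Smallest filesystem (GB) where free_pct% of the whole filesystem stays free.

    The active log may use at most (100 - free_pct)% of the filesystem, so the
    size must be at least actlog_gb * 100 / (100 - free_pct), rounded up.
    """
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-actlog_gb * 100 // (100 - free_pct))


actlog_gb = 256  # active log size reported in this thread
fs_gb = 299      # filesystem size reported in this thread

for rule, needed in (("8 GB headroom", min_fs_fixed_headroom(actlog_gb)),
                     ("20% free", min_fs_percent_free(actlog_gb))):
    print(f"{rule}: need >= {needed} GB, have {fs_gb} GB -> ok={fs_gb >= needed}")
```

Under the 8 GB reading a 256 GB log needs 264 GB, which 299 GB satisfies; under the 20% reading it needs 320 GB, which 299 GB does not.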
 

shcart

ADSM.ORG Member
#3
Hi Mahmoudkafafi / all

Our ACTIVELOGSIZE is 256 GB; the filesystem is 299 GB.

I think we worked out the problems over the last few days. Essentially, we restarted replication with a very limited number of sessions (20) and then increased the number of sessions per ASNODE over a few days. We are now synchronized. It appears that the active log cleanup process was simply being swamped by 99 sessions per ASNODE (198 replication sessions total).

Overall, our recommendation would be to have your replication server in place before taking your first backup. Start with 20 channels per ASNODE, build up your replication channels slowly, and keep the number reasonable.
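The "start at 20 and build up slowly" approach can be written down as an explicit schedule. A minimal sketch, assuming one step per replication run, a hypothetical step of 10 and a hypothetical cap of 60, and hypothetical ASNODE names; it only prints command text and runs nothing. It assumes the session limit is set with the MAXSESSIONS parameter of REPLICATE NODE, so check that parameter in the command reference for your server level:

```python
def ramp_schedule(start: int = 20, step: int = 10, cap: int = 60) -> list[int]:
    """Per-ASNODE session counts to use on successive replication runs."""
    if not (0 < start <= cap) or step <= 0:
        raise ValueError("need 0 < start <= cap and step > 0")
    counts = list(range(start, cap, step))
    counts.append(cap)  # always finish exactly at the cap
    return counts


def replicate_command(node: str, sessions: int) -> str:
    """Build the administrative command text only; nothing is executed here."""
    return f"REPLICATE NODE {node} MAXSESSIONS={sessions}"


asnodes = ["ASNODE_VC1", "ASNODE_VC2"]  # hypothetical ASNODE names

for run, per_node in enumerate(ramp_schedule(), start=1):
    total = per_node * len(asnodes)
    print(f"run {run}: {per_node} sessions per ASNODE, {total} total")
    for node in asnodes:
        print("  " + replicate_command(node, per_node))
```

Printing the total alongside the per-node value makes the aggregate load visible: 99 sessions on each of 2 ASNODEs is the 198 sessions that swamped the log here.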

We have a P9 server with half of a dedicated, direct-attached V7000 (including SSD for DB/active log storage) and a dedicated 20 Gb/s replication path. This configuration simply cannot keep up with 99 replication channels for the TSM for VE ASNODE.

Overall everything is now much more stable.
 

marclant

ADSM.ORG Moderator
#4
99 sessions (or channels, as you call them) is too many. In the Blueprint, the recommended sessions are as follows:
20 sessions for a small
40 sessions for a medium
60 sessions for a large

Check Chapter 2 in the Blueprint to see whether your workload falls into small, medium or large. Then, in Chapter 3, check whether your hardware matches the recommendations. If it does, you can use the recommended maxsessions; if your hardware is smaller, you may need to scale back.
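One way to read "check the workload class, then check the hardware, and scale back if smaller" is to use the default for whichever class is smaller. A minimal sketch of that reading; the lookup values are the small/medium/large numbers above, but the "take the smaller class" rule and the function name are hypothetical, not Blueprint text:

```python
# Session defaults quoted above (small/medium/large).
BLUEPRINT_SESSIONS = {"small": 20, "medium": 40, "large": 60}
SIZE_ORDER = ["small", "medium", "large"]


def suggested_sessions(workload_size: str, hardware_size: str) -> int:
    """Return the default for the smaller of the workload and hardware classes.

    workload_size: class from the Chapter 2 workload check.
    hardware_size: largest class whose Chapter 3 hardware you fully meet.
    """
    for size in (workload_size, hardware_size):
        if size not in BLUEPRINT_SESSIONS:
            raise ValueError(f"unknown size: {size!r}")
    effective = min(workload_size, hardware_size, key=SIZE_ORDER.index)
    return BLUEPRINT_SESSIONS[effective]


print(suggested_sessions("large", "large"))   # 60
print(suggested_sessions("large", "medium"))  # 40
```

The thread does not say how far to scale back, so dropping a whole class is only one conservative choice.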
 

shcart

ADSM.ORG Member
#5
Thanks, marclant. Hardware matches the recommendations, with the exception of storage size (about 50% of the recommended size). The whole configuration was blessed by IBM before the original purchase.

TSM for VE servers are getting data reduction ratios between 5.75:1 and 8.2:1. The larger one has roughly 2.2 PB reported on a 1/2 PB dirpool (now 78.5% used).

TSM servers are at 1.95:1 to 2:1 (we have encrypted Oracle, which limits dedupe).

I think all five of us on the team must have missed the importance of Chapter 2.
 

marclant

ADSM.ORG Moderator
#6
Being over on storage size is not a huge deal; you'll need more storage and more DB space. With a larger DB, DB disk performance will be important too. CPU/memory would stay the same, as that is driven by processing the workload, not by stored data.
 

shcart

ADSM.ORG Member
#7

I just reread the Blueprint. S=20, M=40, L=60 are the prepopulated (default) values created by the build and listed in Chapter 5, Table 18. However, I did not see any replication limitations or recommendations mentioned anywhere.

We are definitely a large Blueprint, with 2+ PB reported and 30-50 TB a night.
As a direct result of your mentioning this, even before we looked at the Blueprint, we experimented with the replication channels.

On the TSM for VE servers we tweaked the replication numbers. Smaller is definitely better when you have only 2 nodes and 3,000-plus filespaces. The problem appears to have gone away after we limited replication throughput by reducing the number of sessions.

Looking at the above (slower works), it now seems that log pinning is NOT the issue. By replicating with too many channels, we may have been overrunning the DB2 database engine's ability to archive (and clean up) its active log files. Will update as we learn.
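The "overrunning the log archiving" hypothesis can be expressed as a simple constant-rate model: if log records are written faster than DB2 archives them, the active log fills at the difference between the two rates. A minimal sketch; the function name and all rates are hypothetical and made up for illustration, not measurements from this system, and real log usage is not linear:

```python
import math


def hours_until_log_full(capacity_gb: float, used_gb: float,
                         write_gb_per_h: float, archive_gb_per_h: float) -> float:
    """Hours before the active log fills, assuming constant rates.

    Returns math.inf when archiving keeps up with (or outpaces) log writes.
    """
    net = write_gb_per_h - archive_gb_per_h
    if net <= 0:
        return math.inf
    return max(capacity_gb - used_gb, 0.0) / net


# Made-up rates for illustration only; not measurements from this thread.
print(hours_until_log_full(256, 0, write_gb_per_h=100, archive_gb_per_h=36))  # 4.0
print(hours_until_log_full(256, 0, write_gb_per_h=30, archive_gb_per_h=36))   # inf
```

In this model, reducing sessions lowers the write rate; once it drops below the archive rate the log never fills, which matches the "slower works" observation.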

So, thanks, marclant.
 
