rowl
ADSM.ORG Senior Member
I have an environment where my TSM database is replicated using disk-array-based replication, with a lag of less than 5 minutes. My backup data resides on a Data Domain array (accessed over NFS), which replicates independently of the DB and can lag by hours depending on load. For this environment I have a 24 to 48 hour RPO.
There is an exposure here that I am trying to find a solution for, and I'm curious whether anyone has run across this problem and solved it. I see a potential for data loss if a reclaim process runs while data replication is lagging by many hours. I think it is possible that, during a reclaim, data from the volume being reclaimed is written to a new volume, and DB replication ensures that this change reaches my DR site within minutes, but the data itself may not be replicated for several hours. If my primary site fails during this window, my DR site may not have received the volume that was created during the reclaim, but will have an updated DB showing that the original volume was deleted and the new volume created.
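To make the window concrete, here is a rough Python sketch of the timing I'm worried about. The function name, the volume names, and the lag values are all made up for illustration; nothing here is a TSM or Data Domain feature. It assumes a reclaim-created volume exists at the DR site only if it was created before the data replication cutoff, and that the DR copy of the DB knows about it only if it was created before the DB replication cutoff.

```python
from datetime import datetime, timedelta

def volumes_at_risk(reclaim_volumes, failure_time, db_lag, data_lag):
    """Return reclaim-created volumes that the DR DB references
    but whose data has not yet reached the DR site.

    reclaim_volumes: list of (volume_name, created_at) tuples (hypothetical).
    db_lag / data_lag: replication lag of the DB and of the backup data.
    """
    db_cutoff = failure_time - db_lag      # DR DB reflects changes up to here
    data_cutoff = failure_time - data_lag  # DR data reflects changes up to here
    return [
        name
        for name, created_at in reclaim_volumes
        # Known to the DR DB, but the data behind it was never replicated.
        if data_cutoff < created_at <= db_cutoff
    ]

# Illustrative values only: primary site fails at 12:00,
# DB lag is 5 minutes, data lag is 6 hours.
failure = datetime(2024, 1, 1, 12, 0)
volumes = [
    ("VOL_A", datetime(2024, 1, 1, 5, 0)),    # data already replicated
    ("VOL_B", datetime(2024, 1, 1, 9, 0)),    # in the DR DB, data missing
    ("VOL_C", datetime(2024, 1, 1, 11, 58)),  # not yet in the DR DB
]
exposed = volumes_at_risk(
    volumes, failure, timedelta(minutes=5), timedelta(hours=6)
)
print(exposed)
```

With these made-up numbers only the volume created mid-window is exposed: the DR DB points at it, but the DR copy of the data does not contain it yet.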
I thought of using a copy pool to cover this, making sure the primary and copy pools are never reclaimed on the same day. However, that puts a lot of additional load on the Data Domain array and makes my replication lag problem even worse.
Thoughts?
Thanks,
-Rowl