How often to run audit container on the whole storage pool?

RecoveryOne · Sep 30, 2020

Was wondering what everyone's thoughts are on how often you should run audit container? I've been given advice about running an audit container process from a stgrule every 7 days. I feel that is a bit aggressive, especially when you have over 18000 containers.

I cannot recall a time in my environment where I've had damaged extents in my directory container pools, however I'd like to be notified of such damage before I need that extent. Short of data movement operations or attempting to restore that extent, there's no other way I'm aware of to find damage.

Thanks!

AngeloDeAngelis · Sep 30, 2020

I believe we are working on some documentation on best practices for container auditing.

Most likely it will be available on the Support Supplemental Information Page here:
https://www.ibm.com/support/pages/node/3608691

I'll dig a little here, but scheduling the process once a week for a few hours is a good thing.
Doing the Replica Server's container storage when using replication would be a very good idea.

RecoveryOne · Sep 30, 2020

Sadly no replication in place. Running protect type=local to LTO media. Outside of best practices, I am well aware, but it's what I have to work with.

Even if running once a week and running for a few hours, with any significant amount of containers it would still be some time before it made it through the entire pool.
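To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Only the 18000 container count comes from this thread; the audit rate (containers per hour) and the weekly window length are made-up placeholders, and the function name is purely illustrative. Swap in whatever your own audit processes actually achieve.

```python
import math

def weeks_to_cover_pool(total_containers, containers_per_hour, hours_per_week):
    """Return how many weekly audit windows it takes to scan every container once."""
    if containers_per_hour <= 0 or hours_per_week <= 0:
        raise ValueError("rate and window must be positive")
    # Containers that get audited in one weekly window.
    per_week = containers_per_hour * hours_per_week
    # Round up: a partially used final window still counts as a week.
    return math.ceil(total_containers / per_week)

# 18000 is from the post above; 150 containers/hour and a 4-hour
# window are ASSUMED values for illustration, not measured figures.
weeks = weeks_to_cover_pool(total_containers=18000,
                            containers_per_hour=150,
                            hours_per_week=4)
print(weeks)
```

With those assumed figures a full pass takes well over half a year, which is the concern: the estimate scales linearly, so doubling either the rate or the window halves the time to cover the pool.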

As to scheduling, would you think it should be done from a stgrule, or better from an admin task? I am under the impression (not sure if mistaken) that stgrules take lower priority than admin tasks, so a stgrule could potentially put less strain on the system.

Thanks again!

AngeloDeAngelis · Sep 30, 2020

I'm going to check if we would even copy a bad block coming out of a container to a copy pool on tape. Could drive an ANR

RecoveryOne · Sep 30, 2020

My guess would be ANR3660E or ANR4847W. Not 100% sure.
I would assume the protect process would only flag on extents being copied to tape during normal protect and then during a reclaim operation. Some tapes (and extents on disk) might not be touched for some time as the data could be fairly static.

So yeah, any guidance on how often to run an audit container, how many processes, and which method for the whole pool would be helpful.

AngeloDeAngelis · Oct 1, 2020

Working on some additional information on this topic.

Some initial feedback:
If we are trying to read an extent and it is corrupt (or if the entire container is corrupt), we would detect it and raise an error message/alert. If an extent is corrupted but does not need to be copied during this process (because it had been copied earlier), we would likely not know. If we detect corruption and a protect stgpool relationship exists on another server, we can repair from that source. We have auditing of container pools able to be triggered as a storage rule, so containers can be audited and corruption detected as an ongoing process.

RecoveryOne · Oct 1, 2020

AngeloDeAngelis said:
We have auditing of container pools able to be triggered as a storage rule, so containers can be audited and corruption detected as an ongoing process.

Yep, which goes right back to my initial question/advice about using a stgrule to audit containers and the parameters thereof.
