Best and fastest way to corroborate a backup?

ldmwndletsm

ADSM.ORG Senior Member
Joined
Oct 30, 2019
Messages
232
Reaction score
5
Points
0
Not sure if this is the best sub-forum for this question.

How can I have reasonable assurance that a backup for a file space is valid and that the primary and copy pools are correct? Obviously, I can restore the data from the primary tape pool and then compare it bit for bit against the original data on disk, using hashes and comparing sizes, modtimes, and metadata. Next, I could change the status of the primary tapes to unavailable (forcing TSM to use the copy pool volumes) and repeat the process (before the copy pool tapes are taken off site) to ensure that the data restored from the copy likewise matches. This is the most authoritative method I know of, but it's clearly too time-consuming to do for most file spaces most of the time.
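For the bit-for-bit comparison step, a rough sketch of the directory walk might look like this. Everything here (the function names, keying on relative path, which metadata gets compared) is my own illustration of the hash/size/mtime check, not anything TSM provides:

```python
# Minimal sketch: compare a restored tree against the original on disk.
# Illustrative only -- compares size, mtime, and SHA-256 per file.
import hashlib
import os

def snapshot(root):
    """Map relative path -> (size, mtime, sha256) for every file under root."""
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            st = os.stat(full)
            out[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime),
                                                h.hexdigest())
    return out

def compare(original_root, restored_root):
    """Return (missing_from_restore, extra_in_restore, mismatched)."""
    a, b = snapshot(original_root), snapshot(restored_root)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    mismatched = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, mismatched
```

You would run this once against the restore from the primary pool, then again after marking the primary volumes unavailable and restoring from the copy pool.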

If I want to avoid restoring the data, what is the next best comparison to expedite this, even if it's just comparing file names from a first-time full backup as a cursory check?

WOULD THIS WORK??? If I compare the numbers reported by `q occupancy node_name file_space_name` between the primary tape pool and the copy pool, that's a start. Next, I run `q nodedata` to determine which tapes contain the file space for the primary pool, and then I could run `q content volume node=xxxx file_space_name=xxxx` for each one and concatenate the output to a file. I could then repeat that for the copy pool and compare the two outputs. Finally, I could compare the first listing against the original data on disk. There would obviously have to be some parsing (e.g., `q content volume` separates the file name from the file space, etc.) and sorting involved. It seems the parsing could be a pain, particularly with the detailed format option, but a Perl script might make it more bearable. I have found, however, that querying the content of a tape is very slow, even for one containing only a few hundred MB.
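To give the "concatenate and compare the two listings" idea a concrete shape, here's a sketch. The tab-separated record layout below (node, filespace, file name) is hypothetical sample data, not real `q content` output, which varies by server version and format option, so the split would need adjusting:

```python
# Minimal sketch: normalize two "q content" listings (primary pool vs copy
# pool) into sets of (filespace, filename) tuples and diff them.
# The sample records are made up for illustration.
def parse_listing(lines):
    """Return a set of (filespace, filename) tuples from tab-separated rows."""
    entries = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _node, fsname, fname = [f.strip() for f in line.split("\t")]
        entries.add((fsname, fname))
    return entries

primary = parse_listing([
    "NODE1\t/home\t/users/alice/a.txt",
    "NODE1\t/home\t/users/alice/b.txt",
])
copypool = parse_listing([
    "NODE1\t/home\t/users/alice/a.txt",
])
only_in_primary = sorted(primary - copypool)   # in primary but not the copy
only_in_copy = sorted(copypool - primary)      # in the copy but not primary
```

Anything in either difference set would be worth investigating; an empty diff only says the names match, not that the data is readable.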

Alternatively, I could run `query backup /filespace/ -subdir=yes` from the client and maybe parse that output to compare with what's on disk, but this doesn't provide any comparison against the copy pool. The parsing could be equally aggravating, but it's much faster than a volume query. I guess I could even crawl through dsmsched.log, but that seems even worse. For example, I've seen cases where a file is reported as having increased in size during compression, so it produces one output line with 'Grew' and another with 'increased', with the latter having the file space separated from the file name, and sometimes another file gets reported on the same line without a line break. There are also all kinds of other superfluous entries, not to mention retries, etc. Parsing that log file could get really ugly.
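A sketch of that client-side comparison, assuming the `query backup` output has already been reduced to one absolute path per line (which, as noted, takes some parsing; the function names here are illustrative, not part of any TSM tooling):

```python
# Minimal sketch: compare file names reported by
# "dsmc query backup /filespace/ -subdir=yes" against what is on disk.
# Assumes the client output was already parsed down to plain paths.
import os

def on_disk(root):
    """Set of absolute file paths currently under root."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.join(dirpath, name))
    return found

def diff_backup_vs_disk(backup_paths, root):
    """Return (on disk but not in backup, in backup but not on disk)."""
    disk = on_disk(root)
    backed = set(backup_paths)
    return sorted(disk - backed), sorted(backed - disk)
```

As a name-only check this is cursory by design: it catches missing or extra files, but not content differences.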

Any advice?
 
Best: A yearly DR test

Fastest:
- perform backup of all clients
- after all backups are complete, do backup stgpool of all the primary pools and make sure not to run client backups

After those two are done, your offsite pools should be in sync with the primary.

Not as fast
For the client backup, use auditlogging

For the backup stgpool, compare the occupancy of primary and copy pools.
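To illustrate that occupancy comparison, a minimal sketch; the filespace names and the files/MB figures are made-up stand-ins for `q occupancy` results from the two pools:

```python
# Hypothetical "q occupancy" results per filespace: (number of files, MB).
primary = {"/home": (1520, 8421.5), "/var": (310, 912.0)}
copypool = {"/home": (1520, 8421.5), "/var": (309, 910.2)}

# A filespace whose file count or MB differs between pools is suspect.
out_of_sync = sorted(fs for fs in primary if copypool.get(fs) != primary[fs])
```

Matching occupancy doesn't prove the copies are readable, of course; it only shows nothing is obviously missing.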
 
Best: A yearly DR test

We're working on that. Bit rot is also something we will periodically check for via random restores from off-site tapes. Not foolproof, of course.

Fastest:
- perform backup of all clients
- after all backups are complete, do backup stgpool of all the primary pools and make sure not to run client backups

After those two are done, your offsite pools should be in sync with the primary.

We are doing this.

Not as fast
For the client backup, use auditlogging

I've not used auditlogging. After reading that link, though, how is it really any different from simply querying the backup from dsmc? With the latter, you could also turn on detail, but, of course, this is all done after the fact.

For the backup stgpool, compare the occupancy of primary and copy pools.

I will be doing that now.
 
I've not used auditlogging. After reading that link, though, how is it really any different from simply querying the backup from dsmc? With the latter, you could also turn on detail, but, of course, this is all done after the fact.
Querying the backup happens after the backup.
Auditing happens during the backup and logs each file as backed up, unchanged, or excluded.
 