Audit volume fix parameter?

ldmwndletsm


Five questions on audit volume with fix=no versus yes.

I don't understand the real difference between fix=no and fix=yes, or why you would use one versus the other. As I understand it, fix=yes deletes from the database, whereas fix=no only marks the file as damaged. But after I ran an audit (fix=no) that reported damaged files on two copy pool tapes, our admin script (`backup stgpool primary_pool copy_pool`) ran the next morning, and those files were copied to another copy pool tape and are no longer reported on the original two. NOTE: None of the primary pool tapes was reported with any damaged files.

According to the IBM documentation for 'backup stgpool':


"If a file already exists in the copy storage pool, the file is not backed up unless the copy of the file in the copy storage pool is marked as damaged."

So it looks like this is exactly what it did.

Q 1. So it doesn't appear that it was necessary for me to use fix=yes?

Q 2. Is there any reason for me to do this now, after the fact?

It also says this for the copy storage pool:

"Fix=No The server reports the error and marks the physical file copy as damaged in the database.
Fix=Yes The server deletes any references to the physical file and any database records that point to a physical file that does not exist."

But here's where I get really confused. The documentation for query content says this:

"Damaged: Yes Specifies that only files that are marked as damaged are displayed. These are files in which the server found errors when a user attempted to restore, retrieve, or recall the file, or when an AUDIT VOLUME command was run. "

But when I run `q content volume damaged=yes` on both of the damaged copy pool tapes, it reports thus:

ANR2034E QUERY CONTENT: No match found using this criteria.
ANS8001I Return code 11.

Q 3. If I didn't use 'fix=yes' for the audit then why would it not report the damaged files for these tapes?

Maybe it would have if I'd done it before the admin script (`backup stgpool primary_pool copy_pool`) ran, but once the damaged files are copied to another copy pool tape then it's too late?

Q 4. When an audit with fix=no reports damaged files, and a rerun of the audit finds no damaged files, are the files that were previously marked damaged in the database now unmarked?

Specifically, I ran another audit (fix=no) on the two tapes, as discussed below, in a different drive, but this was after the admin script ran, and the number of files inspected was lower by the number of damaged files. Would the counts have been the original values (8003349, 25143762) if the admin script had not run yet, assuming the tapes were okay?

Q 5. If you use fix=no for a copy pool volume, and damaged files are reported then what happens when you run `backup stgpool primary_pool copy_pool`? Does this reset the damaged files when it copies those from the primary to another or "new" copy pool tape?


[ background ]
I ran an audit (fix=no) on a bunch of copy pool tape volumes. Two of them reported damaged files: one with 12 (files inspected=8003349) and the other with 13039052 (files inspected=25143762). Both of these occurred on tape drive 3. The activity log reported some other problems with media in drive 3. I was suspicious of the drive, so the next day, I took that drive offline and reran the audit (fix=no) on both of those volumes using a different drive. It reported thus:

Message: ANR4133I Audit volume process ended for volume B00530L6; 8003337 files inspected, 0 damaged files found and marked as damaged, 0 files previously marked as damaged reset to undamaged, 0 objects need updating. (SESSION: 301242, PROCESS: 3306)

Message: ANR4133I Audit volume process ended for volume B00628L6; 12104715 files inspected, 0 damaged files found and marked as damaged, 5 files previously marked as damaged reset to undamaged, 0 objects need updating. (SESSION: 301943, PROCESS: 3307)
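
For anyone who wants to pull these counts out of the activity log automatically, here is a minimal Python sketch that parses the ANR4133I summary line. The regular expression is built only from the two messages quoted above; the function name `parse_audit_summary` is made up for illustration, and the message wording may differ on other server levels or with other audit options.

```python
import re

# Pattern derived from the two ANR4133I lines quoted in this post (an assumption,
# not taken from IBM's message reference).
ANR4133I = re.compile(
    r"ANR4133I Audit volume process ended for volume (?P<volume>\S+); "
    r"(?P<inspected>\d+) files inspected, "
    r"(?P<damaged>\d+) damaged files found and marked as damaged, "
    r"(?P<reset>\d+) files previously marked as damaged reset to undamaged"
)


def parse_audit_summary(line):
    """Return the volume name and counts from an ANR4133I line, or None if no match."""
    match = ANR4133I.search(line)
    if match is None:
        return None
    return {
        "volume": match.group("volume"),
        "inspected": int(match.group("inspected")),
        "damaged": int(match.group("damaged")),
        "reset": int(match.group("reset")),
    }
```

Feeding it each activity log line and keeping the results where `damaged` is greater than zero would give a short list of volumes to look at.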

These numbers concur with what I would expect when you subtract the damaged files from the initial files inspected. I also ran 'query content volume' and added up the number of files reported, and these also match these new totals.

The admin script that does the backup stgpool ran before I ran the second audit, so I guess it must have copied the damaged files from the primary pool tapes to the new copy pool tape because when I ran 'show damaged copypool' it reported nothing. Also, when I ran 'query volume damaged=yes', it reported nothing. As a test, I then picked the first and last damaged files from each of the two tapes, and I hunted through the database (using the object_id, show bfo object_id, bfo super-bitfile method) to determine the primary and copy pool volume where these files reside. Both reported the same "new" copy pool tape B00784L6, not B00530L6 or B00628L6. I then changed the access on the primary pool volumes to unavailable (to force TSM to use the copy pool) and ran a restore of these files. Volume B00784L6 was loaded, and the files concur with what's on disk.

Does this seem correct?
 

On a primary pool when the data exists in a copy pool, FIX=YES or FIX=NO makes no difference. In either case, the file is not deleted from the primary pool because a good copy exists in the copy pool.

On a primary pool when the data doesn't exist in a copy pool, FIX=YES deletes the damaged copy, FIX=NO just marks it damaged.

On a copy pool, FIX=YES deletes the damaged copy, FIX=NO just marks it damaged. In either case, the next backup stgpool will send a new copy offsite. In the case of FIX=NO, the damaged copy is deleted once replaced with a good copy that would now be on a new offsite tape.
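
The three cases above can be written down as a small lookup. This is only a sketch of the rules as stated in this reply, not server code; the function name `audit_outcome` and the result strings are invented for illustration.

```python
def audit_outcome(pool, fix, good_copy_in_copy_pool=False):
    """Return what happens to a damaged file, following the three rules above.

    pool: "primary" or "copy"; fix: True for FIX=YES, False for FIX=NO.
    good_copy_in_copy_pool is only relevant for primary pool volumes.
    """
    if pool not in ("primary", "copy"):
        raise ValueError("pool must be 'primary' or 'copy'")
    if pool == "primary" and good_copy_in_copy_pool:
        # FIX makes no difference: the file is not deleted from the primary pool.
        return "kept in primary pool"
    # Primary pool without a copy, or any copy pool volume.
    # For a copy pool, the next backup stgpool then writes a fresh copy, and
    # with FIX=NO the damaged copy is deleted once it has been replaced.
    return "deleted" if fix else "marked damaged"
```

For the audit described in the first post (copy pool, fix=no), this gives "marked damaged", which fits the files later turning up on a new copy pool tape.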
 

Ah, thank you. :)

Both tapes had a status of FULL before the next backup stgpool was run, and both were at 100 PCT_UTILIZED. I now see that the one with 12 damaged files reported is at 99.5, and the other with 13039052 damaged files is at 49.8, but the status is still FULL for both.

o Is this because when the damaged files are deleted, once replaced with a good copy, the PCT_UTILIZED is recalculated for the volume to factor in the deleted copies?

So, for the second tape, from the first audit, I get 25143762 files inspected - 13039052 damaged files = 12104710. But the second audit reported 5 files previously marked as damaged reset to undamaged, so we have 12104710 + 5 = 12104715. Then 12104715/25143762 =~ 48%. Seems pretty close?
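
The arithmetic above can be double-checked with a few lines of Python. The numbers are the ones from the two audits; the helper name `expected_remaining` is just for illustration, and the file-count ratio is only a rough stand-in for PCT_UTILIZED, which presumably reflects space used rather than the number of files.

```python
def expected_remaining(inspected, damaged, reset):
    """Files still expected on the volume once the damaged copies are gone."""
    return inspected - damaged + reset


# Tape with 12 damaged files: first audit versus second audit.
assert expected_remaining(8003349, 12, 0) == 8003337

# Tape with 13039052 damaged files, 5 of which were reset to undamaged.
remaining = expected_remaining(25143762, 13039052, 5)
assert remaining == 12104715

# Share of the originally inspected files that is still tracked on the tape.
print(f"{remaining / 25143762:.1%}")  # prints 48.1%
```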

And since tape is sequential, both tapes are still physically full since those files are still occupying space on those tapes, but their metadata just isn't in the database anymore, at least not for those tapes since the metadata now reflects the good copy on the new offsite tape. And auditing those two tapes again is only comparing what's in the database against what's on the tapes not vice versa, so it won't complain about those dead files since it will never see them, it's only corroborating that the files reported in the database agree with those same ones on tape. So it would appear that the physical files on a tape can be a superset of what the database is tracking, but not the converse scenario. Whatever the database is tracking MUST be on the tape.

o That sound right?

o And in your first scenario, where a good copy exists in the copy pool, but the files are damaged in the primary, then if the files still exist on the client, will the next backup on the client resend the files?

o And in the second scenario, if the files are marked as damaged, will the next backup on the client resend the files? Or would you have to run fix=yes to force that?
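
The superset idea above can be pictured with plain sets. This is a toy model of the description in this post only, not how the server is implemented; the file names and the `audit` helper are hypothetical.

```python
def audit(db_tracked, readable_on_tape):
    """Return the database-tracked files that cannot be read from the tape."""
    return db_tracked - readable_on_tape


# The tape still physically holds old copies "c" and "d" that the database
# no longer tracks for this volume.
on_tape = {"a", "b", "c", "d"}
tracked = {"a", "b"}

# The audit only walks the database entries, so the untracked files are never seen.
assert audit(tracked, on_tape) == set()

# A tracked file that is missing or unreadable on tape is what gets reported.
assert audit({"a", "e"}, on_tape) == {"e"}
```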
 

[QUOTE]
Is this because when the damaged files are deleted, once replaced with a good copy, the PCT_UTILIZED is recalculated for the volume to factor in the deleted copies
[/QUOTE]
Yes.

[QUOTE="ldmwndletsm, post: 138073, member: 40003"]
And in your first scenario, where a good copy exists in the copy pool, but the files are damaged in the primary, then if the files still exist on the client, will the next backup on the client resend the files?
[/QUOTE]
No, you have to do a restore volume.

[QUOTE]
That sound right?
[/QUOTE]
Yup.

[QUOTE]
And in the second scenario, if the files are marked as damaged, will the next backup on the client resend the files?
[/QUOTE]
Yes, if that file was an active file and still exists; not if it was an inactive file.

[QUOTE]
Or would you have to run fix=yes to force that?
[/QUOTE]
Irrelevant.
 

Appreciate that. That was very helpful.

So until you run an audit on a volume, would the database have any way of knowing about damaged files?

If you ran 'q content volume damaged=yes' or 'show damaged stgpool' then would there be any report if audit had never been run?

Assuming audit had been run on a bunch of tapes (and in the case of the copy pool, a backup stgpool command had not yet been run again to recopy the damaged files), then is the above show command reliable? Is that something you can run on a daily basis to check?
 

[QUOTE]
So until you run an audit on a volume, would the database have any way of knowing about damaged files?
[/QUOTE]
If a client tries to restore/retrieve a damaged file, or if a server process like migration, reclamation or backup stgpool tries to read a damaged file.

[QUOTE]
Is that something you can run on a daily basis to check?
[/QUOTE]
If it's beneficial info to you, there's certainly no harm in running it.
 

[QUOTE]
If a client tries to restore/retrieve a damaged file, or if a server process like migration, reclamation or backup stgpool tries to read a damaged file.
[/QUOTE]

Got it. And would all of those culprits be identified in the activity log?

[QUOTE]
If it's beneficial info to you, there's certainly no harm in running it.
[/QUOTE]

If there are damaged files that have not been addressed (in the case of a copy pool, possibly a backup stgpool command has not been run) then would there be any cases where this command would not report everything?
 