audit volume reports more files inspected than query content

ldmwndletsm · Feb 14, 2022

Why would an audit volume report more files inspected than a "query content"?

Running 8.1.11 on Linux.

The volume in question (A00001L6) has an access of readonly.

o q content A00001L6 f=d > A00001.out
o grep "Node Name" A00001.out | wc -l (this reports 947)
o audit volume A00001L6 fix=no (actlog reports two volumes required)

This reads for a while and then loads another tape (one or more files spanned from A00001L6 to this tape), reads that one for a few minutes and then reports thus:

02/10/2022 20:25:06 ANR4133I Audit volume process ended for volume A00001L6
1388 files inspected, 0 damaged files found and marked
as damaged, 0 files previously marked as damaged reset
to undamaged, 0 objects need updating. (SESSION: 716417, PROCESS: 881)

o q content A00001L6 f=d > A00001.out2
o grep "Node Name" A00001.out2 | wc -l (still reports 947)
o audit volume A00001L6 fix=no skippartial=yes (reports 1388 files inspected, no differences from previous audit).

As near as I can tell, "q content", or even "q content f=d", both report 947 files. "q content damaged=yes" reports nothing. The only thing that occurs to me is that possibly

I also saw this once in the past wherein audit reported 2 more files inspected than query content. Any theories?

ldmwndletsm · Feb 14, 2022

Where I started to say "the only thing that occurs to me is that possibly": I simply have no idea, other than that perhaps the audit's idea of a file is somehow different. Even if it were only looking at aggregates, I'd think it would derive a lower number, not a higher one.

When I count the entries in "q content", I'm careful to restrict the match to "Node Name: " so I don't pick up files with that string in their name. I even piped that to 'sort -u', and I can see that it returns only the node name lines, so that should be the number of files, right? Even if I look for "Client's Name for File:" (excluding "Hexadecimal Client's Name for File:"), or search for "Filespace Name:", I still get 947. I am not paying attention to aggregates ("Aggregated?").
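For anyone repeating this count, here is a minimal Python sketch of the anchored match described above. It assumes the "q content f=d" output has one "Node Name:" label line per file entry; the sample text is illustrative, not real server output, and `count_entries` is a hypothetical helper name.

```python
import re

# Match the field label only at the start of a line (after indentation),
# so a file whose own name contains "Node Name" is not counted.
NODE_LINE = re.compile(r"^\s*Node Name:")

def count_entries(lines):
    """Count file entries in 'q content f=d' output, one per 'Node Name:' line."""
    return sum(1 for line in lines if NODE_LINE.match(line))

# Illustrative sample (assumed layout), with one file name that contains the label.
sample = """\
          Node Name: NODE_A
               Type: Bkup
     Filespace Name: /home
Client's Name for File: /docs/ Node Name: notes.txt
          Node Name: NODE_B
               Type: Bkup
     Filespace Name: /home
Client's Name for File: /docs/ report.txt
""".splitlines()

print(count_entries(sample))                   # anchored count of entries
print(sum("Node Name" in l for l in sample))   # unanchored count, for comparison
```

On this sample the anchored count is lower than the unanchored one, which is the over-count the anchoring is meant to avoid.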

marclant · Feb 15, 2022

There could be large files that are split or fragmented. Can you do additional checks with these commands?

This one will give you how many objects are on this volume:

SQL:

select count(OBJECT_ID) as OBJECT_COUNT from contents where volume_name='A00001L6'

This one will give you how many unique objects on the volume, so if a large file is split, it would only count it once.

SQL:

select count(distinct(OBJECT_ID)) as FILE_COUNT from contents where volume_name='A00001L6'

Maybe there are linked files on the tape too. You could try using:

Code:

query content A00001L6 followlinks=yes

ldmwndletsm · Feb 15, 2022

Thank you, marclant, here's what it reports (I've also included the output from some other commands, too):

1. select count(OBJECT_ID) as OBJECT_COUNT from contents where volume_name='A00001L6'
OBJECT_COUNT: 1388

2. select count(distinct(OBJECT_ID)) as FILE_COUNT from contents where volume_name='A00001L6'
FILE_COUNT: 1388

3. query content A00001L6 followlinks=yes

This produces identical output to 'followlinks=no' (default). I ran it with f=d and compared to my original output (query content volume A00001L6 f=d). There are no differences. Also, I ran it with 'followlinks=justlinks' and no output is returned:
ANR2034E QUERY CONTENT: No match found using this criteria.
ANS8001I Return code 11.

4. query content A00001L6 copied=no
ANR2034E QUERY CONTENT: No match found using this criteria.
ANS8001I Return code 11.

5. I double checked, and this volume is readonly and was last written to on 01/28/2022

6. I did perform a unique sort on all the file names in the "query content A00001L6 f=d" output, and there is no difference versus a non-unique sort:

grep "^ *Client's Name for File:" ../query_content_A00001L6.out | sort | md5sum
524e50b2c280b2e78b6ddbed3789660a -

grep "^ *Client's Name for File:" ../query_content_A00001L6.out | sort -u | md5sum
524e50b2c280b2e78b6ddbed3789660a -

marclant · Feb 15, 2022

I don't have an explanation. The count from the audit is correct, there's really 1388 objects on that volume in the contents table. But I don't know why QUERY CONTENT would only display 947.

It would be a lot of work, but you could try to compare the outputs of these 2 queries:

Code:

query content A00001L6

SQL:

select * from contents where volume_name='A00001L6'

ldmwndletsm · Feb 15, 2022

Thanks, marclant!

I think I have an answer, but first, I failed to note that this tape's ACCESS had previously been changed to "UNAVAILABLE" due to a large number of errors that we were seeing on some tape drives. It was happening on a number of tapes on those drives. For this tape, error_state=NO, write_errors=0, read_errors=8.

Anyway, I ran the following command, as you suggested:

select * from contents where volume_name='A00001L6' > select_all_contents_A00001L6

Looking at the output, though, it's clear that any comparison between this and "query content" is going to require some coding: the SQL select uses the actual column names from the table rather than the labels that "q content" uses (duh!), and it includes some additional columns (e.g. BITFILE_ID, OBJECT_ID).
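The kind of coding that comparison needs might look like the Python sketch below. It is only a sketch under assumptions: both outputs are taken to be blank-line-separated stanzas of "label: value" lines, the label-to-column mapping in `QUERY_TO_SQL` is a guess based on the labels mentioned in this thread, and the sample values and object IDs are made up.

```python
def parse_stanzas(text):
    """Split 'label: value' output into one dict per blank-line-separated stanza."""
    stanzas, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                stanzas.append(current)
                current = {}
            continue
        label, _, value = line.partition(":")  # split at the first colon only
        current[label.strip()] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas

# Assumed mapping from "q content f=d" labels to CONTENTS column names.
QUERY_TO_SQL = {
    "Node Name": "NODE_NAME",
    "Type": "TYPE",
    "Filespace Name": "FILESPACE_NAME",
    "Client's Name for File": "FILE_NAME",
}
KEY = ("NODE_NAME", "TYPE", "FILESPACE_NAME", "FILE_NAME")

def key_from_query(stanza):
    renamed = {QUERY_TO_SQL[k]: v for k, v in stanza.items() if k in QUERY_TO_SQL}
    return tuple(renamed.get(col, "") for col in KEY)

def only_in_select(query_text, select_text):
    """Return select stanzas that have no matching entry in the query output."""
    seen = {key_from_query(s) for s in parse_stanzas(query_text)}
    return [s for s in parse_stanzas(select_text)
            if tuple(s.get(col, "") for col in KEY) not in seen]

# Illustrative data only.
query_out = """\
          Node Name: NODE_A
               Type: Bkup
     Filespace Name: /home
Client's Name for File: /docs/ a.txt
"""
select_out = """\
NODE_NAME: NODE_A
TYPE: Bkup
FILESPACE_NAME: /home
OBJECT_ID: 1001
FILE_NAME: /docs/ a.txt

NODE_NAME:
TYPE: UNKNOWN
FILESPACE_NAME:
OBJECT_ID: 1002
FILE_NAME:
"""
extra = only_in_select(query_out, select_out)
print([(s["TYPE"], s["OBJECT_ID"]) for s in extra])
```

Matching on a set of keys ignores duplicates, so if the same node, filespace and file name can legitimately appear more than once on a volume, a multiset (collections.Counter) would be the safer choice.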

HOWEVER, we don't even get that far because the first thing I noticed is that the last umpteen entries from the select statement have an UNKNOWN backup type. I do not see that string in the "query content" output. So I see entries like this:

TYPE: UNKNOWN

I then counted the number of occurrences of that string, and I get:

grep "TYPE: UNKNOWN" select_all_contents_A00001L6 | wc -l
441

And that's exactly the difference between 1388 (reported from audit volume) and 947 (reported from "query content").

[ Here's what we see for backup type in the output file from the "select * from contents" ]
grep "TYPE:" select_all_contents_A00001L6 | sort -u
TYPE: Bkup
TYPE: UNKNOWN

[ Here's what we see for backup type in the output file from the "query content" ]
grep "Type:" query_content_A00001L6.out | sort -u
Type: Bkup
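
The same tally can be done in Python, which also guards against a file name that happens to contain "TYPE: UNKNOWN". This is a rough sketch, assuming list-style "COLUMN: value" lines as shown above; the sample lines are invented, while the 1388 and 947 figures are the counts reported earlier in this thread.

```python
from collections import Counter

def tally_types(lines):
    """Count values of the TYPE column in list-style select output."""
    counts = Counter()
    for line in lines:
        label, sep, value = line.partition(":")  # first colon ends the column name
        if sep and label.strip() == "TYPE":
            counts[value.strip()] += 1
    return counts

# Illustrative lines; the FILE_NAME line would fool an unanchored grep.
sample = [
    "TYPE: Bkup",
    "FILE_NAME: /tmp/ TYPE: UNKNOWN.txt",
    "TYPE: UNKNOWN",
    "TYPE: Bkup",
]
print(dict(tally_types(sample)))

# Cross-check of the counts from this thread.
audit_inspected = 1388   # ANR4133I "files inspected"
query_entries = 947      # entries shown by "query content"
print(audit_inspected - query_entries)
```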

[ Here's the last two file entries in the "select * from contents" output ]
VOLUME_NAME: A00001L6
NODE_NAME:
TYPE: UNKNOWN
FILESPACE_NAME:
AGGREGATED: No
FILE_SIZE:
SEGMENT: 1/1
FRAGMENT: 18
CACHED: NO
FILESPACE_ID: 1
FILESPACE_HEXNAME:
BITFILE_ID: 9273343027
OBJECT_ID: 9273343027
COPIED: Yes
DAMAGED: No
FILE_NAME:
FILE_HEXNAME:

VOLUME_NAME: A00001L6
NODE_NAME:
TYPE: UNKNOWN
FILESPACE_NAME:
AGGREGATED: No
FILE_SIZE:
SEGMENT: 1/2
FRAGMENT: 19
CACHED: NO
FILESPACE_ID: 1
FILESPACE_HEXNAME:
BITFILE_ID: 9273365734
OBJECT_ID: 9273365734
COPIED: Yes
DAMAGED: No
FILE_NAME:
FILE_HEXNAME:

So it appears that these last 441 file entries not only report an UNKNOWN backup type, but there is no node name, file space name or even a file name. Perhaps this is why the "query content" does not report these, but the audit does?
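
To confirm that every UNKNOWN entry really is blank in all three identifying fields, and to collect the object IDs for follow-up, something like this Python sketch could be run over the select output. It assumes blank-line-separated records of "COLUMN: value" lines; the two UNKNOWN records are the ones shown above (trimmed to the relevant columns), the Bkup record is invented, and the helper names are hypothetical.

```python
import re

IDENTIFYING = ("NODE_NAME", "FILESPACE_NAME", "FILE_NAME")

def records(text):
    """Yield one dict per blank-line-separated record of 'COLUMN: value' lines."""
    for block in re.split(r"\n\s*\n", text.strip()):
        rec = {}
        for line in block.splitlines():
            col, _, val = line.partition(":")
            rec[col.strip()] = val.strip()
        yield rec

def unnamed_unknowns(text):
    """Return OBJECT_IDs of UNKNOWN-type records whose identifying fields are all empty."""
    return [
        r["OBJECT_ID"]
        for r in records(text)
        if r.get("TYPE") == "UNKNOWN" and not any(r.get(c) for c in IDENTIFYING)
    ]

select_out = """\
NODE_NAME: NODE_A
TYPE: Bkup
FILESPACE_NAME: /home
OBJECT_ID: 1001
FILE_NAME: /docs/ a.txt

NODE_NAME:
TYPE: UNKNOWN
FILESPACE_NAME:
OBJECT_ID: 9273343027
FILE_NAME:

NODE_NAME:
TYPE: UNKNOWN
FILESPACE_NAME:
OBJECT_ID: 9273365734
FILE_NAME:
"""
print(unnamed_unknowns(select_out))
```

If the length of that list equals the number of "TYPE: UNKNOWN" lines, then no UNKNOWN entry carries a node, filespace or file name.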

The other interesting thing here is that the last file entry reports "SEGMENT: 1/2". That makes sense, since we'd expect at least one file to span onto the second volume that the audit loaded after it finished reading A00001L6. So that same file should have a "SEGMENT: 2/2" entry on that other volume. HOWEVER, "query content A00001L6 f=d" reports only "1/1" segment numbers, but I guess that jibes, given that it can't see any of the file entries with a backup type of "UNKNOWN".
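
A small Python sketch for picking out the spanning entries, assuming the SEGMENT value always has the "n/m" form seen above. The records are given as already-parsed dicts, using the two object IDs shown earlier; `spanning_objects` is a hypothetical helper.

```python
def spanning_objects(records):
    """Return (OBJECT_ID, segment, total) for records that have more than one segment."""
    out = []
    for rec in records:
        number, total = (int(part) for part in rec["SEGMENT"].split("/"))
        if total > 1:
            out.append((rec["OBJECT_ID"], number, total))
    return out

records = [
    {"OBJECT_ID": "9273343027", "SEGMENT": "1/1"},
    {"OBJECT_ID": "9273365734", "SEGMENT": "1/2"},
]
print(spanning_objects(records))
```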

If I run: "show bfo 9273365734", it reports output, including two primary pool volumes (A00001L6 and the second volume that the audit loaded) and one copy pool volume.

[ There do not appear to be any aggregates on the tape in question ]
grep AGGREGATED select_all_contents_A00001L6 | sort -u
AGGREGATED: No

QUESTIONS
1. Have you seen this behavior before with type: UNKNOWN?

2. Is this anything to worry about?

3. What might cause this type of phenomenon?

I would think that these sundry "query" commands are simply SQL on the back end (or some combination of such commands or views). But the IBM documentation for "query content" lists only the following values for type: ANY, Backup, Archive and Spacemanaged. So maybe its purview is limited to those last three, and it will never report an entry of type UNKNOWN. In other words, "query content" simply cannot peer into the contents table beyond the three backup types that it supports?

4. Would we expect that the converse scenario could likewise occur, wherein the number of files inspected by audit volume could somehow be less than what "query content volume" reports?

I think I might have seen this before also.

marclant · Feb 16, 2022

Looks like there's an APAR for that: https://www.ibm.com/support/pages/apar/IT17285

Make sure you are higher than version 8.1.1.100 on the server.

ldmwndletsm · Feb 16, 2022

We're at 8.1.11. We're not using any replication. The volume was created since we've been on the current release.

When I said the last 441 files, yes, there are 441 of these unknown objects, but they're actually interspersed in the "select" output and not contiguous, so they occur in groupings with non-unknown objects in between.
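
One way to see that grouping is to collapse the ordered TYPE values into runs. A minimal Python sketch, assuming the TYPE values have already been extracted in output order; the sample list is illustrative and `type_runs` is a hypothetical helper.

```python
from itertools import groupby

def type_runs(types):
    """Collapse an ordered list of TYPE values into (is_unknown, run_length) pairs."""
    return [
        (is_unknown, len(list(group)))
        for is_unknown, group in groupby(types, key=lambda t: t == "UNKNOWN")
    ]

types = ["Bkup", "Bkup", "UNKNOWN", "UNKNOWN", "UNKNOWN", "Bkup", "UNKNOWN"]
print(type_runs(types))
```

More than one True run in the result means the UNKNOWN entries are interspersed rather than contiguous.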

It looks like "query content" is not going to reference anything in the database that points to an empty file (not zero length but empty as in there's really nothing there or however it would be best expressed), but "audit volume" and "select * from contents" are perfectly happy to do so. I doubt that a restore would be able to see those references and would most likely limit itself to whatever "query content" can see. No proof, just a guess.

Perhaps, when this occurred, TSM failed a rollback or some such thing, and whatever data would have otherwise actually been sent was resent later?

I'm not concerned about any additional space these may be taking up on the media.
