Bottomless pit

zforray

PREDATAR Control23

Here is a new one.......

We turned off backing up SystemState last week. Now I am going through and deleting the SystemState filespaces.

Since I wanted to see how many objects would be deleted, I did a "Q OCCUPANCY" and preserved the file count numbers for all Windows nodes on this server.

For 4 nodes, the delete of their SystemState filespaces has been running for 5 hours. A "Q PROC" shows:

2019-02-25 08:52:05 Deleting file space ORION-POLL-WEST\SystemState\NULL\System State\SystemState (fsId=1) (backup data) for node ORION-POLL-WEST: 105,511,859 objects deleted.

Considering the occupancy for this node was ~5 million objects, how has it deleted 105 million objects (and counting)? The other 3 nodes in question are also up to >100 million objects deleted, and none of them had more than 6M objects in occupancy.
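To put a number on the mismatch, here is a minimal sketch that pulls the counter out of the "Q PROC" status text quoted above and compares it with the occupancy figure. The function name `objects_deleted` is made up for illustration, and the 5,000,000 value is just the approximate "~5 million" from this post, not an exact count.

```python
import re

# Status text copied from the "Q PROC" output quoted in this post.
STATUS = (r"Deleting file space ORION-POLL-WEST\SystemState\NULL\System State"
          r"\SystemState (fsId=1) (backup data) for node ORION-POLL-WEST: "
          "105,511,859 objects deleted.")


def objects_deleted(status: str) -> int:
    """Return the 'N objects deleted' counter from a process status line."""
    match = re.search(r"([\d,]+) objects deleted", status)
    if match is None:
        raise ValueError("no 'objects deleted' counter in status line")
    return int(match.group(1).replace(",", ""))


# Assumption: the approximate occupancy figure quoted in the post.
OCCUPANCY_OBJECTS = 5_000_000

deleted = objects_deleted(STATUS)
ratio = deleted / OCCUPANCY_OBJECTS
print(f"{deleted:,} deleted vs ~{OCCUPANCY_OBJECTS:,} in occupancy: {ratio:.1f}x")
```

With the numbers above, the delete counter is already roughly 21 times the occupancy count for this node.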

At this rate, the deleted-object count for the SystemState of these 4 nodes will exceed 50% of the total occupancy objects on this server, which houses the backups for 263 nodes.

I vaguely remember some bug/APAR about SystemState backups being large/slow/causing performance problems with expiration, but these nodes' client levels are fairly current (8.1.0.2, staying below the 8.1.2/SSL/TLS enforcement levels) and the ISP server is 7.1.7.400. All of these are Windows 2016, if that matters.
 
PREDATAR Control23

Whoa! That's a new one by me! In the past, TSM would famously re-introduce old bugs in new versions of the software! I think those days are gone, though.

That's a ludicrous number of objects for SystemState. I'd get a PMR with IBM going.

Good luck, and let us know what they find!

Thanks for the confirmation that I am not the only one seeing it and wondering what is going on. FWIW, the deletes all failed/crashed with a strange "Unexpected error 4522 fetching row in table" error against "Backup.Objects" (or "Filespaces"). The last "q proc" I recorded:

2,325 DELETE FILESPACE Deleting file space ORION-POLL-W2\SystemState\NULL\System State\SystemState (fsId=1) (backup data) for node ORION-POLL-W2: 119,442,593 objects deleted.
2,326 DELETE FILESPACE Deleting file space ORION-POLL-E2\SystemState\NULL\System State\SystemState (fsId=1) (backup data) for node ORION-POLL-E2: 116,621,727 objects deleted.

Then I see this in the logs:

2/25/2019 3:07:29 PM ANR1893E Process 2324 for DELETE FILESPACE completed with a completion state of FAILURE.
2/25/2019 3:32:53 PM ANR0106E imfs.c(8340): Unexpected error 4522 fetching row in table "Filespaces".
2/25/2019 3:32:53 PM ANR0106E imfsdel.c(2723): Unexpected error 4522 fetching row in table "Backup.Objects".
2/25/2019 3:32:53 PM ANR1893E Process 2325 for DELETE FILESPACE completed with a completion state of FAILURE.
2/25/2019 4:29:26 PM ANR0106E imfsdel.c(2723): Unexpected error 4522 fetching row in table "Backup.Objects".
2/25/2019 4:29:26 PM ANR1893E Process 2326 for DELETE FILESPACE completed with a completion state of FAILURE.
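For anyone tracking the same failures across many retries, a rough sketch of summarizing log lines like the ones above: it lists which DELETE FILESPACE processes ended in FAILURE and counts the ANR0106E errors per table. The helper names `failed_processes` and `error_tables` are hypothetical, and the sample text is simply the log excerpt pasted from this post.

```python
import re
from collections import Counter

# Log excerpt copied from this post.
LOG = """\
2/25/2019 3:07:29 PM ANR1893E Process 2324 for DELETE FILESPACE completed with a completion state of FAILURE.
2/25/2019 3:32:53 PM ANR0106E imfs.c(8340): Unexpected error 4522 fetching row in table "Filespaces".
2/25/2019 3:32:53 PM ANR0106E imfsdel.c(2723): Unexpected error 4522 fetching row in table "Backup.Objects".
2/25/2019 3:32:53 PM ANR1893E Process 2325 for DELETE FILESPACE completed with a completion state of FAILURE.
2/25/2019 4:29:26 PM ANR0106E imfsdel.c(2723): Unexpected error 4522 fetching row in table "Backup.Objects".
2/25/2019 4:29:26 PM ANR1893E Process 2326 for DELETE FILESPACE completed with a completion state of FAILURE.
"""

FAILED_RE = re.compile(
    r"ANR1893E Process (\d+) for DELETE FILESPACE "
    r"completed with a completion state of FAILURE"
)
TABLE_RE = re.compile(r'ANR0106E .*? fetching row in table "([^"]+)"')


def failed_processes(log: str) -> list[int]:
    """Process numbers of DELETE FILESPACE runs that ended in FAILURE."""
    return [int(num) for num in FAILED_RE.findall(log)]


def error_tables(log: str) -> Counter:
    """Count ANR0106E 'fetching row' errors per table name."""
    return Counter(TABLE_RE.findall(log))


print(failed_processes(LOG))
print(dict(error_tables(LOG)))
```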
 
PREDATAR Control23

This is getting interesting. On another ISP server, a similarly named node (I think it is the same application) has a similar problem. This is breaking all kinds of records. It might hit 200M objects, which is interesting since the total number of objects on this server is 493M.


2019-02-26 08:57:56 Deleting file space ORIONADDWEB\SystemState\NULL\System State\SystemState (fsId=1) (backup data) for node ORIONADDWEB: 189,078,701 objects deleted.
 
PREDATAR Control23

zforray,

You are not the only one that has seen this. I am running TSM Server 7.1.7.1 and I have one node that has 5 MB of data left. I started deleting the filespace and it ran for days; the number of objects went past 200 million, and all of it pointed to system state.
 
PREDATAR Control23

I had to take multiple whacks at deleting the SystemState filespaces across 3 days, but they were finally deleted (it kept failing with errors like:
2/26/2019 2:24:27 PM ANR0106E imfsdel.c(2723): Unexpected error 4522 fetching row in table "Backup.Objects".
2/26/2019 2:24:27 PM ANR1880W Server transaction was canceled because of a conflicting lock on table BACKUP_OBJECTS.
2/26/2019 2:24:27 PM ANR1893E Process 2399 for DELETE FILESPACE completed with a completion state of FAILURE.)

In the end, I finally deleted 1.2B (that is, billion) SystemState objects from 4 nodes.

IBM did refer me to this article/webpage: https://social.technet.microsoft.co...s-is-filling-my-disk-space?forum=winservergen

and my OS engineer for these servers did say the \Crypto\RSA keys folder does exist on at least one of these machines.

The ISP server that ended up deleting 300M+ SystemState objects for 1 node saw its total occupancy drop by 2M objects.

I am glad to have finally purged almost all SystemState backups (the folks who manage AD servers requested keeping theirs, since they have used TSM to restore AD objects in the past).
 
PREDATAR Control23

This happened again; filespace deletion started three days ago.

How can a server with only 2.5M objects, as reported by the select command, be deleting 1.8+ billion objects and still going?

While the delete filespace was running, we monitored the object count. This time the count was dropping, but at a rate 10 to 20 times slower than the filespace delete counter was climbing.

This time, the TSM server was at 7.1.7.1; last time it was at 7.1.5.2. My conclusion is that the issue is not tied to the server level per se but may depend on a node + server combination and/or the Windows version. The node is Windows 2008.

Bottom line, the filespace finally deleted after three days.

We still have to find the root cause, so a PMR with IBM is in order.
 