Delete filespace - ANR7823S Internal error BUF011 detected.

daniel.lestar

Hello,

I have an issue with a TSM 5.5.5.2 server.
It has only one node, but that node holds 1.4 billion files, all under one filespace (an old archive).

I want to delete it, but every time I start a delete filespace, the server crashes with the following message:

ANR7824S Server operation terminated.
ANR7823S Internal error BUF011 detected.
ANR9999D Trace-back of called functions:
ANR9999D 00000001019d2e5c pkShowCallChain
ANR9999D 0000000101968c5c pkAbort
ANR9999D 00000001007b2498 bufLatch
ANR9999D 0000000100838e68 MergeNode
ANR9999D 0000000100837c6c LongMerge
ANR9999D 000000010083764c LongMerge
ANR9999D 000000010083764c LongMerge
ANR9999D 000000010083764c LongMerge
ANR9999D 000000010083764c LongMerge
ANR9999D 000000010083764c LongMerge
ANR9999D 0000000100837214 TbMergeTree
ANR9999D 000000010081a16c TbDelete
ANR9999D 0000000100813840 tbTableOp
ANR9999D 0000000100b23718 DeleteArchives
ANR9999D 0000000100b1bc2c imFSDeletionThread
ANR9999D 0000000101970ac8 StartThread
ANR9999D ffffffff7ddd8558 *UNKNOWN*
ANR7820S Server thread 2 (tid 2) terminated in response to program abort.
ANR7820S Server thread 3 (tid 3) terminated in response to program abort.
ANR7820S Server thread 4 (tid 4) terminated in response to program abort.
ANR7820S Server thread 5 (tid 5) terminated in response to program abort.
ANR7820S Server thread 6 (tid 6) terminated in response to program abort.

When I try to reduce the retention and start an expire inventory, I get the same message and the server crashes immediately.
Do you have any idea what I could try?
I tried deleting some volumes with discarddata=yes, but the DB size did not change, so I think the data is still in the DB.
 
Hi Daniel,

The BUF011 message indicates that TSM was not able to read data from the DB/Logs. I would expect the problem to be in accessing the DB data, since the logs only hold in-flight data that has not yet been committed to the DB.

I have seen this error in cases where neither the DB nor the logs were damaged and the cause was a hardware problem.

To verify whether this is DB corruption or just a hardware problem, halt TSM and run the dumpdb command:

DSMSERV DUMPDB DEVclass=device_class_name

Track/save all reported damaged pages. Example:
ANR4012W DUMPDB: Database page 9122855 is damaged.
ANR4011W DUMPDB: Database page 9122855 is invalid - it will be skipped.

Once the first dump db has completed, do a second dumpdb and see if the same page is damaged.

If you do not see the exact same pages reported as damaged (in this example 9122855), then this is likely just a hardware problem and the data on disk is good. You would then do a restore DB to local disk.

If the exact same pages are reported during multiple dumpdb attempts, then this is a damaged DB. Your choices are to restore to a point in time before the DB was damaged, or you can do all the steps to salvage the DB by doing the dumpdb/loadformat/loaddb and auditdb.
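
If the dumpdb output is long, a small script can do the comparison for you. This is only a sketch in Python, assuming you saved the console output of each dumpdb run as text and that damaged pages are reported in the ANR4012W format shown above. The function names (damaged_pages, compare_runs) are my own and not part of TSM.

```python
import re

# Matches lines such as:
#   ANR4012W DUMPDB: Database page 9122855 is damaged.
# Assumption: every damaged page is reported in exactly this format.
DAMAGED_PAGE = re.compile(r"ANR4012W DUMPDB: Database page (\d+) is damaged")


def damaged_pages(log_text):
    """Return the set of page numbers reported as damaged in one dumpdb run."""
    return {int(m.group(1)) for m in DAMAGED_PAGE.finditer(log_text)}


def compare_runs(first_log, second_log):
    """Split the damaged pages of two runs into repeated and one-off pages."""
    first = damaged_pages(first_log)
    second = damaged_pages(second_log)
    return {
        "repeated": sorted(first & second),      # same page in both runs
        "only_first": sorted(first - second),    # seen in the first run only
        "only_second": sorted(second - first),   # seen in the second run only
    }


if __name__ == "__main__":
    # Made-up sample output for illustration; page 500 is hypothetical.
    run1 = (
        "ANR4012W DUMPDB: Database page 9122855 is damaged.\n"
        "ANR4011W DUMPDB: Database page 9122855 is invalid - it will be skipped.\n"
    )
    run2 = (
        "ANR4012W DUMPDB: Database page 9122855 is damaged.\n"
        "ANR4012W DUMPDB: Database page 500 is damaged.\n"
    )
    print(compare_runs(run1, run2))
```

Pages listed under "repeated" point to real DB damage. If that list is empty and the damaged pages differ from run to run, the hardware is the more likely culprit.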

After completing the dumpdb, you can restart the server and schedule a new outage to complete any recovery steps. Once you run the loadformat command, you will need to either complete the loaddb and auditdb or do a point-in-time restore DB before you will be able to restart the TSM server.

I hope this is helpful.
 