Strange TSM crash and issue

putukas

Hello,

I have a TSM 6.1.2 server running on Red Hat Linux. I have disk pools and tape pools, plus one active-data pool: a common TSM environment.
I have a maintenance script running every morning that does migration, reclamation, DB backup, etc.

A few weeks ago I enabled deduplication on the active-data pool. Everything worked fine until 2 days ago, when the server crashed with an odd internal error right after TSM accessed my active-data pool.

ANR9999D_1187679840 BfUpdateAggrAttributes(bfaggrut.c:2897) Thread<98>: Illegal update for logical
127640699.
ANR9999D Thread<98> issued message 9999 from:
ANR9999D Thread<98> 0x0000000000c0aa12 OutDiagToCons+0x0x142
ANR9999D Thread<98> 0x0000000000c0d594 outDiagfExt+0x0x194
ANR9999D Thread<98> 0x00000000005fc511 BfUpdateAggrAttributes+0x0xce1
ANR9999D Thread<98> 0x00000000005eb35a bfDestroy+0x0x4ba
ANR9999D Thread<98> 0x0000000000660a4b bfDerefDeleteThread+0x0x6fb
ANR9999D Thread<98> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<98> 0x0000003e01806617 *UNKNOWN*
ANR9999D Thread<98> 0x0000003e00cd3c2d *UNKNOWN*

Bitfile Object: 127640699
**Super-bitfile 127640699 contains following aggregated bitfiles,
Bitfile Id, offset, length, active state or owner, link bfid
125181398 0 1302 Active
125181743 0 0 Inactive
125181744 0 0 Inactive
125181780 0 0 Inactive
125181783 0 0 Inactive
125181784 0 0 Inactive
125181785 0 0 Inactive
125181786 0 0 Inactive
125181787 0 0 Inactive
125181788 0 0 Inactive
125181789 0 0 Inactive
125181790 0 0 Inactive
125181798 1302 1895 Active
125181808 3197 504 Active
133346555 3701 4618 125181808

**Archival Bitfile Entry
Bitfile Type: ACTIVEDATA Storage Format: 22
Bitfile Size: 0.8383 Number of Segments: 1, flags: 2
Storage Pool ID: -1000003 Volume ID: 13872 Volume Name: /mnt/DS4500a/activedata/00003630.BFS
ANR9999D_1187679840 BfUpdateAggrAttributes(bfaggrut.c:2897) Thread<98>: Illegal update for logical
127640698.
ANR9999D Thread<98> issued message 9999 from:
ANR9999D Thread<98> 0x0000000000c0aa12 OutDiagToCons+0x0x142
ANR9999D Thread<98> 0x0000000000c0d594 outDiagfExt+0x0x194
ANR9999D Thread<98> 0x00000000005fc511 BfUpdateAggrAttributes+0x0xce1
ANR9999D Thread<98> 0x000000000060e999 bfPrepareTxn+0x0x1e9
ANR9999D Thread<98> 0x0000000000bd97c4 CollectVotes+0x0xc4
ANR9999D Thread<98> 0x0000000000bd9be3 tmEndX+0x0xd3
ANR9999D Thread<98> 0x0000000000660c9e bfDerefDeleteThread+0x0x94e
ANR9999D Thread<98> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<98> 0x0000003e01806617 *UNKNOWN*
ANR9999D Thread<98> 0x0000003e00cd3c2d *UNKNOWN*

Bitfile Object: 127640698
**Super-bitfile 127640698 contains following aggregated bitfiles,
Bitfile Id, offset, length, active state or owner, link bfid
124951793 0 0 Segmentation fault (core dumped)

I had one big WTF moment. I haven't seen anything like this before and don't know what the hell I should do. First I tried to get TSM up and working again, so I tried to restore the TSM DB from the morning backup. At first it failed; I had to stop the DB2 engine with db2stop. After that I was able to restore the TSM DB from the backup, and I got TSM running again.

Unfortunately, as soon as I started TSM again, the deduplication process started as process ID 1 and the server instantly crashed with the same error. After some workarounds I was able to kill the process and start AUDIT VOLUME for the active-data storage pool.
After a few hours of TSM trying to fix it, I got an error on a certain volume: TSM was not able to access it.
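
For anyone trying to narrow an audit down to specific volumes: a minimal Python sketch, assuming the diagnostic output is saved as plain text, that collects the bitfile IDs and volume names it mentions. The regular expressions and the function name summarize_diagnostics are my own assumptions based only on the output quoted in this thread, not a documented TSM log format.

```python
import re

# Assumed patterns, derived only from the diagnostic lines quoted in this thread.
BITFILE_RE = re.compile(r"Bitfile Object:\s*(\d+)")
VOLUME_RE = re.compile(r"Volume Name:\s*(\S+)")


def summarize_diagnostics(log_text):
    """Return (sorted bitfile IDs, sorted volume names) found in log_text."""
    bitfiles = sorted({int(m) for m in BITFILE_RE.findall(log_text)})
    volumes = sorted(set(VOLUME_RE.findall(log_text)))
    return bitfiles, volumes


# Sample lines copied from the crash output in the first post.
sample = """\
Bitfile Object: 127640699
Storage Pool ID: -1000003 Volume ID: 13872 Volume Name: /mnt/DS4500a/activedata/00003630.BFS
Bitfile Object: 127640698
"""

bitfiles, volumes = summarize_diagnostics(sample)
print(bitfiles)
print(volumes)
```

The resulting volume names could then be passed to AUDIT VOLUME one at a time instead of auditing the whole pool, though whether that avoids the crash here is untested.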

OK, now here's my problem. TSM is running at the moment, and I have renamed the active-data pool location at the OS level so that TSM has no access to the volumes. TSM is not crashing, and I still have an operational TSM. But I can't delete the broken volumes from TSM, and TSM is not fixing the storage pool itself.
If I try DELETE VOLUME, I get the following error:
ANR2229W Discard data process terminated for volume /mnt/DS4500a/activedata/000036AA.BFS - internal server
error detected.
ANR9999D Thread<201> issued message 2229 from:
ANR9999D Thread<201> 0x0000000000c0594a outRptf+0x0xba
ANR9999D Thread<201> 0x00000000004a0c73 AdmVolDelThread+0x0x10a3
ANR9999D Thread<201> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<201> 0x0000003e01806617 *UNKNOWN*
ANR9999D Thread<201> 0x0000003e00cd3c2d *UNKNOWN*

Does any TSM'er have any idea what would help me fix this problem?

Please help!
 
Hi,

I encountered the exact same problem.

First, I wanted to delete the active-data pool volumes. It took a lot of time to dispose of them; the process crashed constantly and had to be restarted repeatedly.

Then I upgraded TSM to the newest patch available (from 6.1.2.0 to 6.1.3.4), but it did not help at all with that issue.

Lately, during the delete volume process, TSM crashed (DB2 was still working) with the same error as you had:

05/08/10 08:23:41 ANR9999D_1187679840 BfUpdateAggrAttributes(bfaggrut.c:2897) Thread<45>: Illegal update for logical size (937566 > 0) of aggregated bitfile 382235463.
05/08/10 08:23:41 ANR9999D Thread<45> issued message 9999 from:
05/08/10 08:23:41 ANR9999D Thread<45> 0x0000000100012d50 StdPutText
05/08/10 08:23:41 ANR9999D Thread<45> 0x000000010001389c OutDiagToCons
05/08/10 08:23:41 ANR9999D Thread<45> 0x000000010000ebdc outDiagfExt
05/08/10 08:23:41 ANR9999D Thread<45> 0x000000010040e188 BfUpdateAggrAttributes
05/08/10 08:23:41 ANR9999D Thread<45> 0x000000010066a2a8 bfDestroy
05/08/10 08:23:41 ANR9999D Thread<45> 0x00000001004f4600 bfDerefDeleteThread
05/08/10 08:23:41 ANR9999D Thread<45> 0x0000000100009bb4 StartThread
05/08/10 08:23:41 Bitfile Object: 382235463
05/08/10 08:23:41 **Super-bitfile 382235463 contains following aggregated bitfiles,
05/08/10 08:23:41 Bitfile Id, offset, length, active state or owner, link bfid


I hope deleting the active-data pools will solve the problem.

What's the status of the issue on your TSM? Have you resolved it yet?

K.
 
I ended up reinstalling TSM from scratch. Even the IBM helpdesk couldn't help me in time. TSM crashed every time access to the active-data pool was needed. Something messed up my TSM DB for good. I'm not sure whether it was the active-data pool itself or deduplication on it. From what I read in the manuals, deduplication on an active-data pool is not recommended by IBM.
 