putukas
Hello,
I have a TSM 6.1.2 server running on Red Hat Linux. I have disk pools and tape pools, plus one active-data pool, so a fairly common TSM environment.
I have a maintenance script that runs every morning and does migrations, reclamation, DB backup, etc.
A few weeks ago I activated deduplication on the active-data pool. Everything worked fine until 2 days ago, when the server crashed with an odd internal error while TSM was accessing my active-data pool:
ANR9999D_1187679840 BfUpdateAggrAttributes(bfaggrut.c:2897) Thread<98>: Illegal update for logical
127640699.
ANR9999D Thread<98> issued message 9999 from:
ANR9999D Thread<98> 0x0000000000c0aa12 OutDiagToCons+0x0x142
ANR9999D Thread<98> 0x0000000000c0d594 outDiagfExt+0x0x194
ANR9999D Thread<98> 0x00000000005fc511 BfUpdateAggrAttributes+0x0xce1
ANR9999D Thread<98> 0x00000000005eb35a bfDestroy+0x0x4ba
ANR9999D Thread<98> 0x0000000000660a4b bfDerefDeleteThread+0x0x6fb
ANR9999D Thread<98> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<98> 0x0000003e01806617 *UNKNOWN*
ANR9999D Thread<98> 0x0000003e00cd3c2d *UNKNOWN*
Bitfile Object: 127640699
**Super-bitfile 127640699 contains following aggregated bitfiles,
Bitfile Id, offset, length, active state or owner, link bfid
125181398 0 1302 Active
125181743 0 0 Inactive
125181744 0 0 Inactive
125181780 0 0 Inactive
125181783 0 0 Inactive
125181784 0 0 Inactive
125181785 0 0 Inactive
125181786 0 0 Inactive
125181787 0 0 Inactive
125181788 0 0 Inactive
125181789 0 0 Inactive
125181790 0 0 Inactive
125181798 1302 1895 Active
125181808 3197 504 Active
133346555 3701 4618 125181808
**Archival Bitfile Entry
Bitfile Type: ACTIVEDATA Storage Format: 22
Bitfile Size: 0.8383 Number of Segments: 1, flags: 2
Storage Pool ID: -1000003 Volume ID: 13872 Volume Name: /mnt/DS4500a/activedata/00003630.BFS
ANR9999D_1187679840 BfUpdateAggrAttributes(bfaggrut.c:2897) Thread<98>: Illegal update for logical
127640698.
ANR9999D Thread<98> issued message 9999 from:
ANR9999D Thread<98> 0x0000000000c0aa12 OutDiagToCons+0x0x142
ANR9999D Thread<98> 0x0000000000c0d594 outDiagfExt+0x0x194
ANR9999D Thread<98> 0x00000000005fc511 BfUpdateAggrAttributes+0x0xce1
ANR9999D Thread<98> 0x000000000060e999 bfPrepareTxn+0x0x1e9
ANR9999D Thread<98> 0x0000000000bd97c4 CollectVotes+0x0xc4
ANR9999D Thread<98> 0x0000000000bd9be3 tmEndX+0x0xd3
ANR9999D Thread<98> 0x0000000000660c9e bfDerefDeleteThread+0x0x94e
ANR9999D Thread<98> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<98> 0x0000003e01806617 *UNKNOWN*
ANR9999D Thread<98> 0x0000003e00cd3c2d *UNKNOWN*
Bitfile Object: 127640698
**Super-bitfile 127640698 contains following aggregated bitfiles,
Bitfile Id, offset, length, active state or owner, link bfid
124951793 0 0 Segmentation fault (core dumped)
I had one big WTF moment. I haven't seen anything like this before and don't know what the hell I should do. First I tried to get TSM up and working again, so I tried to restore the TSM DB from the morning backup. At first it failed, and I had to stop the DB2 engine with db2stop. After that I was able to restore the TSM DB from backup, and I got TSM running again.
Unfortunately, as soon as I started TSM again, the deduplication process started as process ID 1 and the server instantly crashed with the same error. After some workarounds I was able to kill the process and start AUDIT VOLUME for the active-data storage pool.
After waiting a few hours while TSM tried to fix it, I got an error on a certain volume: TSM was not able to access the volume.
OK, now here's my problem. TSM is running at the moment, and I have renamed the active-data pool location at the OS level, so TSM has no access to the volumes. That keeps TSM from crashing, and I still have an operational TSM server. But I can't delete the broken volumes from TSM, and TSM is not fixing the storage pool by itself.
If I try to delete a volume, I get the following error:
ANR2229W Discard data process terminated for volume /mnt/DS4500a/activedata/000036AA.BFS - internal server
error detected.
ANR9999D Thread<201> issued message 2229 from:
ANR9999D Thread<201> 0x0000000000c0594a outRptf+0x0xba
ANR9999D Thread<201> 0x00000000004a0c73 AdmVolDelThread+0x0x10a3
ANR9999D Thread<201> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<201> 0x0000003e01806617 *UNKNOWN*
ANR9999D Thread<201> 0x0000003e00cd3c2d *UNKNOWN*
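In case it helps anyone reading traces like the ones above, here is a rough Python sketch for pulling the named stack frames and the affected volume paths out of pasted server output. It is not a TSM tool and does not talk to the server. The names `summarize`, `FRAME_RE`, and `VOLUME_RE` are made up for illustration, and the patterns only assume the line layout seen in this post (`ANR9999D Thread<n> 0x... function+0x...` and volume paths ending in `.BFS`).

```python
import re

# Stack-frame lines as they appear in this post, e.g.:
#   ANR9999D Thread<201> 0x00000000004a0c73 AdmVolDelThread+0x0x10a3
# Frames shown as *UNKNOWN* have no "+0x" offset and are skipped.
FRAME_RE = re.compile(r"^ANR9999D Thread<(\d+)>\s+0x[0-9a-fA-F]+\s+(\S+?)\+0x")

# File-volume paths as they appear in this post (assumed to end in .BFS).
VOLUME_RE = re.compile(r"(/\S+\.BFS)")

SAMPLE = """\
ANR2229W Discard data process terminated for volume /mnt/DS4500a/activedata/000036AA.BFS - internal server
error detected.
ANR9999D Thread<201> issued message 2229 from:
ANR9999D Thread<201> 0x0000000000c0594a outRptf+0x0xba
ANR9999D Thread<201> 0x00000000004a0c73 AdmVolDelThread+0x0x10a3
ANR9999D Thread<201> 0x0000000000c7bd2b StartThread+0x0xcb
ANR9999D Thread<201> 0x0000003e01806617 *UNKNOWN*
"""


def summarize(log_text):
    """Return (frames, volumes) found in pasted server output.

    frames  -- list of (thread_id, function_name) in order of appearance
    volumes -- list of distinct volume paths in order of first appearance
    """
    frames = []
    volumes = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
        for volume in VOLUME_RE.findall(line):
            if volume not in volumes:
                volumes.append(volume)
    return frames, volumes


if __name__ == "__main__":
    frames, volumes = summarize(SAMPLE)
    for thread_id, function_name in frames:
        print(thread_id, function_name)
    for volume in volumes:
        print(volume)
```

Running it over the complete activity log would give one list of volumes named in the errors, which is easier to compare against the storage pool than scrolling through the raw output.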
Does any TSM'er have an idea of what would help me fix this problem?
Please help!