Solaris 2.6 / TSM Server 4.1.2.12. A few days ago we experienced a
spontaneous crash/reboot during a TSM BACKUP DB operation. Following the
reboot, our TSM Server software came up normally and we repeated the
BACKUP DB operation, which reported successful completion. Shortly
afterward, we did a DRM Prepare, MOVE MEDIA and MOVE DRMEDIA, and then
initiated our 2nd daily BACKUP DB. The 2nd BACKUP DB failed immediately with
error: "ANR9999D ic.c(329): Zero bit count mismatch for SMP page
addr 738304; Zero Bits =105, HeaderZeroBits = 0."
Aside from the BACKUP DB failure, the TSM Server is fully operational,
performing scheduled tasks, client backups, migrations, etc.
At this point we're in a Catch-22. Several days have passed. We can't
get the DB to back itself up (full or incremental), and fear that once we
halt the TSM Server software, it won't restart. We would be forced to
restore from the last good DB backup, which would now cost us several
nights' "successful" backup cycles. Is there a way to fix or recover from
this without losing all those client backups?
This is critical & getting more so daily - I need some fast answers & a
plan of attack from TSM'ers who have been through this:
- has anyone successfully recovered from an SMP Page mismatch error
without a DB Restore?
- given that the DB is up/functional now, and has performed several
nights' backups, is there a way to export/preserve client
backup activity since the incident occurred?
- if we bring down TSM, should we disable all or specific DB and/or Log
mirrors first?
- how can I determine which DB volume contains the problem SMP page
number?
- has anyone (incl Tivoli support/consulting) ever successfully repaired
an SMP page mismatch error, & if so how, using what tools/utilities?
- will AUDIT DB detect/fix an SMP page header mismatch error?
- would UNLOAD DB / LOAD DB be better than AUDIT DB, or do we need to run
both, and if so, in what order?
-rsvp, thanks
Kent Monthei
GlaxoSmithKline