ANR9999D running RESTORE STORAGE POOL

Fruitysoup · Oct 24, 2012

Hi All

I've got an unsettling feeling about this, but when I ran a restore storage pool against our main backup pool, I got (full error in attached file FullError.txt):

10/24/2012 11:05:35 ANR0984I Process 297 for RESTORE STORAGE POOL (PREVIEW)
started in the FOREGROUND at 11:05:35 AM. (SESSION: 5835,
PROCESS: 297)
10/24/2012 11:05:35 ANR1231I Restore preview of primary storage pool FILE_POOL
started as process 297. (SESSION: 5835, PROCESS: 297)
10/24/2012 11:05:35 ANR9999D_3853744942 HandlePrimaryFile(afrestor.c:2045)
Thread<54656>: Error 145 getting volume name for volId
57681. (SESSION: 5835)
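
Because the server wraps the ANR9999D message, the volId ends up on a different line from the text that introduces it. A minimal sketch of pulling the volId out of activity-log text with standard tools is below; the function names are made up for illustration, the sample text is the message from this post, and it assumes a `grep` that supports `-o` (GNU and BSD grep both do).

```shell
# Hypothetical helper: print each volId that follows the word "volId"
# in log text read from stdin. Newlines are flattened to spaces first
# so that a message wrapped across lines still matches.
extract_volids() {
  tr -s '\n' ' ' | grep -o 'volId *[0-9][0-9]*' | awk '{print $2}' | sort -un
}

# The wrapped message from this thread, used as sample input.
sample_log() {
  cat <<'EOF'
10/24/2012 11:05:35 ANR9999D_3853744942 HandlePrimaryFile(afrestor.c:2045)
Thread<54656>: Error 145 getting volume name for volId
57681. (SESSION: 5835)
EOF
}

sample_log | extract_volids
```

With the sample above this prints 57681. If several different volIds show up in the log, each is printed once.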

Some background : About a month ago I ran out of server space because of a Domino node that wasn't expiring transaction logs correctly. I deleted a couple of filespaces from clients that weren't being backed up any more in order to free some space in the FILE_POOL, that was fine, and the server started backing up again. I eventually resolved the transaction log expiry issue and gained around 700GB of storage space, so the system seemed quite happy.
As a matter of course, I regularly back up the primary pool to tape using BACKUP STGPOOL. After running out of storage space I noticed the backup was failing, so I ran an AUDIT VOL FIX=yes against the FILE_POOL, hoping to repair the erroneous volumes that were causing the backup to fail. However, when I then tried a RESTORE STGPOOL preview, it 'Succeeded' without listing the tapes required.

tsm: SERVER1>restore stgpool file_pool prev=yes wait=yes
ANR0984I Process 297 for RESTORE STORAGE POOL (PREVIEW) started in the FOREGROUND at 11:05:35 AM.
ANR1231I Restore preview of primary storage pool FILE_POOL started as process 297.
ANR1234I Restore process 297 ended for storage pool FILE_POOL.
ANR0985I Process 297 for RESTORE STORAGE POOL (PREVIEW) running in the FOREGROUND completed with completion state SUCCESS at 11:05:49 AM.
ANR1239I Restore preview of primary storage pool FILE_POOL has ended. Files Restored: 0, Bytes Restored: 0.

We run dedup on this pool, and calling

db2 "select volname from tsmdb1.ss_volume_ids where volid in ( select volid from tsmdb1.af_segments where srvid=0 and poolid=6 and bfid in ( select superbfid from tsmdb1.bf_aggregated_bitfiles where srvid=0 and offset=0 and length=0 ) )" | sort | cut -b1-30 | uniq

gave a list of 166 volumes with damage.
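
For anyone reusing that command, the shell stages after the query only sort the output, keep the first 30 bytes of each line, and collapse duplicates. Here is a small sketch of that post-processing on its own; the volume paths are invented stand-ins for real db2 output and the function names are hypothetical.

```shell
# Hypothetical stand-in for the db2 query output (these paths are invented).
sample_volumes() {
  printf '%s\n' \
    '/tsm/filepool/00000A1B.BFS' \
    '/tsm/filepool/00000A1C.BFS' \
    '/tsm/filepool/00000A1B.BFS'
}

# Sort, keep bytes 1-30 of each line, then drop adjacent duplicate lines.
unique_volumes() {
  sort | cut -b1-30 | uniq
}

sample_volumes | unique_volumes
```

Note that `cut -b1-30` truncates anything longer than 30 bytes, so volume paths longer than that would need a wider range to stay distinct.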

Now, my question is: is the pool totally knackered? Do I delete all volumes in FILE_POOL and start again, or is there something I can do to perhaps repair it? Any suggestions appreciated.

FYI :
Session established with server SERVER1: Linux/x86_64
Server Version 6, Release 2, Level 2.0

View attachment file_pool.txt

chad_small · Oct 24, 2012

So what was the result of the AUDIT VOL FIX=YES? Did it say it fixed the volumes? Is the file pool a disk storage pool? If so, clean them out, delete the volumes, and then recreate them, and that should fix it. If they are a FILE devclass, then use MOVE DATA and then delete the FILE volumes. I've had some similar issues in the past, and deleting the disk pool volumes and recreating them fixed my problem.

Fruitysoup · Nov 15, 2012

Hi Chad

Unfortunately it didn't do a lot of good. I ran a series of 'move data' and 'del vol' to get the good files out of the file pool and deleted the resulting bad volumes. To get around the "Error 145", I did some scary database modifications that allowed me to get at the remaining volumes and delete them. The process involved running:

restore stgpool file_pool prev=yes wait=yes

and getting the volid from the error, then in a db2 session:

insert into SS_VOLUME_IDS ( volid,volname, poolid, strategy, update_date ) values( XXXXX, '/tmp/voltest' , 6, 30, SYSDATE )

where XXXXX is the volume id it was complaining about, and '/tmp/voltest' didn't have to exist. I was then able to run

del vol /tmp/voltest discard=yes

which cleared out the database of all the stale records pertaining to the damaged vol, although it didn't remove the volume itself and so I had to do that manually in a db2 session with :

delete from SS_VOLUME_IDS where poolid=6 and volname= '/tmp/voltest' and volid=XXXXX
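
The three steps above can be sketched as a dry-run helper that only prints the statements for a given volid and executes nothing. The function name is hypothetical, and poolid=6 and strategy=30 are simply the values from this server, so they would need checking against any other system. Editing the server database by hand is risky, so treat this as a record of what was typed rather than a recommended procedure.

```shell
# Dry run only: prints the statements from this post for one volid.
# Nothing is sent to db2 or to the TSM server.
# Assumption: poolid=6 and strategy=30 are specific to this server.
print_cleanup_steps() {
  volid="$1"
  placeholder="${2:-/tmp/voltest}"   # placeholder name; need not exist on disk
  case "$volid" in
    ''|*[!0-9]*)
      echo "usage: print_cleanup_steps <numeric volid> [placeholder]" >&2
      return 1 ;;
  esac
  cat <<EOF
-- 1. db2: register a placeholder volume name for the orphaned volid
insert into SS_VOLUME_IDS ( volid, volname, poolid, strategy, update_date ) values ( $volid, '$placeholder', 6, 30, SYSDATE )
-- 2. TSM admin command line: discard the stale records
del vol $placeholder discard=yes
-- 3. db2: remove the placeholder row again
delete from SS_VOLUME_IDS where poolid=6 and volname='$placeholder' and volid=$volid
EOF
}

print_cleanup_steps 57681
```

Printing instead of executing leaves room to review each statement, and the numeric check keeps a mistyped volid out of the generated SQL.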

It was all a bit hairy, but I managed to get to a position where the restore stgpool would not crash out. This morning I tried to do a backup stgpool after all the changes and I'm still getting volumes with damaged dedup bitfiles and ANR4895E. I'll try a couple more things, but if it's still not good I'll do like you said and delete all the data in the pool and start from scratch.

thanks for the help
