ADSM-L

AW: [ADSM-L] DB2 Redirect Restore

2005-09-23 03:33:03
Subject: AW: [ADSM-L] DB2 Redirect Restore
From: "Herrmann, Boris" <Boris.Herrmann AT ARAG DOT DE>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 23 Sep 2005 09:32:39 +0200
Hi..

the problem is now resolved. Auditing the tape showed no errors. IBM took
three traces, from the TSM server, the StorageAgent and the API client. They
found that the default value for the "resource timeout" option (60 minutes)
was too small. Commtimeout and Idletimeout were OK. The restore works fine
after changing the "resource timeout" option on the server and the
StorageAgent to 120 minutes.
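
For anyone hitting the same error, a minimal sketch of the change. It assumes the option is set from an administrative client session on the server and in the options file on the storage agent; the file name dsmsta.opt and the need to restart the storage agent are assumptions about a typical setup, not details taken from the IBM traces.

```
query option resourcetimeout
setopt resourcetimeout 120
```

And in the storage agent options file (assumed here to be dsmsta.opt):

```
* allow a resource to wait up to 120 minutes before timing out
RESOURCETIMEOUT 120
```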

Kind regards,
Boris

-----Original Message-----
From: Leigh Reed [mailto:L.Reed AT MDX.AC DOT UK]
Sent: Thursday, 22 September 2005 11:01
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] DB2 Redirect Restore


Can you audit the primary volume that is erroring ?
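
A sketch of such an audit, assuming xxxxxx stands for the name of the primary volume; with fix=no the server only reports inconsistencies and changes nothing:

```
audit volume xxxxxx fix=no
```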

If you manually offsite your copy pool tapes, can you recall the copy
pool tapes and try a restore with them.



If you know the primary volume name that it is failing on, then enter
the following



   restore volume xxxxxx preview=yes

(where xxxxxx is the volume name of the primary volume that is giving the
error)



The activity log will show you the list of copy pool volumes that the data
on the primary tape is spread over. If your DB2 db is fairly large then this
shouldn't be too many tapes.
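
To pull the relevant messages back out of the activity log, something like the following may help. The ten-minute window is an arbitrary assumption; widen it if the preview ran earlier.

```
query actlog begintime=-00:10
```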

Check in the copy pool tapes, mark the primary tape in error as unavailable
and submit the restore again.
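
As a rough sketch, those two steps might look like this. LIBNAME and yyyyyy are placeholders for your library name and a recalled copy pool volume, xxxxxx is the failing primary volume, and the access value for the copy pool tape is an assumption that depends on how you handle offsite volumes.

```
checkin libvolume LIBNAME yyyyyy status=private
update volume yyyyyy access=readonly
update volume xxxxxx access=unavailable
```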



Do you have a fabric for your tape SAN ?

If so, are you aware that when you make changes to your fabric (zoning), a
message called an RSCN (Registered State Change Notification) is broadcast
throughout the SAN? This causes all traffic to be interrupted for a few
milliseconds. SAN-attached disk can recover easily from this, but tape does
not: any data being written at the time is corrupted, and it is a 'silent'
corruption, so you do not see it until you come to do the restore. Single
large files like databases are particularly prone to this, as even a
millisecond of outage/corruption makes the whole file useless.



One point: if your initial db backup was going straight to tape when the
corruption occurred, then the 'backup stg tapepool copypool' will have
replicated the error.

If you have a fabric tape SAN, ensure all tape operation is quiesced
when making any fabric changes.
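
One way to check that nothing is using the drives before a zoning change is to look at mounted volumes, server processes and client sessions from an administrative session. This is only a sketch of the check; how you then stop or hold off activity depends on your schedules.

```
query mount
query process
query session
```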



Before anybody shoots me down, there are now FC HBAs available that
offer the ADISC/PDISC function, instead of PLOGI/FLOGI, upon receiving an
RSCN, and if your tape devices support FCP Tape, data transfer can
recover from an RSCN. Personally, I still stop tape operation before
making changes to the fabric.





Leigh





-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Herrmann, Boris
Sent: 22 September 2005 08:59
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] DB2 Redirect Restore



Hello there!



We are trying to restore a DB2 database over the SAN. The StorageAgent
mounts a tape and the restore begins (creating the containers). After about
70 minutes we get the following error: An I/O error -72 occurred on media
TSM.

We've tried this many times with the same result. Does anyone know why this
happens?



Kind regards,

Boris Herrmann
