ADSM-L

Re: I/O errors on copy pool volumes

1997-07-01 14:15:13
Subject: Re: I/O errors on copy pool volumes
From: "Paul L. Bradshaw" <paulb AT DATATOOLS DOT COM>
Date: Tue, 1 Jul 1997 11:15:13 -0700
Well, there is certainly a non-zero probability that a tape in the copy
storage pool will have a bad spot on it.  This probability goes up with
the number of tapes and their age.  If you want to minimize this
probability (only you can answer what risk level and cost level is
appropriate) you could do one or more of the following:

1.  Make another copy of the storage pool.  This can be quite costly if you
have a large system, and may blow any backup windows you have as well.

2.  Re-visit your management class assignments.  For those systems/files
that you deem to be vital records (hopefully a lot less than everything!),
back those up to a separate storage pool.  You can then make multiple
copies of this storage pool (which should be much smaller!).  Making 2 or
more copies will make your risk close to zero (don't forget to make
multiple db backup copies as well!).
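
To put rough numbers on the trade-off above, here is a minimal sketch.  It
assumes read failures on different copies are independent (plausible only
because the copy pool is a file-level copy on different cartridges), and
the failure rate and file count are made-up illustrative values, not
measurements from any site.

```python
def p_file_lost(p_unreadable: float, copies: int) -> float:
    """Chance that every copy of one file is unreadable.

    Assumes each copy fails independently with the same probability.
    """
    return p_unreadable ** copies


def p_any_file_lost(p_unreadable: float, copies: int, n_files: int) -> float:
    """Chance that at least one of n_files has no readable copy left."""
    return 1.0 - (1.0 - p_file_lost(p_unreadable, copies)) ** n_files


if __name__ == "__main__":
    # Hypothetical inputs: 1 file in 1000 unreadable per copy, 100000 files.
    p, n = 0.001, 100_000
    for copies in (1, 2, 3):
        risk = p_any_file_lost(p, copies, n)
        print(f"{copies} offsite copies: P(at least one file lost) = {risk:.4f}")
```

The per-file risk shrinks geometrically with each extra copy, which is why
limiting the extra copies to a small pool of vital records buys most of the
benefit at a fraction of the tape cost.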

Paul L. Bradshaw                                phone: (408) 617-9125
BMC Software (previously DataTools)             fax: (408) 617-9101 or 9162
965 Stewart Dr.                                 mailto:paulb AT datatools DOT com
Sunnyvale, CA 94086                             http://www.datatools.com



----------
> From: Tom Denier <tom AT STAFF.UDC.UPENN DOT EDU>
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: I/O errors on copy pool volumes
> Date: Tuesday, July 01, 1997 10:36 AM
>
> Prather, Wanda wrote:
> >
> > Hi Tom,
> >
> > I do not agree with your presumption that the copy storage pool volume
> > will suffer an I/O error if the primary volume does.
>
> That is not what I was claiming. The copy pool cartridges are purchased
> from the same suppliers as the primary pool cartridges, and written
> using the same drives. If the primary pool has, for example, five
> unreadable files, I would think the copy pool would have about the same
> number of unreadable files. I don't for a moment believe that the two
> sets of unreadable files would contain the same files.
>
> > First, I have worked with 3490E drives a long time, and they are
> > ENORMOUSLY reliable.  We move gigabytes of data per day, and see less
> > than 1 hard I/O error per month (we have 10 mainframe-attached IBM 3490
> > E's and 6 RS6000-attached STK 3490E compatible drives).
>
> I have been administering an ADSM server for about a year. In that time
> I have seen one cartridge turn out to be completely unreadable when a
> MOVE DATA was done, three with some files found to be unreadable during
> volume reclamation, and at least a dozen with some files found to be
> unreadable during EXPORT operations. That is a very low percentage of
> the files stored, but even one unreadable file could be a major problem
> in a disaster recovery situation (a point discussed in more detail
> below).
>
> > Read errors can occur due to a mechanical drive error.  But it is
> > usually because the tape has been damaged at some point (stretched or
> > crumpled) or the tape was VERY old or had a bad spot to start with.
> > This does happen, but it should happen VERY seldom, and there is no
> > particular likelihood that the copy pool tape would suffer the same
> > damage at the same time.
> >
> > Also, when ADSM creates a copy pool tape, it is a logical copy, not a
> > physical one.  ADSM copies each file, rather than making an exact
> > physical duplicate of the tape.  My primary pool is collocated, and the
> > copy pool is not, so there is not even a 1-to-1 correspondence between
> > the contents of the primary and copy pool tapes, so there is no reason
> > to think one will be damaged if the other is.
> >
> > If you are getting read errors on a tape, either:
> >         1) there is physical damage to the tape, in which case the copy
> > storage pool tape is a suitable recovery mechanism, or
> >         2) there is a logical error because ADSM put the data on the
> > tape incorrectly, in which case you have a reportable, fixable problem.
> >
> > I suggest that next time you have the problem, DO try to recover the
> > file from the copy pool tape.  If you can't, you should report the
> > problem to IBM ADSM support at once, because that should never happen.
>
> I have no doubt that the copy storage pool is a satisfactory mechanism
> for recovering from read errors on individual primary storage pool
> volumes. That is not the concern I was getting at. If I come to work
> one morning and find a smoldering pile of rubble where my workplace
> used to be, there will be only one copy of each backup file available:
> the one stored offsite on one or another of the copy pool volumes. If
> that one copy of a critical file proves to be unreadable, I will have
> no way to obtain a copy from a different volume. The incidence of
> unreadable files on our primary storage pool tapes is low but definitely
> non-zero. Given this, I have to assume that there would be a significant
> chance of running into some unreadable files when trying to recreate
> our systems from offsite tapes after a disaster.
>
> > If you can recover from the copy pool tape, only the primary tape is
> > bad.  If it happens often, you should talk to your Customer Engineer.
> > He/she should be able to determine exactly what caused the I/O error;
> > whether you have a batch of bad tapes, or a drive maintenance problem.
> > You should not be having consistent problems with a 3490E.
>
> We have an IBM maintenance contract on our 3490E drives, so IBM is aware
> of the frequency of read errors we see. They have never indicated that
> they consider it abnormal.
>
> > Good luck!
> >
> >
> > =======================================================================
> > Wanda Prather
> > Johns Hopkins Applied Physics Lab
> > 301-953-6000 X8769
> > wanda_prather AT jhuapl DOT edu
> >
> > "Intelligence has much less practical application than you'd think."
> >               - Scott Adams/Dilbert
> > =======================================================================
> >
> > > ----------
> > > From:         Tom Denier[SMTP:tom AT WAL6000B.UDC.UPENN DOT EDU]
> > > Sent:         Monday, June 30, 1997 4:40 PM
> > > To:   ADSM-L AT VM.MARIST DOT EDU
> > > Subject:      I/O errors on copy pool volumes
> > >
> > > Our site uses a copy storage pool which is stored offsite for disaster
> > > recovery purposes. We have three 3490E drives, which are used to read
> > > and write volumes for both the copy storage pool and a primary storage
> > > pool kept onsite. We have occasionally attempted to read files from a
> > > primary storage pool cartridge and had the attempt fail with I/O
> > > errors.
> > > When this happens to a primary storage pool volume we have the option
> > > of recovering files from the copy storage pool. However, a copy storage
> > > pool volume is presumably just as likely as a primary storage pool
> > > volume to suffer this kind of problem. If we ran into such a problem,
> > > it would probably be after a disaster at a regular site, so that we
> > > would not have the option of retrieving copies of the unreadable files
> > > from somewhere else. How are other sites dealing with this concern?
> > >
> >