ADSM-L

Re: I/O errors on copy pool volumes

1997-07-01 13:36:53
From: Tom Denier <tom AT STAFF.UDC.UPENN DOT EDU>
Date: Tue, 1 Jul 1997 13:36:53 -0400
Prather, Wanda wrote:
>
> Hi Tom,
>
> I do not agree with your presumption that the copy storage pool volume
> will suffer an I/O error if the primary volume does.

That is not what I was claiming. The copy pool cartridges are purchased
from the same suppliers as the primary pool cartridges, and written
using the same drives. If the primary pool has, for example, five
unreadable files, I would think the copy pool would have about the same
number of unreadable files. I don't for a moment believe that the two
sets of unreadable files would contain the same files.

> First, I have worked with 3490E drives a long time, and they are
> ENORMOUSLY reliable.  We move gigabytes of data per day, and see less
> than 1 hard I/O error per month (we have 10 mainframe-attached IBM 3490
> E's and 6 RS6000-attached STK 3490E compatible drives).

I have been administering an ADSM server for about a year. In that time
I have seen one cartridge turn out to be completely unreadable when a
MOVE DATA was done, three with some files found to be unreadable during
volume reclamation, and at least a dozen with some files found to be
unreadable during EXPORT operations. That is a very low percentage of
the files stored, but even one unreadable file could be a major problem
in a disaster recovery situation (a point discussed in more detail below).

> Read errors can occur due to a mechanical drive error.  But it is
> usually because the tape has been damaged at some point (stretched or
> crumpled) or the tape was VERY old or had a bad spot to start with.
> This does happen, but it should happen VERY seldom, and there is no
> particular likelihood that the copy pool tape would suffer the same
> damage at the same time.
>
> Also, when ADSM creates a copy pool tape, it is a logical copy, not a
> physical one.  ADSM copies each file, rather than making an exact
> physical duplicate of the tape.  My primary pool is collocated, and the
> copy pool is not, so there is not even a 1-to-1 correspondence between
> the contents of the primary and copy pool tapes, so there is no reason
> to think one will be damaged if the other is.
>
> If you are getting read errors on a tape, either:
>         1) there is physical damage to the tape, in which case the copy
> storage pool tape is a suitable recovery mechanism, or
>         2) there is a logical error because ADSM put the data on the
> tape incorrectly, in which case you have a reportable, fixable problem.
>
> I suggest that next time you have the problem, DO try to recover the
> file from the copy pool tape.  If you can't, you should report the
> problem to IBM ADSM support at once, because that should never happen.

I have no doubt that the copy storage pool is a satisfactory mechanism
for recovering from read errors on individual primary storage pool
volumes. That is not the concern I was getting at. If I come to work
one morning and find a smoldering pile of rubble where my workplace
used to be, there will be only one copy of each backup file available:
the one stored offsite on one or another of the copy pool volumes. If
that one copy of a critical file proves to be unreadable, I will have
no way to obtain a copy from a different volume. The incidence of
unreadable files on our primary storage pool tapes is low but definitely
non-zero. Given this, I have to assume that there would be a significant
chance of running into some unreadable files when trying to recreate
our systems from offsite tapes after a disaster.
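
To put rough numbers on that concern, here is a back-of-the-envelope
sketch in Python. The per-file error rate and the file count are made-up
illustrative figures, not measurements from our site, and the function
name is hypothetical. The calculation assumes read failures are
independent between files and between copies, which may not hold for a
bad batch of cartridges or a misbehaving drive.

```python
import math

def p_any_unreadable(per_file_rate: float, n_files: int, copies: int = 1) -> float:
    """Chance that at least one of n_files has no readable copy.

    Assumes each copy of each file fails independently with
    probability per_file_rate (an assumption, not a measured fact).
    """
    # A file is lost only if every one of its copies is unreadable.
    p_file_lost = per_file_rate ** copies
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_files * math.log1p(-p_file_lost))

rate = 1e-5        # hypothetical: 1 unreadable file per 100,000 stored
n_files = 200_000  # hypothetical: files needed for a full site restore

# Normal operation: primary pool plus copy pool, so two copies per file.
onsite = p_any_unreadable(rate, n_files, copies=2)
# After a disaster: only the offsite copy pool survives, one copy per file.
offsite = p_any_unreadable(rate, n_files, copies=1)

print(f"two copies available: {onsite:.6f}")
print(f"offsite copy only:    {offsite:.6f}")
```

With any error rate that is low but non-zero, the single-copy case grows
quickly with the number of files to be restored, while the two-copy case
stays small. That gap is the exposure I am asking about.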

> If you can recover from the copy pool tape, only the primary tape is
> bad.  If it happens often, you should talk to your Customer Engineer.
> He/she should be able to determine exactly what caused the I/O error;
> whether you have a batch of bad tapes, or a drive maintenance problem.
> You should not be having consistent problems with a 3490E.

We have an IBM maintenance contract on our 3490E drives, so IBM is aware
of the frequency of read errors we see. They have never indicated that
they consider it abnormal.

> Good luck!
>
>
> ========================================================================
> Wanda Prather
> Johns Hopkins Applied Physics Lab
> 301-953-6000 X8769
> wanda_prather AT jhuapl DOT edu
>
> "Intelligence has much less practical application than you'd think."
>               - Scott Adams/Dilbert
> ========================================================================
>
> > ----------
> > From:         Tom Denier[SMTP:tom AT WAL6000B.UDC.UPENN DOT EDU]
> > Sent:         Monday, June 30, 1997 4:40 PM
> > To:   ADSM-L AT VM.MARIST DOT EDU
> > Subject:      I/O errors on copy pool volumes
> >
> > Our site uses a copy storage pool which is stored offsite for disaster
> > recovery purposes. We have three 3490E drives, which are used to read
> > and write volumes for both the copy storage pool and a primary storage
> > pool kept onsite. We have occasionally attempted to read files from a
> > primary storage pool cartridge and had the attempt fail with I/O
> > errors. When this happens to a primary storage pool volume we have the
> > option of recovering files from the copy storage pool. However, a copy
> > storage pool volume is presumably just as likely as a primary storage
> > pool volume to suffer this kind of problem. If we ran into such a problem,
> > it would probably be after a disaster at a regular site, so that we
> > would not have the option of retrieving copies of the unreadable files
> > from somewhere else. How are other sites dealing with this concern?
> >
>