ADSM-L

Your thoughts, please.

1997-05-13 16:32:00
Subject: Your thoughts, please.
From: "Mark W. Mapes" <MWM4%CTS%DCPP AT GO50.COMP.PGE DOT COM>
Date: Tue, 13 May 1997 13:32:00 PDT
We had a problem this weekend.

We lost a disk for one of our HP ADSM clients.  The way this client is
configured, an entire file system is on one disk.

There are some questions about what happened and when.  The backups
normally run from 2 to 4 am every day.  On Sunday morning, around 10,
users started to notice something was wrong.  The HP analyst found the
failed disk and reconfigured the file system onto an unused disk.  I then
issued an ADSM RESTORE, which worked perfectly except that it restored
from Saturday's 2 am backup rather than Sunday's 2 am backup.  Looking at
the ADSM log, I noticed that when the ADSM backup ran on Sunday, it backed
up several directories (no files) found in the root directory that had
the same names as the directories of the lost file system.  It did not
try to back up the file system, nor did it issue any error saying the
file system was missing.  So there was no later backup version to restore
(we lost a full day, at least).  From the log information, I concluded
that we lost the disk/file system sometime after the Saturday morning
backup but before the Sunday morning backup.  The HP analyst says the file
system must have been lost Sunday morning, around 10 am, because users had
been working with it, including saving files to that disk/file system,
all day Saturday and earlier on Sunday (around 8 am).

I don't know HP-UX (or, for that matter, any UNIX) very well, but my guess
is that when HP-UX lost the disk/file system, it transparently allowed a
similar directory structure to be created on its root file system.  The
application then kept running normally until it tried to retrieve files
that already existed on the lost disk/file system.  That is when users
noticed something was wrong.

When the HP analyst remounted the file system on the new disk, the
directories that had been created in the root file system were hidden
underneath the mount point and are no longer accessible.

If any of you think this may be a valid scenario, please let me know
(ideally with some ideas on how to test or verify it).  Or if I am
completely off base, please tell me, so that in the future I can keep my
big mouth shut.
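
One way to test the theory: when nothing is mounted on a mount-point
directory, that directory is just an ordinary directory on the root file
system, so it has the same device number as "/".  A minimal sketch in
Python, used here only for illustration; the function names are made up,
and the paths you check are whatever your mount points happen to be:

```python
import os


def lives_on_root_fs(path: str, root: str = "/") -> bool:
    """True if `path` is on the same device (file system) as `root`."""
    return os.stat(path).st_dev == os.stat(root).st_dev


def describe(path: str) -> str:
    """Classify a directory that is normally used as a mount point."""
    if os.path.ismount(path):
        # Something is mounted here: the expected, healthy case.
        return f"{path}: mount point of its own file system"
    if lives_on_root_fs(path):
        # Nothing is mounted: files written here land on the root file system.
        return f"{path}: NOT mounted, plain directory on the root file system"
    return f"{path}: NOT mounted, directory inside another file system"
```

Running describe() on the lost file system's mount point before the
analyst remounted it would have shown whether users were writing into
the root file system.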

Also, should ADSM have told us something was wrong?

Personally, I think ADSM is/was working as designed, in that it backed up
what HP-UX presented to it as valid file systems, but I have to justify my
position.  The people who manage the HP system think that ADSM should
have detected that the file system was not available and reported it as
an error (we got no error messages from our normal backup).

Perhaps we could restructure our backups to be more explicit about what
to look for and what to do?  Thoughts?
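
As one possible restructuring, the backup could be preceded by a check
that every file system the client is supposed to have is actually
mounted, failing loudly if one is not.  A minimal sketch, assuming a
wrapper script runs before the scheduled backup and that a non-zero exit
status gets noticed; EXPECTED_MOUNTS and its paths are hypothetical:

```python
import os
import sys

# Hypothetical: the file systems this client must have mounted at backup time.
EXPECTED_MOUNTS = ("/data01", "/data02")


def missing_mounts(expected):
    """Return the expected mount points that are not currently mounted."""
    # os.path.ismount() returns False for paths that do not exist at all.
    return [m for m in expected if not os.path.ismount(m)]


def main(expected=EXPECTED_MOUNTS) -> int:
    missing = missing_mounts(expected)
    for m in missing:
        # stderr, so the message ends up in whatever log captures the wrapper.
        print(f"ERROR: {m} is not mounted; refusing to back up", file=sys.stderr)
    # Non-zero status tells the caller not to start the backup.
    return 1 if missing else 0
```

A wrapper would call main() and start the ADSM backup only when it
returns 0, so a missing file system shows up as a failed backup instead
of a quiet backup of empty root directories.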

Thanks!

Mark Mapes
PG&E