ADSM-L

ADSM, AIX, 3494, and 3590s

1997-04-11 16:27:04
Subject: ADSM, AIX, 3494, and 3590s
From: "Kent L. Johnson" <johnsk6 AT RPI DOT EDU>
Date: Fri, 11 Apr 1997 16:27:04 -0400
I'm currently running ADSM v2.1.0.12 on an R40 running AIX 4.1.5,
Ethernet-connected to a 3494 with three SCSI-connected 3590 tape drives.

Here are my experiences of today.

1) I have a 'backup db' command scheduled for 10:30 every morning.  Today,
around 1:20 I noticed that the backup command was still waiting for a tape
drive.  At this time, the db log was about 80% full too, since the db is
locked while the backup db command is waiting for a tape.

2) I cancelled the backup command and searched for problems in ADSM.  I
bounced the lmcpd daemon, since sometimes this clears up problems, and ADSM
is able to mount tapes again.  I also started an 'audit library' and tried to
execute the 'backup db' command again.

3) The backup process was still waiting to mount a tape.  Next I went to
the 3494 and found that one of our 3590 drives was dead.  The 3590 LCD
display was blank.  So, I made the drive unavailable at the 3494, in hopes
that ADSM would detect this, and would be able to continue successfully.

4) The 'backup db' process was still not able to mount a tape.  So, I tried a
'delete drive' in ADSM on the offending tape drive.  I still didn't see any
progress with the 'backup db' process.  So, I tried cancelling the 'audit
library' and the 'backup db' processes.  Now the cancel for these processes
was pending.  I waited a while, and then called software support.  I also
entered a 'show mp' command, and noticed that there were two mount points
shown, even though a 'query mount' showed only one tape mounted.

5) While I was on hold with software support, I tried deleting the device
file (/dev/rmt3) for the offending tape drive.  This seemed to allow the
'cancel process' commands to complete.  After this, ADSM was able to
successfully complete the 'backup db' command.
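
The mismatch in step 4 (two mount points from 'show mp', one mounted tape
from 'query mount') can be checked mechanically. The following is only a
sketch in Python, not an ADSM interface: the function name and the drive
names are hypothetical, and it assumes each command's output has already
been reduced to a plain list of drive names by some means not shown here.

```python
def stale_mount_points(mount_point_drives, mounted_drives):
    """Return drives that hold a mount point but have no tape mounted.

    Both arguments are plain lists of drive names (a hypothetical input
    format; parsing the real command output is not shown).
    """
    return sorted(set(mount_point_drives) - set(mounted_drives))


if __name__ == "__main__":
    # Hypothetical drive names: two mount points, only one mounted tape.
    print(stale_mount_points(["rmt1", "rmt3"], ["rmt1"]))
```

Any drive this returns is a candidate for the kind of stuck mount point
described above, and a reason to go look at the hardware.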

So, here are some of my questions.

1) Have other people seen 3590 hardware problems cause ADSM to lose the
ability to interact with the 3494 and other functioning tape drives?  I've
seen these types of problems a number of times in the past.

2) Does anybody have any better way to deal with these types of problems?

3) How reliable are other 3590s?  Most of our problems have been with one of
our three 3590 drives.

4) Have these types of problems been reported to ADSM development, so that
they are looking into them?  The ADSM support person with whom I spoke today
gave a vague indication that ADSM development is working on these types of
problems.  Are there any PMRs which development is actively working on?

These problems are serious, because an undetected 'backup db' process that
is unable to complete will lock the database; then the log will fill, and
ADSM will freeze up.  After that comes loss of backup data and maybe database
corruption, then maybe a 'restore db' or even an 'audit db'.  It gets ugly.
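
One way to catch this chain early is to watch two numbers together: how long
'backup db' has been waiting for a mount, and how full the log is. Below is a
minimal sketch of that decision in Python. The function name and both
thresholds are assumptions chosen for illustration, not ADSM recommendations,
and how the two numbers are collected from the server is left out entirely.

```python
def assess_backup(wait_minutes, log_pct_full,
                  max_wait_minutes=30, log_critical_pct=80):
    """Classify a pending 'backup db' as 'ok', 'warn', or 'critical'.

    Thresholds are hypothetical defaults; tune them for your own site.
    """
    stuck = wait_minutes > max_wait_minutes      # waiting too long for a drive
    log_high = log_pct_full >= log_critical_pct  # log close to filling
    if stuck and log_high:
        return "critical"
    if stuck or log_high:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # Roughly the situation described above: started at 10:30, noticed
    # around 1:20 (about 170 minutes), with the log about 80% full.
    print(assess_backup(wait_minutes=170, log_pct_full=80))
```

Run from cron or a similar scheduler, a check like this would have flagged
the stuck backup well before the log reached 80%.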

I would love to hear from ADSM development that (1) these problems are known
and solvable and that (2) we can expect to see these problems resolved in a
not-too-distant ADSM release.

Kent


--
Kent Johnson                        Internet: johnsk6 AT rpi DOT edu
Unix Systems Programmer (VCC 323)      Phone: (518) 276-8175
Rensselaer Polytechnic Institute         Fax: (518) 276-2809