[Veritas-bu] update on drives being marked down/missing

2002-05-22 10:37:59
Subject: [Veritas-bu] update on drives being marked down/missing
From: danix AT cloud9 DOT net (danix)
Date: Wed, 22 May 2002 10:37:59 -0400 (EDT)
Thanks for all the suggestions I received.
Here is where we stand so far:

- Yesterday we verified that all drives are visible with probe-scsi at the boot
prompt.
- We detected some problems on one of the drives in the L40, and since we were
going to decommission it anyway, we unplugged it from the system.
- We rebooted and wrote to each tape drive in the 9710 using tar/mt at the
command line, and also via NetBackup. The exception was one drive with a known
hardware problem, which was marked down.

Convinced that things looked OK, and that the L40 had perhaps been causing
some SCSI weirdness, we left for the day expecting things to run well.

End result:
May 21 18:28:31 backup02 bptm[3555]: [ID 557625 daemon.error] Application (NetBackup) has DOWN'ed drive index 5, see application error log for further information
May 21 19:28:44 backup02 bptm[8308]: [ID 557621 daemon.error] Application (NetBackup) has DOWN'ed drive index 3, see application error log for further information
May 21 19:36:38 backup02 bptm[8784]: [ID 557619 daemon.error] Application (NetBackup) has DOWN'ed drive index 2, see application error log for further information
May 21 20:06:30 backup02 bptm[10104]: [ID 557615 daemon.error] Application (NetBackup) has DOWN'ed drive index 0, see application error log for further information

Only one drive was still running this morning.
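
To keep track of which drives went down and when, the syslog entries above can be reduced to (timestamp, drive index) pairs. This is only a rough Python sketch: the regular expression is written against the four messages quoted here, assumes each entry sits on a single line as it would in the syslog file, and the function name downed_drives is made up for illustration.

```python
import re

# Pattern fitted to the bptm messages quoted above; other NetBackup
# versions may word the message differently (assumption).
DOWN_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+bptm\[\d+\]:.*DOWN'ed drive index (\d+)"
)

def downed_drives(lines):
    """Return (timestamp, drive_index) for every DOWN'ed message found."""
    found = []
    for line in lines:
        match = DOWN_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found

SAMPLE = [
    "May 21 18:28:31 backup02 bptm[3555]: [ID 557625 daemon.error] "
    "Application (NetBackup) has DOWN'ed drive index 5, see application "
    "error log for further information",
    "May 21 20:06:30 backup02 bptm[10104]: [ID 557615 daemon.error] "
    "Application (NetBackup) has DOWN'ed drive index 0, see application "
    "error log for further information",
]

for stamp, index in downed_drives(SAMPLE):
    print(stamp, "drive index", index)
```
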

If anyone has pointers on where to find the "application error log", I would
appreciate it.  The above errors are from syslog.
I looked at the logs in /opt/openv/netbackup/db/error, but I can't figure out
the format.  The following entries seem to match the above syslog entries:
1022020111 1 132 8 backup02 114394 0 0 wapcuwp01bb0006 bptm DOWN'ing drive index 5, it has had at least 3 errors in last 12 hour(s)
1022023724 1 132 8 backup02 114397 0 0 20app1 bptm DOWN'ing drive index 3, it has had at least 4 errors in last 12 hour(s)
1022024198 1 132 8 backup02 114399 0 0 wapcuse01aa0003 bptm DOWN'ing drive index 2, it has had at least 3 errors in last 12 hour(s)
1022025990 1 132 8 backup02 114402 0 0 wapcuwp01dd0001 bptm DOWN'ing drive index 0, it has had at least 4 errors in last 12 hour(s)

The first error (drive index 5) is preceded by this:
1022020110 1 132 16 backup02 114394 0 0 wapcuwp01bb0006 bptm read error on media id 000141, drive index 5, reading header block, I/O error
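
One part of the format can be checked from the data itself: the first field of each db/error entry decodes as a Unix epoch timestamp, and 1022020111 comes out as May 21 18:28:31 EDT, the same second as the first syslog entry. Below is a minimal Python sketch of that decoding. The labels for the other fields (server, client, process) are guesses from the values, not taken from NetBackup documentation, and the fixed -4 hour offset assumes the server clock was on EDT.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server logs in US Eastern Daylight Time (UTC-4).
EDT = timezone(timedelta(hours=-4), "EDT")

def parse_error_line(line):
    """Split one db/error entry into a dict.

    Only the epoch timestamp is confirmed against syslog; the remaining
    labels are guesses and the numeric fields are left uninterpreted.
    """
    parts = line.split(None, 10)
    if len(parts) < 11:
        raise ValueError("expected at least 11 whitespace-separated fields")
    return {
        "time": datetime.fromtimestamp(int(parts[0]), tz=EDT),
        "server": parts[4],    # guess: matches the syslog hostname
        "client": parts[8],    # guess: looks like a client hostname
        "process": parts[9],   # guess: matches the syslog process name
        "text": parts[10],
        "unknown": parts[1:4] + parts[5:8],  # meaning not established here
    }

SAMPLE = ("1022020111 1 132 8 backup02 114394 0 0 wapcuwp01bb0006 bptm "
          "DOWN'ing drive index 5, it has had at least 3 errors in last "
          "12 hour(s)")

entry = parse_error_line(SAMPLE)
print(entry["time"].isoformat(), entry["process"], entry["text"])
```
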

The file also contains several entries reading:
EXIT STATUS 85 (media read error)

These are pretty much new/recent tapes, so I don't see how we could be having
errors on all of them.

Comments, suggestions welcome.

