Hi,
I recently upgraded to NetBackup 4.5 DataCenter on our Solaris server. We
currently have three old Exabyte EZ17 autoloaders connected to the system.
Since I activated the schedules, I keep running into a problem where the tape
drives error out during backups and get downed. As a result, only a few of my
servers get backed up, while the rest fail because no tape drives are
available. I've cleaned the tape drives, and that was not the problem. The
tape is brand new. Here are some of the error messages I see in the
/var/adm/messages file:
Dec 19 19:05:49 hgsun27 scsi: [ID 243001 kern.info] /sbus@1f,0/QLGC,isp@3,10000/st@0,0 (st21):
Dec 19 19:05:49 hgsun27 Fixed record length (1024 byte blocks) I/O
Dec 19 19:08:21 hgsun27 bptm[1109]: [ID 557619 daemon.error] Application (NetBackup) has DOWN'ed drive index 2, see application error log for further information
This is what I see in the bptm log file:
19:08:20.547 [1109] <2> log_media_error: successfully wrote to error file - 12/19/02 19:08:20 C02503 2 WRITE_ERROR
TIR file, size is 264810 bytes + 0 GB
19:08:20.504 [1109] <2> write_data_tir: absolute block position prior to writing backup header(s) is 36, copy 1
19:08:20.504 [1109] <2> write_data_tir: block position check: actual 36, expected 5
19:08:20.528 [1109] <16> write_data_tir: FREEZING media id C02503, too many data blocks written, check tape/driver block size configuration
19:08:20.547 [1109] <2> log_media_error: successfully wrote to error file - 12/19/02 19:08:20 C02503 2 WRITE_ERROR
19:08:20.564 [1109] <2> check_error_history: called from bptm line 15872, EXIT_Status = 84
19:08:21.070 [1109] <2> check_error_history: drive index = 2, media id = C02503, time = 12/19/02 19:08:20, both_match = 0, media_match = 0, drive_match = 2
19:08:21.070 [1109] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/C02503, from bptm.c.12711
19:08:21.071 [1109] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/C02503
19:08:21.071 [1109] <2> TpUnmountWrapper: SCSI RELEASE
19:08:21.118 [1109] <8> check_error_history: DOWN'ing drive index 2, it has had at least 3 errors in last 12 hour(s)
19:08:21.120 [1109] <2> bptm: EXITING with status 84 <----------
These error messages suggest a problem with the block size. Also, the backups
that do run before the drives get downed take a lot longer than they used to.
I started a full backup of a test system three days ago, and it's still
running! Normally, I can do a full backup of all 25 of my systems within 7
hours. Nothing has changed hardware- or network-wise. What can I do to fix
this problem?
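For anyone who wants to sift a bptm log for the same pattern, here is a minimal Python sketch that pulls out the higher-severity entries. It is only an illustration: the line layout (time, [pid], <severity>, function, message) is inferred from the excerpt above, treating larger severity numbers as more serious is an assumption, and the names parse_line and notable are made up for this example. It also assumes each log entry sits on a single line, so entries wrapped by a mail client would need to be rejoined first.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed entry layout, inferred from the log excerpt in this post:
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\] "
    r"<(?P<sev>\d+)> "
    r"(?P<func>\w+): "
    r"(?P<msg>.*)$"
)


class Entry(NamedTuple):
    time: str
    pid: int
    severity: int
    func: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Return an Entry for a line in the assumed layout, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["time"], int(m["pid"]), int(m["sev"]), m["func"], m["msg"])


def notable(lines: Iterable[str], min_severity: int = 8) -> Iterator[Entry]:
    """Yield entries whose severity number is at least min_severity."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.severity >= min_severity:
            yield entry


# Entries copied from the excerpt above, one per line.
SAMPLE = [
    "TIR file, size is 264810 bytes + 0 GB",
    "19:08:20.504 [1109] <2> write_data_tir: block position check: actual 36, expected 5",
    "19:08:20.528 [1109] <16> write_data_tir: FREEZING media id C02503, too many data "
    "blocks written, check tape/driver block size configuration",
    "19:08:21.118 [1109] <8> check_error_history: DOWN'ing drive index 2, it has had "
    "at least 3 errors in last 12 hour(s)",
    "19:08:21.120 [1109] <2> bptm: EXITING with status 84 <----------",
]

if __name__ == "__main__":
    for e in notable(SAMPLE):
        print(e.time, e.severity, e.func, e.message)
```

Run against the sample, this should print only the FREEZING and DOWN'ing entries, which are the two lines that point at the block size and the drive being downed.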
Thanks in advance for any help!
******************************************************
* Octave J. Orgeron * Specializing in : *
* Unix Systems Administrator * Solaris/Tru64/Linux *
* The Hibbert Group * Certified Solaris *
* oorgeron AT hibbertgroup DOT com * Systems Administrator *
******************************************************