Veritas-bu

[Veritas-bu] Four backup problems

2001-12-14 13:56:50
Subject: [Veritas-bu] Four backup problems
From: LeeAnne.Pedersen AT gov.ab DOT ca (Lee Anne Pedersen)
Date: Fri, 14 Dec 2001 11:56:50 -0700
I'm very new to Netbackup and the network I'm trying to get it working
on. I'm a couple weeks overdue getting the backups to work, so I'm
hoping someone will be able to point me in the right direction. I've
briefly outlined four problems below. My apologies for such a long
email, but I wasn't sure which information would be most useful.

All servers are Compaq Proliants (various models), Windows 2000 Advanced
Server, with one master/media server and a 2-drive Compaq TLD881 DLX
tape library with DLT 15/30 GB tapes. Servers have either a 10/100 or
Gigabit Ethernet card. NetBackup Datacenter 3.4.1 with NB_341_1 update
is installed on the master, with 3.4.1 (3.4 + J645 patch) installed on
clients. Tape Storage Unit has maximum multiplexing set to 2.

Some of the things I've tried to correct these problems:
- Switching all 10/100 NICs to Full Duplex 100 Mbps
- Set OTM initial cache to 100 and Max cache to 0
- Set OTM error control to Retry
- Changed Busy File Timeout from 60 to 300 seconds
- Changed Busy File Wait from 5 to 4 seconds
- Experimented with using a veritas domain account as the login for the
Netbackup Client Service (seemed to make no difference)
- Added NET_BUFFER_SZ files to master and client servers, with value of
132096
- Setting all logs to VERBOSE/Level 9 (though not much help if you don't
understand what you're seeing)

Problem 1:
I'm continually getting read/write errors on no specific server or tape
or tape drive. As a result, one tape and one drive usually get
frozen/downed per night. I suspect network problems, but I'm not sure
how to troubleshoot it.

I've included excerpts from the logs from one client backup. This class
is for clients with two physical drives, multistreaming, multiplexing
set to 2.  The other class suffers similar problems, does not have
multistreaming, but also has multiplexing = 2.

I do have a third class that appears to be working fine most of the
time, other than taking alot longer than it should. The most obvious
differences that I can see between it and these two that barely work, is
that it writes to a DLT7000 tape and it has multiplexing set to 1.

CDSL2 - bpbkar log
12/13/01 8:35:16 PM: [2596]: INF - OTM Create - initial and max cache
size being set to 1493MB
12/13/01 8:35:16 PM: [2596]: INF -
======================================================================
12/13/01 8:35:16 PM: [2596]: INF - OTM Enable Parameters
12/13/01 8:35:16 PM: [2596]: INF -
----------------------------------------------------------------------
12/13/01 8:35:16 PM: [2596]: INF -            CacheFileName: ''
12/13/01 8:35:16 PM: [2596]: INF -        SizeofCacheFileMB: 1493
12/13/01 8:35:16 PM: [2596]: INF -     MaxSizeOfCacheFileMB: 1493
12/13/01 8:35:16 PM: [2596]: INF -            QuiescentWait: 4
12/13/01 8:35:16 PM: [2596]: INF -         QuiescentTimeout: 300
12/13/01 8:36:48 PM: [2484]: FTL - tar file write error (10053)
12/13/01 8:36:48 PM: [2484]: INF - EXIT STATUS 14: file write failed
12/13/01 8:36:49 PM: [2484]: INF - 'INF - Server status = 14' received
12/13/01 8:36:49 PM: [2484]: INF - OTM Cleanup - success
12/13/01 8:36:49 PM: [2484]: INF - EXIT - status = 0

CDSL2 - All Log Entries
12/13/2001      8:34:51 PM      cdbak01 CDSL2   Info    client CDSL2
handling path C:\
12/13/2001      8:34:52 PM      cdbak01 CDSL2   Info    client CDSL2
handling path E:\
12/13/2001      8:34:53 PM      cdbak01 CDSL2   Info    client CDSL2
handling path System_State:\
12/13/2001      8:34:53 PM      cdbak01 CDSL4   Info    client CDSL4
handling path C:\
12/13/2001      8:34:55 PM      cdbak01 CDSL4   Info    client CDSL4
handling path E:\
12/13/2001      8:34:55 PM      cdbak01 CDSL4   Info    client CDSL4
handling path System_State:\
12/13/2001      8:36:39 PM      cdbak01 CDSL2   Info    begin writing
backup id CDSL2_1008300885, copy 1, fragment 1, to media id DSR093 on
drive index 1
12/13/2001      8:36:39 PM      cdbak01 CDSL2   Info    begin writing
backup id CDSL2_1008300886, copy 1, fragment 1, to media id DSR093 on
drive index 1
12/13/2001      8:36:46 PM      cdbak01 CDSL2   Info    WriteFile failed
with: The I/O bus was reset. (1111); bytes written = 65536; size = 65536
12/13/2001      8:36:46 PM      cdbak01 CDSL2   Error   write of 65536
bytes indicated only 65536 bytes were written, errno = 0, err = 1111
12/13/2001      8:36:48 PM      cdbak01 CDSL2   Critical        from
client CDSL2: FTL - tar file write error (10053)
12/13/2001      8:36:48 PM      cdbak01 CDSL2   Info    media manager -
system error occurred
12/13/2001      8:36:49 PM      cdbak01 CDSL2   Error   backup of client
CDSL2 exited with status 174 (media manager - system error occurred)
12/13/2001      8:36:49 PM      cdbak01 CDSL2   Info    media manager -
system error occurred
12/13/2001      8:36:49 PM      cdbak01 CDSL2   Error   backup of client
CDSL2 exited with status 174 (media manager - system error occurred)

CDBAK01 - bptm log
20:36:45 [320.356] <2> write_backup: media id DSR090 mounted on drive
index 0, drivepath \\.\Tape0, drivename COMPAQDLT80000
20:36:45 [320.356] <2> io_read_media_header: drive index 0, reading
media header, buflen = 32768, buff = 0xda8590
20:36:45 [320.356] <2> io_ioctl: command (5)MTREW 1 from
(../bptm.c.4951) on drive index 0
20:36:46 [2148.1308] <2> write_data: write of 65536 bytes indicated only
65536 bytes were written, err = 1111
20:36:46 [2148.1308] <2> getsockconnected: host=cdbak01 service=bpdbm
address=199.213.112.97 protocol=tcp non-reserved port=13721
20:36:46 [2148.1308] <2> bind_on_port_addr: bound to port 2067
20:36:46 [2148.1308] <2> check_authentication: no authentication
required
20:36:46 [2148.1308] <4> write_data: WriteFile failed with: The I/O bus
was reset. (1111); bytes written = 65536; size = 65536
20:36:46 [2148.1308] <2> getsockconnected: host=cdbak01 service=bpdbm
address=199.213.112.97 protocol=tcp non-reserved port=13721
20:36:46 [2148.1308] <2> bind_on_port_addr: bound to port 2068
20:36:46 [2148.1308] <2> check_authentication: no authentication
required
20:36:46 [2148.1308] <16> write_data: write of 65536 bytes indicated
only 65536 bytes were written, errno = 0, err = 1111
20:36:46 [2148.1308] <2> wait_for_sigcld: waiting for child 1880 to
exit, timeout is 300
20:36:46 [2148.1308] <2> check_error_history: called from bptm line
12312, EXIT_Status = 174
20:36:46 [2148.1308] <2> io_close: closing C:\Program
Files\VERITAS\NetBackup\db\media\tpreq/DSR093, from ../bptm.c.10820
20:36:46 [2148.1308] <2> tpunmount: tpunmount'ing C:\Program
Files\VERITAS\NetBackup\db\media\tpreq/DSR093
20:36:47 [2148.1308] <2> send_brm_msg: ERROR 174
20:36:47 [2148.1308] <2> mpx_terminate_exit: EXITING with status 174

CDSL2 - bpbkar log
12/13/01 8:38:29 PM: [2500]: INF - OTM Create - initial and max cache
size being set to 1642MB
12/13/01 8:38:29 PM: [2500]: INF -
======================================================================
12/13/01 8:38:30 PM: [2500]: INF - OTM Enable Parameters
12/13/01 8:38:30 PM: [2500]: INF -
----------------------------------------------------------------------
12/13/01 8:38:30 PM: [2500]: INF -            CacheFileName: ''
12/13/01 8:38:30 PM: [2500]: INF -        SizeofCacheFileMB: 1642
12/13/01 8:38:31 PM: [2500]: INF -     MaxSizeOfCacheFileMB: 1642
12/13/01 8:38:31 PM: [2500]: INF -            QuiescentWait: 4
12/13/01 8:38:31 PM: [2500]: INF -         QuiescentTimeout: 300
12/13/01 8:39:11 PM: [2484]: FTL - tar file write error (10054)
12/13/01 8:39:11 PM: [2484]: INF - EXIT STATUS 14: file write failed
12/13/01 8:39:12 PM: [2484]: INF - 'INF - Server status = 14' received
12/13/01 8:39:12 PM: [2484]: INF - OTM Cleanup - success
12/13/01 8:39:12 PM: [2484]: INF - EXIT - status = 0

CDSL2 - All log entries
12/13/2001      8:39:16 PM      cdbak01 CDSL2   Info    client CDSL2
handling path C:\
12/13/2001      8:39:17 PM      cdbak01 CDSL2   Info    client CDSL2
handling path E:\
12/13/2001      8:39:17 PM      cdbak01 CDSL2   Info    client CDSL2
handling path System_State:\
12/13/2001      8:39:39 PM      cdbak01         Info    scheduler found
no backups due to run
12/13/2001      8:41:25 PM      cdbak01 CDSL2   Error   read error on
media id DSR093, drive index 1, reading header block, No error
12/13/2001      8:41:25 PM      cdbak01 CDSL2   Critical        from
client CDSL2: FTL - tar file write error (10054)
12/13/2001      8:41:26 PM      cdbak01 CDSL2   Info    media read error
12/13/2001      8:41:26 PM      cdbak01 CDSL2   Error   backup of client
CDSL2 exited with status 85 (media read error)
12/13/2001      8:41:27 PM      cdbak01 CDSL2   Info    media read error
12/13/2001      8:41:27 PM      cdbak01 CDSL2   Error   backup of client
CDSL2 exited with status 85 (media read error)
12/13/2001      8:44:39 PM      cdbak01         Info    media id DSR093
removed from media manager database (expired)

CDBAK01 - bptm log
20:41:18 [148.528] <2> openTpreqFile: GetTapeStatus(\\.\Tape1) failed,
The I/O bus was reset. (1111)
20:41:18 [148.528] <2> io_open: retrying open, The I/O bus was reset.
(1111)
20:41:23 [148.528] <2> io_open: file C:\Program
Files\VERITAS\NetBackup\db\media\tpreq/DSR093 successfully opened
20:41:23 [148.528] <2> write_backup: media id DSR093 mounted on drive
index 1, drivepath \\.\Tape1, drivename COMPAQDLT80001
20:41:23 [148.528] <2> io_read_media_header: drive index 1, reading
media header, buflen = 32768, buff = 0xda8590
20:41:23 [148.528] <2> io_ioctl: command (5)MTREW 1 from
(../bptm.c.4951) on drive index 1
20:41:24 [148.528] <2> io_read_media_header: ReadFile() failed, read
32768 bytes, More data is available. (234)
20:41:24 [148.528] <2> getsockconnected: host=cdbak01 service=bpdbm
address=199.213.112.97 protocol=tcp non-reserved port=13721
20:41:24 [148.528] <2> bind_on_port_addr: bound to port 2162
20:41:24 [148.528] <2> check_authentication: no authentication required
20:41:25 [148.528] <16> io_read_media_header: read error on media id
DSR093, drive index 1, reading header block, No error
20:41:25 [148.528] <2> log_media_error: successfully wrote to error file
- 12/13/01 20:41:25 DSR093 1 READ_ERROR
20:41:25 [148.528] <2> check_error_history: called from bptm line 11587,
EXIT_Status = 85
20:41:25 [148.528] <2> check_error_history: drive index = 1, media id =
DSR093, time = 12/13/01 20:41:25, both_match = 1, media_match = 0,
drive_match = 0
20:41:25 [148.528] <2> io_close: closing C:\Program
Files\VERITAS\NetBackup\db\media\tpreq/DSR093, from ../bptm.c.10820
20:41:25 [148.528] <2> tpunmount: tpunmount'ing C:\Program
Files\VERITAS\NetBackup\db\media\tpreq/DSR093
20:41:25 [148.528] <2> send_brm_msg: ERROR 85
20:41:25 [148.528] <2> mpx_terminate_exit: EXITING with status 85

This client tried unsuccessfully three times within the specified time
period. The following time period it loaded a different tape into the
other drive, but died with the following:

12/14/01 4:01:17 AM: [1768]: INF - can be otm'd: C:\
12/14/01 4:01:17 AM: [1768]: INF - can't be otm'd: System_State:\
12/14/01 4:03:33 AM: [1768]: FTL - tar file write error (10054)
12/14/01 4:03:33 AM: [1768]: INF - EXIT STATUS 14: file write failed
12/14/01 4:03:34 AM: [1768]: INF - 'INF - Server status = 150' received
12/14/01 4:03:34 AM: [1768]: INF - OTM Cleanup - success
12/14/01 4:03:34 AM: [1768]: INF - EXIT - status = 0

12/14/2001      4:03:36 AM      cdbak01 cdsl1   Info    client cdsl1
handling path ALL_LOCAL_DRIVES
12/14/2001      4:03:37 AM      cdbak01 CDSL2   Info    client CDSL2
handling path C:\
12/14/2001      4:04:39 AM      cdbak01         Info    scheduler found
no backups due to run
12/14/2001      4:05:47 AM      cdbak01 CDSL2   Error   read error on
media id DSR090, drive index 0, reading header block, No error
12/14/2001      4:05:48 AM      cdbak01 CDSL2   Warning FREEZING media
id DSR090, it has had at least 3 errors in the last 12 hour(s)
12/14/2001      4:05:48 AM      cdbak01 cdsl1   Critical        from
client cdsl1: FTL - tar file write error (10054)
12/14/2001      4:05:49 AM      cdbak01 cdsl1   Info    media read error
12/14/2001      4:05:49 AM      cdbak01 cdsl1   Error   backup of client
cdsl1 exited with status 85 (media read error)
12/14/2001      4:05:49 AM      cdbak01 CDSL2   Info    media read error
12/14/2001      4:05:50 AM      cdbak01 CDSL2   Error   backup of client
CDSL2 exited with status 85 (media read error)

On a third tape, it finally completely successfully. I've read in a
previous post that with errors of 84/85/86 that the NUMBER_DATA_BUFFERS
should be reduced. Does this apply to Windows 2000?

Problem 2:
Slow backup speed. These tape drives are supposed to have a 6 MB/s
uncompressed transfer rate, but I rarely get above 1 MB/s (though data
is compressed). It helped to turn off the virus scanner on all the
clients, but to get all the servers backed up within the windows, it has
to be a lot faster.

Problem 3:
I will quite often have a backup that runs for hours (doing nothing),
yet there is no tape in the drive?

Problem 4:
On two of my servers, I also consistently get 12/14/01 6:39:18 AM:
[2780]: WRN - GetFileSecurity Failed (WIN32 5: Access is denied. ) Any
suggestions for what might cause it?

My apologies for such a long email, but I'm at wit's end. If any further
information is needed, I'd be happy to provide it.

Thank you for any assistance,
Lee Anne

LeeAnne.Pedersen AT gov.ab DOT ca
Alberta Corporate Service Centre (ACSC)
Ph: 780-427-2504


<Prev in Thread] Current Thread [Next in Thread>
  • [Veritas-bu] Four backup problems, Lee Anne Pedersen <=