Veritas-bu

Re: [Veritas-bu] Error 41 on Catalog Backup using NBU 6.0MP4?

2007-08-28 13:13:46
Subject: Re: [Veritas-bu] Error 41 on Catalog Backup using NBU 6.0MP4?
From: "Preston, Douglas L" <dpreston AT landam DOT com>
To: "Justin Piszcz" <jpiszcz AT lucidpixels DOT com>
Date: Tue, 28 Aug 2007 12:47:03 -0400
Here are the changes I made. They are in the Windows GUI under:

Host Properties
Master Servers
My_Master

Client connect timeout:        900 seconds
Client read timeout:           3600 seconds
File browse timeout:           3600 seconds
Media server connect timeout:  600 seconds

and under:

Clients
My_Master
Windows Client
Client Settings

Communications buffer size:    32



------------------------------------------------------------------------

From the GUI help on status codes:

NetBackup status code: 41 
Message: network connection timed out 

Explanation: The server did not receive any information from the client
for too long a period of time. 

Recommended action: 

On UNIX or Windows clients, check for the following problems with the
bpbkar client process.
On Windows clients: The bpbkar client process is not hung. Due to the
files and directories it scans, it has not replied to the server within
the Client read timeout or Client connect timeout period. This error
occurs during incremental backups when directories have thousands of
unmodified files. 

For this case, use Host Properties on the NetBackup server to change
Client connect timeout or Client read timeout. These settings are on the
Timeouts and Universal Settings tabs, respectively, in the Master Server
Properties dialog box. The default for these timeouts is 300 seconds. 

See "Using the Host Properties Window" in the Troubleshooting Guide.

You can also monitor CPU utilization to determine if this condition
exists. 

On UNIX clients: 

*The bpbkar client process is hung on a file that has mandatory locking
set. For this case, add the following to the client's bp.conf file: 

VERBOSE

and as root on the client run the following:

touch /usr/openv/netbackup/bpbkar_path_tr
mkdir /usr/openv/netbackup/logs/bpbkar

Then retry the operation. The names of the files are logged in the debug
log file in the /usr/openv/netbackup/logs/bpbkar directory before bpbkar
processes them. The last file in the log is the file that causes
problems.

Note: Also, use these procedures for other "unknown" bpbkar hangs.

If the problem is due to mandatory file locking, have NetBackup skip the
locked files. Set LOCKED_FILE_ACTION to SKIP in the
/usr/openv/netbackup/bp.conf file on the client. 

*The bpbkar client process is not hung. Due to the files and directories
it scans, it has not replied to the server within CLIENT_READ_TIMEOUT or
CLIENT_CONNECT_TIMEOUT. This error occurs during backups when
directories have thousands of unmodified files or during restores of the
sparse files that have thousands of holes. It also occurs when it backs
up file systems or the directories that reside on optical disk, which is
considerably slower than magnetic disk. 

For this case, try to add or modify the CLIENT_READ_TIMEOUT and
CLIENT_CONNECT_TIMEOUT values in the server's
/usr/openv/netbackup/bp.conf file. The default for the
CLIENT_READ_TIMEOUT and CLIENT_CONNECT_TIMEOUT is 300 seconds if it is
not specified. 
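As a rough sketch of the bp.conf change described above, the helper below replaces an existing "KEY = value" line or appends one if the key is missing. The function name set_bp_conf_value is made up for this illustration and is not a NetBackup tool; the 3600 and 900 values in the usage comment are simply the ones from my GUI settings. Back up bp.conf before trying anything like this.

```shell
#!/bin/sh
# set_bp_conf_value FILE KEY VALUE   (hypothetical helper, not part of NetBackup)
# Replace the first "KEY = ..." line in FILE, or append one if KEY is absent.
set_bp_conf_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # ignore leading blanks
            n = index(line, "=")
            name = (n > 0) ? substr(line, 1, n - 1) : line
            sub(/[ \t]+$/, "", name)              # ignore blanks before "="
            if (name == k) {                      # existing entry: rewrite once
                if (!done) print k " = " v
                done = 1
                next
            }
            print
        }
        END { if (!done) print k " = " v }        # no entry found: append
    ' "$file" > "$tmp" && cat "$tmp" > "$file"    # cat keeps owner and mode
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Example (run as root on the server):
#   set_bp_conf_value /usr/openv/netbackup/bp.conf CLIENT_READ_TIMEOUT 3600
#   set_bp_conf_value /usr/openv/netbackup/bp.conf CLIENT_CONNECT_TIMEOUT 900
```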

Use your system's ps command and monitor CPU utilization to help decide
which of these conditions exist. 
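One possible way to do the ps check is sketched below. The filter name filter_bpbkar is invented for this example. How to read the numbers is my own interpretation and is not stated in the help text: a bpbkar that keeps accumulating CPU is probably still scanning, while one that sits near zero across several samples may be hung.

```shell
#!/bin/sh
# filter_bpbkar (hypothetical helper): keep the ps header plus any row whose
# command is bpbkar. Expects the columns PID, %CPU, ELAPSED, COMMAND.
filter_bpbkar() {
    awk 'NR == 1 || $4 ~ /(^|\/)bpbkar$/'
}

# Typical use on the client, repeated a minute or so apart:
#   ps -e -o pid,pcpu,etime,args | filter_bpbkar
```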

When you finish the investigation of the problem, delete the
/usr/openv/netbackup/logs/bpbkar directory, since the log files can
become quite large and are not deleted automatically. Also delete
/usr/openv/netbackup/bpbkar_path_tr so you do not generate larger log
files than needed the next time you create directory
/usr/openv/netbackup/logs/bpbkar. 
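The whole trace cycle described above (turn tracing on, look at the last file logged, clean up) could be wrapped roughly as follows. The three function names and the NB_ROOT variable are invented for this sketch; only the paths come from the help text. Setting NB_ROOT to a scratch directory lets you rehearse the steps without touching a real client.

```shell
#!/bin/sh
# NB_ROOT is a made-up variable for this sketch; it defaults to the install
# path used in the text.
NB_ROOT=${NB_ROOT:-/usr/openv/netbackup}

# Turn on bpbkar path tracing: VERBOSE in bp.conf, the trigger file,
# and the debug log directory.
enable_bpbkar_trace() {
    grep -q '^VERBOSE' "$NB_ROOT/bp.conf" 2>/dev/null ||
        echo 'VERBOSE' >> "$NB_ROOT/bp.conf"
    touch "$NB_ROOT/bpbkar_path_tr" &&
        mkdir -p "$NB_ROOT/logs/bpbkar"
}

# Print the last N lines (default 20) of the most recently modified bpbkar
# debug log; the last path named there is the one bpbkar was working on.
show_last_logged() {
    newest=$(ls -t "$NB_ROOT/logs/bpbkar" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || { echo "no bpbkar logs found" >&2; return 1; }
    tail -n "${1:-20}" "$NB_ROOT/logs/bpbkar/$newest"
}

# Clean up after the investigation. VERBOSE is left in bp.conf on purpose,
# since the text does not say to remove it.
disable_bpbkar_trace() {
    rm -rf "$NB_ROOT/logs/bpbkar"
    rm -f "$NB_ROOT/bpbkar_path_tr"
}
```

Run enable_bpbkar_trace as root, retry the backup, call show_last_logged, and finish with disable_bpbkar_trace.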

On Windows systems, try the following:

*Disable the following file:

install_path\VERITAS\NetBackup\bin\tracker.exe

*Repair hard drive fragmentation. Try an application that is called
Diskeeper Lite, which is part of the Windows Resource Kit.

*Make sure that enough space is available in \temp.

If the server cannot connect to the client, do the following: create
bpcd or bpbkar (UNIX and Windows only) debug log directories on the
client. Then retry the operation and check the resulting logs. If these
logs do not provide a clue, create a bpbrm debug log on the server. Then
retry the operation and check the resulting debug log.

If the bpbrm log has entries similar to the following, the problem is in
the routing configuration on the server:

bpbrm hookup_timeout: timed out waiting during the client hookup
bpbrm Exit: client backup EXIT STATUS 41: network connection timed out

Verify that the client IP address is correct in the name service that is
used. On UNIX, if both the NIS and the DNS files are used, verify that
they match. 
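A quick way to look for the two bpbrm entries quoted above is a grep over the debug logs. This is only a sketch: find_hookup_timeouts is a made-up name, and the log directory in the usage comment assumes bpbrm logs live under /usr/openv/netbackup/logs/bpbrm, by analogy with the bpbkar directory mentioned earlier.

```shell
#!/bin/sh
# find_hookup_timeouts LOGFILE...   (hypothetical helper)
# Print, with file name and line number, any bpbrm entries that mention the
# client hookup timeout or an exit with status 41.
# /dev/null is added so grep always prefixes matches with the file name.
find_hookup_timeouts() {
    grep -n -E 'hookup_timeout|EXIT STATUS 41' /dev/null "$@"
}

# Example on the server (assumed log location):
#   find_hookup_timeouts /usr/openv/netbackup/logs/bpbrm/*
```

The function returns non-zero when nothing matches, so it can also be used in a simple if test.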

Also, see "Resolving Network Communications Problems" in the
Troubleshooting Guide. 

If you use an AIX token ring adapter and the routed daemon is running,
the timeout occurs because the token ring adapter creates dynamic
routes. It then causes the routed daemon to crash.

For a FlashBackup client, this error occurs if the file system being
backed up is very large and has a very large number of files. It can
also occur if a large number of concurrent data streams are active at
the same time. To correct it, add CLIENT_READ_TIMEOUT to the
/usr/openv/netbackup/bp.conf file and set it to increase the timeout
interval.

Make sure all recommended NetBackup patches are installed. Check the
Symantec support Web site for current patch information. (Go to
www.support.veritas.com. Then select "NetBackup" followed by "files and
updates".)

Run the NetBackup Configuration Validation Utility (NCVU) for the
associated NetBackup nodes. Note the pack checks in section two.

Add the CLIENT_READ_TIMEOUT values to the master server, media server,
and client when a NetBackup database extension product is installed. The
values should all be the same for each server. The value set is
dependent on the size of the database being backed up. 
See the NetBackup Administrator's Guide, Volume II, for more information
on CLIENT_READ_TIMEOUT. 
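To check that CLIENT_READ_TIMEOUT really is the same everywhere, something like the sketch below could compare copies of bp.conf gathered from the master, media server, and client. It assumes UNIX hosts, where the setting lives in bp.conf, and the function names and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# get_bp_conf_value FILE KEY   (hypothetical helper)
# Print the value of the first "KEY = value" line in FILE.
get_bp_conf_value() {
    awk -v k="$2" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)
            n = index(line, "=")
            if (n == 0) next                      # not a KEY = value line
            key = substr(line, 1, n - 1); sub(/[ \t]+$/, "", key)
            val = substr(line, n + 1);    gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == k) { print val; exit }
        }
    ' "$1"
}

# same_read_timeout FILE...   (hypothetical helper)
# List each file's CLIENT_READ_TIMEOUT and succeed only if they all agree.
# A file without the entry is reported as "unset".
same_read_timeout() {
    first=
    mismatch=0
    for f in "$@"; do
        v=$(get_bp_conf_value "$f" CLIENT_READ_TIMEOUT)
        v=${v:-unset}
        echo "$f: $v"
        [ -n "$first" ] || first=$v
        [ "$v" = "$first" ] || mismatch=1
    done
    [ "$mismatch" -eq 0 ]
}

# Example, with copies fetched beforehand:
#   same_read_timeout master.bp.conf media.bp.conf client.bp.conf
```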

Make sure enhanced authentication is configured correctly. For example,
the following may result in status code 41: host A is configured to use
enhanced authentication with host B, but host B is not configured to use
enhanced authentication with host A. In this case, connections from host
B to host A are likely to fail with status code 41. Connections from
host A to B are likely to fail with authentication errors (status code
160).



Doug Preston
Systems Engineer
Land America Tax and Flood Services
Phone 626-339-5221 Ext 1104
Email  dlpreston AT landam DOT com






-----Original Message-----
From: Justin Piszcz [mailto:jpiszcz AT lucidpixels DOT com] 
Sent: Tuesday, August 28, 2007 9:08 AM
To: Preston, Douglas L
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Error 41 on Catalog Backup using NBU 6.0MP4?



On Tue, 28 Aug 2007, Preston, Douglas L wrote:

> Try increasing your network timeouts.  Since I added 400 day retention
> to my catalogs and log files (Blame The Auditors) I have had to
> increase my timeouts at least twice in the last year.
>
> I know it sounds funny to need to increase them on the master server 
> when the master server is backing up the catalog but it made a 
> difference.

What specific tunables did you change?

Thanks,

Justin.

