Veritas-bu

[Veritas-bu] Behvoural question

2001-04-19 10:30:33
Subject: [Veritas-bu] Behvoural question
From: MarelP AT AUSTRALIA.Stortek DOT com (Marelas, Peter)
Date: Fri, 20 Apr 2001 00:30:33 +1000
Sure, but I'm assuming your master/media/clients are running Solaris here.

Increase file descriptor limit on master/media to 1024 in /etc/system.

set rlim_fd_max = 1024
set rlim_fd_cur = 1024
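
After the reboot it is worth confirming that the new limits are actually in
effect. The following is a minimal sketch in Python, not part of NetBackup;
the function names are made up for illustration. It assumes that rlim_fd_cur
corresponds to the soft limit and rlim_fd_max to the hard limit reported for
the process, and that it is run from an environment similar to the one the
NetBackup daemons start in.

```python
import resource


def fd_limits():
    """Return (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(limits, target=1024):
    """True if both the soft and hard limit are at least `target`.

    resource.RLIM_INFINITY means "no limit", so it always qualifies.
    """
    def ok(value):
        return value == resource.RLIM_INFINITY or value >= target

    soft, hard = limits
    return ok(soft) and ok(hard)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft={soft} hard={hard} ok={meets_target((soft, hard))}")
```

If the check reports a soft limit below 1024, the /etc/system change has
probably not taken effect yet or is being overridden by a ulimit setting.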

Reduce the amount of time Solaris waits to free a socket after close.
Do this on master/media. Client is probably not as important.
A good read on this is http://www.sean.de/Solaris/tune.html
ndd -set /dev/tcp tcp_time_wait_interval 10000 (milliseconds, i.e. 10 seconds)

Increase the network buffer size on clients/master/media servers.
Put a value in the file below. By default it's 32k; I would start with 64k,
so add 65536 (it's in bytes).
A good read is here: http://seer.support.veritas.com/docs/183702.htm
/usr/openv/netbackup/NET_BUFFER_SZ
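
A small sketch of writing and reading back that file, in Python. The helper
names are hypothetical, and it assumes the file holds nothing but the size in
bytes on a single line, as described above.

```python
from pathlib import Path

NET_BUFFER_SZ = "/usr/openv/netbackup/NET_BUFFER_SZ"


def write_net_buffer_sz(size_bytes, path=NET_BUFFER_SZ):
    """Write the network buffer size (in bytes) as the file's only line."""
    if size_bytes <= 0:
        raise ValueError("buffer size must be a positive number of bytes")
    Path(path).write_text(f"{size_bytes}\n")


def read_net_buffer_sz(path=NET_BUFFER_SZ):
    """Read the value back as an integer."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    write_net_buffer_sz(64 * 1024)  # 65536, the suggested starting point
    print(read_net_buffer_sz())
```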

If you haven't tuned NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS,
do so. You should get a surprising performance increase, assuming you have
resources to spare. If you have tuned SIZE_DATA_BUFFERS, I would look to
use the same value in NET_BUFFER_SZ. If you have STK 9840 drives, I've
experienced very good performance (17-37MB in SAN) by using 262144.
For NUMBER_DATA_BUFFERS I use a minimum of 64 and a maximum of 160,
depending on the resources available and the number of concurrent drives
in use. Experiment.
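
To judge whether you have "resources to spare", it helps to estimate the
shared memory these settings consume. The sketch below assumes the usual rule
of thumb that each active drive gets its own set of buffers, so the total is
SIZE_DATA_BUFFERS x NUMBER_DATA_BUFFERS x drives x multiplexing. That formula
is an assumption here, not something verified against this NetBackup version.

```python
def shared_memory_bytes(size_data_buffers, number_data_buffers,
                        drives=1, mpx=1):
    """Estimate shared memory used by the data buffers, in bytes.

    Assumes one full set of buffers per drive per multiplexed stream.
    """
    return size_data_buffers * number_data_buffers * drives * mpx


if __name__ == "__main__":
    for buffers in (64, 160):
        total = shared_memory_bytes(262144, buffers)
        print(f"{buffers} buffers of 262144 bytes: {total // 2**20} MiB per drive")
```

With 262144-byte buffers, 64 buffers come to 16 MiB per drive and 160 buffers
to 40 MiB per drive, so the upper end adds up quickly with several drives
running concurrently.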

Run netstat -s periodically and look for tcpListenDrop. If the value is not
0, it means connection attempts (not necessarily to NetBackup) are being
refused because a program's network listen queue is full. The value resets
on reboot, so you could check it now and then again every morning after 4x
errors. Follow the Solaris network tuning URL I referenced above for
settings for tcp_conn_req_max_q and tcp_conn_req_max_q0. However, this does
not mean NetBackup will take advantage of an increase in these values
(i.e. I haven't verified whether it does).
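
The periodic check can be scripted. Below is a rough Python sketch that pulls
the listen-drop counters out of saved netstat -s output. It assumes the
Solaris layout of "name = value" pairs, several per line; the SAMPLE text is
illustrative only, not output from a real system.

```python
import re

# Matches "counterName = 123" pairs, several of which may share a line.
_PAIR = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_counters(netstat_output):
    """Return a dict of counter name -> integer value."""
    return {name: int(value) for name, value in _PAIR.findall(netstat_output)}


def listen_drops(netstat_output):
    """Return only the listen-queue drop counters that are present."""
    counters = parse_counters(netstat_output)
    wanted = ("tcpListenDrop", "tcpListenDropQ0")
    return {name: counters[name] for name in wanted if name in counters}


# Illustrative sample, trimmed to a few lines.
SAMPLE = """
TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
        tcpListenDrop       =    12     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0
"""

if __name__ == "__main__":
    print(listen_drops(SAMPLE))
```

Saving the result each morning and comparing it with the previous day's value
shows whether drops line up with the nights that produced 4x errors.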

In terms of logs, read /usr/openv/netbackup/logs/README.debug and create the
necessary directories to enable detailed logging. You want to focus on the
logs for bprd, bpsched (master), bpbrm, bptm, bpcd, bpbackup.
Also add VERBOSE to bp.conf on master/media/client.
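
A sketch of creating those directories in one go, in Python. It reads the
process list above as six names ending in bpcd and bpbackup; which of them
make sense on a given host (master, media or client) is for README.debug to
say, so treat the list as a starting point.

```python
from pathlib import Path

# Process names taken from the list above; adjust per host role.
LOG_DIRS = ("bprd", "bpsched", "bpbrm", "bptm", "bpcd", "bpbackup")


def create_log_dirs(base="/usr/openv/netbackup/logs", names=LOG_DIRS):
    """Create any missing per-process log directories under `base`.

    Returns the list of directories that were newly created.
    """
    created = []
    for name in names:
        directory = Path(base) / name
        if not directory.exists():
            directory.mkdir(parents=True)
            created.append(str(directory))
    return created


if __name__ == "__main__":
    for path in create_log_dirs():
        print("created", path)
```

Remember to remove the directories again once the problem is found, since
verbose logs grow quickly.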

If you use software compression, disable it and see if it makes a
difference. Software compression will only delay transmission of data from
the client to the media server.

Increase these variables on media/master/client in bp.conf. They are
self-explanatory and in seconds.

CLIENT_CONNECT_TIMEOUT = 1200
CLIENT_READ_TIMEOUT = 1200
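
If you have many hosts to change, the edit can be scripted. The Python sketch
below updates "KEY = value" lines in bp.conf text, replacing an existing
entry or appending a missing one. The function name is hypothetical, and it
only handles keys that appear once, which is assumed to be the case for the
two timeouts.

```python
def set_bp_conf(text, settings):
    """Return bp.conf text with every key in `settings` set to its value.

    An existing "KEY = value" line is rewritten in place; keys that are
    not present are appended at the end. Other lines are left untouched.
    """
    remaining = dict(settings)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key} = {remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key} = {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    timeouts = {"CLIENT_CONNECT_TIMEOUT": 1200, "CLIENT_READ_TIMEOUT": 1200}
    with open("/usr/openv/netbackup/bp.conf") as fh:
        print(set_bp_conf(fh.read(), timeouts), end="")
```

The demo only prints the result; review it and keep a copy of the original
bp.conf before writing anything back.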

If you have a firewall between the media server and client, it's most
likely a problem, depending on the policies.

To skip locked files, put LOCKED_FILE_ACTION = SKIP in bp.conf on the
client. This is more likely to be a problem on NT.

Use the same version and patch levels on client/media/master.

Most important, look for trends. Does it happen to one client more than
another? Does it happen to clients that use a particular media server?
Does it happen at a certain time of day? Trend analysis will indicate where
to focus your efforts.
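
A simple way to start is to tally the failures along each of those three
axes. The Python sketch below assumes you have already collected the 4x
failures into records with a client, a media server and an hour of day. The
record layout and the sample entries are hypothetical, not a NetBackup
report format.

```python
from collections import Counter


def failure_trends(failures):
    """Count failures per client, per media server and per hour of day."""
    by_client = Counter(f["client"] for f in failures)
    by_media = Counter(f["media_server"] for f in failures)
    by_hour = Counter(f["hour"] for f in failures)
    return by_client, by_media, by_hour


# Hypothetical records for illustration only.
FAILURES = [
    {"client": "nt01", "media_server": "media1", "hour": 2},
    {"client": "nt01", "media_server": "media1", "hour": 2},
    {"client": "nt07", "media_server": "media1", "hour": 3},
    {"client": "sun3", "media_server": "media2", "hour": 2},
]

if __name__ == "__main__":
    for label, counts in zip(("client", "media server", "hour"),
                             failure_trends(FAILURES)):
        print(label, counts.most_common(3))
```

Bear in mind the point raised earlier in the thread: if most clients are NT,
most failures will be on NT too, so compare the counts against how many
backups each group runs.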

Lastly, if you're really lost, snoop all network traffic between the client
and media server, and between the client and master server. Attempt to
capture a backup that succeeded and a backup that failed with a 4x error.
Compare the traces. You should then get an idea of what the "normal"
process is and where in that process the 4x error occurs, which may lead
you to the culprit, or to a dependency in between.
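
Comparing the traces is easier once each one is reduced to an ordered list of
short event summaries (with timestamps and ephemeral ports stripped out).
The Python sketch below finds the first point where the failed trace departs
from the good one. The reduction step is not shown, and the event strings
are made up for illustration.

```python
def first_divergence(good, bad):
    """Return (index, good_event, bad_event) at the first difference.

    If one trace is a prefix of the other, the missing side is None.
    Returns None when the two traces are identical.
    """
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i, g, b
    if len(good) != len(bad):
        i = min(len(good), len(bad))
        g = good[i] if i < len(good) else None
        b = bad[i] if i < len(bad) else None
        return i, g, b
    return None


# Hypothetical, heavily simplified event summaries.
GOOD = ["SYN", "SYN-ACK", "ACK", "DATA", "FIN"]
BAD = ["SYN", "SYN-ACK", "ACK", "RST"]

if __name__ == "__main__":
    print(first_divergence(GOOD, BAD))
```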

Regards
Peter Marelas


> -----Original Message-----
> From: Dennis Dwyer [SMTP:dfdwyer AT tecoenergy DOT com]
> Sent: Thursday, 19 April 2001 9:59 PM
> To:   Marelas, Peter; veritas-bu AT mailman.eng.auburn DOT edu;
>       AhrensJ AT psi DOT ca
> Subject:      RE: [Veritas-bu] Behvoural question
> 
> Could you give us some hints on what to look for in the logs and what kind
> of tuning that might be? I too have a UNIX environment and when I use the
> procedures defined in the Troubleshooting Guide to resolve the 4x status
> codes, everything comes up normal.
> 
> Regards,
> Dennis
> 
> Quote: "Time is not a test of the truth"
> Translation: Just because you've always done it that way, doesn't make it
> right
> 
> Dennis F. Dwyer
> Enterprise Storage Manager
> Tampa Electric Company
> 
> (813) 225-5181  - Voice
> (813) 275-3599  - FAX
> 
> Visit our corporate website at www.tecoenergy.com
> 
> >>> "Marelas, Peter" <MarelP AT australia.stortek DOT com> 04/18/01 10:03PM >>>
> We see these as well in Unix environments.
> 
> I would check the logs on the master/media servers at the time the error
> occurred (/usr/openv/netbackup/logs).
> 
> It may well be some network tuning is required to cater for more
> concurrent clients.
> 
> Regards
> Peter Marelas
> 
> -----Original Message-----
> From: Jason Ahrens [ahrensj AT psi DOT ca] [mailto:AhrensJ AT psi DOT ca] 
> Sent: Thursday, 19 April 2001 5:57 AM
> To: Veritas BU
> Subject: [Veritas-bu] Behvoural question
> 
> 
> I imagine our setup here is somewhat 'typical' for a large environment.
> 100Meg systems feeding 1Gig networks to a L700 tape silo.
> 
> A few times a week, I have NetBackup fail a backup with an error 4x code
> (some kind of transient network error). These always seem to be on NT
> machines (proportionally, there are more NT than Unix, so it might not be
> directly related).
> 
> Looking at the network itself, I cannot find any switches reporting errors
> between the client and the server. On subsequent attempts, NetBackup is
> successful in completing the backup.
> 
> Has anyone else seen this behaviour?
> 
> Jason
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu 
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu 
