[Veritas-bu] RMAN and Netbackup timeout issues

2009-03-05 10:08:09
Subject: [Veritas-bu] RMAN and Netbackup timeout issues
From: Roy McMorran <mcmorran AT mdibl DOT org>
To: veritas-bu AT mailman.eng.auburn DOT edu
Date: Thu, 05 Mar 2009 09:56:11 -0500
Hi all,

I'm hoping to gain some insight into a timeout problem I'm seeing with 
my Oracle backups.  This issue only occurs with incremental backups; 
fulls are working fine.

I run weekly fulls and nightly incrementals on a ~1TB Oracle database.  
These are hot backups using RMAN and the NBU Oracle agent.

Specifics:
Solaris 10, up to date on patches
Oracle 11.1.0.7
NetBackup 6.0 MP5 on the client and the master/media server
Backing up to tape (LTO3)

The nightly incremental backups are failing.  They run for about 3 
hours, then fail with status 41.  The job then retries, runs for about 
3 hours, and fails with status 41 again.  Finally the job fails with 
status 6.

This is a large database and (at times) very static.  I surmise that the 
incremental backup really is running for 3 hours without finding any 
changed blocks to write to tape, which triggers the timeout.  The weekly 
full backup, by contrast, always runs to completion without any problem 
(it takes about 7 hours).
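To make that hypothesis concrete, here is a minimal Python sketch of an inactivity timer. It assumes (this is not confirmed from NetBackup documentation) that the 10800-second timer restarts every time the client sends data. The function idle_timeout_fires_at and the 4-hour scan length are hypothetical illustrations, not measurements.

```python
from datetime import timedelta
from typing import Optional, Sequence


def idle_timeout_fires_at(write_offsets: Sequence[timedelta],
                          timeout: timedelta,
                          job_length: timedelta) -> Optional[timedelta]:
    """Return the offset at which an inactivity timer would fire, or None.

    Hypothetical model: the timer restarts on every write, so it only
    fires when the gap between writes (or after the last write) exceeds
    the timeout.
    """
    last = timedelta(0)
    for t in sorted(write_offsets):
        if t - last > timeout:
            return last + timeout
        last = t
    if job_length - last > timeout:
        return last + timeout
    return None


TIMEOUT = timedelta(seconds=10800)  # value seen in the bpbrm log

# Full backup: data streams to tape every 10 minutes for about 7 hours.
full = [timedelta(minutes=m) for m in range(0, 7 * 60, 10)]
print(idle_timeout_fires_at(full, TIMEOUT, timedelta(hours=7)))  # None

# Incremental: hypothetical 4-hour scan that finds no changed blocks.
print(idle_timeout_fires_at([], TIMEOUT, timedelta(hours=4)))  # 3:00:00
```

Under that assumption, a full backup that streams data steadily never trips the timer however long it runs, while an incremental that scans for more than 3 hours without writing anything does.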

But where is that 3-hour timeout coming from?  Not bp.conf: no timeout 
is specified there on either the client or the master/media server.

From the bpbrm log on the master/media server:
22:02:11.428 [22814] <2> bpbrm handle_backup: from client mella: change timeout to 10800
...
01:37:52.726 [22814] <2> bpbrm sighandler: signal 14 caught by bpbrm
01:37:52.726 [22814] <2> bpbrm sighandler: bpbrm timeout after 10800 seconds
01:37:52.727 [22814] <2> clear_held_signals: clearing signal mask stack, mask_stack_depth = 0
01:37:52.744 [22789] <2> bpbrm brm_sigcld: bpbrm child 22814 exit_status = 41, signal_status = 0
01:37:52.744 [22789] <2> bpbrm brm_sigcld: child 22814 exited with status 41: network connection timed out

That first line is interesting.  10800 seconds = 3 hours.  Is this 
CLIENT_READ_TIMEOUT?
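A small Python sketch for pulling the requested and expired timeout values out of bpbrm log text, so they can be compared from night to night. The sample is only the bpbrm lines quoted in this message, with the wrapped lines rejoined; the regular expressions are based on those lines alone and may not match other NetBackup log formats. The name timeout_events is hypothetical.

```python
import re

# The bpbrm lines quoted above, rejoined onto single lines.
LOG = """\
22:02:11.428 [22814] <2> bpbrm handle_backup: from client mella: change timeout to 10800
01:37:52.726 [22814] <2> bpbrm sighandler: signal 14 caught by bpbrm
01:37:52.726 [22814] <2> bpbrm sighandler: bpbrm timeout after 10800 seconds
01:37:52.744 [22789] <2> bpbrm brm_sigcld: bpbrm child 22814 exit_status = 41, signal_status = 0
01:37:52.744 [22789] <2> bpbrm brm_sigcld: child 22814 exited with status 41: network connection timed out
"""

# "HH:MM:SS.mmm [pid] <level> message", as in the lines above.
LINE = re.compile(r"^(?P<time>\S+) \[(?P<pid>\d+)\] <\d+> (?P<msg>.*)$")
REQUESTED = re.compile(r"change timeout to (\d+)")   # timeout sent by client
EXPIRED = re.compile(r"timeout after (\d+) seconds")  # timer actually fired


def timeout_events(log_text):
    """Return (time, pid, kind, seconds) for each timeout-related line."""
    events = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        for kind, pattern in (("requested", REQUESTED), ("expired", EXPIRED)):
            hit = pattern.search(m["msg"])
            if hit:
                events.append((m["time"], int(m["pid"]), kind, int(hit.group(1))))
    return events


for time, pid, kind, secs in timeout_events(LOG):
    print(f"{time} pid {pid} {kind}: {secs} s = {secs / 3600:g} h")
```

For the sample above this reports one "requested" and one "expired" event for pid 22814, both 10800 s (3 h).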

So I've tried setting CLIENT_READ_TIMEOUT=28800 (8 hours) in 
/home/oracle/bp.conf and/or /usr/openv/netbackup/bp.conf on the client, 
but these settings had no effect.

This was interesting:
$ strings /usr/openv/netbackup/bin/libobk.so64.1
...
NBBSA_CLIENT_READ_TIMEOUT
10800
...

Is it hard-coded to 3 hours in libobk?
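A rough Python stand-in for the strings check. It lists runs of 4 or more printable ASCII bytes (roughly what strings prints by default) and returns the string that follows a given key. Two strings sitting next to each other in a binary only suggest a relationship; that alone does not prove 10800 is the default for NBBSA_CLIENT_READ_TIMEOUT. The sample bytes below are synthetic, and the helper names are hypothetical; to scan the real library you would pass /usr/openv/netbackup/bin/libobk.so64.1 to strings_in_file.

```python
import re
from pathlib import Path
from typing import List, Optional


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least min_len printable ASCII bytes (0x20-0x7e)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def strings_in_file(path: str) -> List[str]:
    """Read a binary file and list its printable strings."""
    return printable_strings(Path(path).read_bytes())


def value_after(strings: List[str], key: str) -> Optional[str]:
    """Return the string immediately following key, or None if absent."""
    for i, s in enumerate(strings[:-1]):
        if s == key:
            return strings[i + 1]
    return None


# Synthetic bytes standing in for a shared library; not real libobk data.
sample = b"\x7fELF\x00\x00NBBSA_CLIENT_READ_TIMEOUT\x00\x0010800\x00\x01ab\x00"
print(value_after(printable_strings(sample), "NBBSA_CLIENT_READ_TIMEOUT"))  # 10800
```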

Any thoughts?  Thanks and best wishes,

-- 
Roy McMorran
Systems Administrator
MDI Biological Laboratory
mcmorran AT mdibl DOT org


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
