Actually, I’ve done that before with the same results. We upped the timeouts to around 10,000 seconds, to no avail. It’s as though at some point the backups just hang for no good reason.
A quick find shows that I’m backing up roughly 29,000 files – that shouldn’t take too long to enumerate, should it?
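For anyone who wants to repeat the file-count check and also see how long a plain directory walk takes, here is a minimal sketch. It only approximates enumeration cost with an ordinary filesystem walk and does not reproduce what NetBackup itself does. The two paths are the ones that appear in the bpbrm log quoted further down; the function names are made up for illustration.

```python
import os
import time


def count_files(root):
    """Walk root and return (file_count, dir_count)."""
    files = dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


def timed_count(roots):
    """Return {root: (files, dirs, seconds)} for each root directory."""
    results = {}
    for root in roots:
        start = time.monotonic()
        files, dirs = count_files(root)
        results[root] = (files, dirs, time.monotonic() - start)
    return results


if __name__ == "__main__":
    # Paths taken from the backup selection shown in the bpbrm log.
    roots = ["/na270/pub/inbound", "/na270/pub/ftp"]
    for root, (files, dirs, secs) in timed_count(roots).items():
        print(f"{root}: {files} files, {dirs} dirs, walked in {secs:.1f}s")
```

If the walk finishes in seconds, file enumeration on its own is unlikely to explain an hour-long stall.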
From: Liddle, Stuart [mailto:liddles AT amgen DOT com]
Sent: Monday, July 09, 2007 11:14 AM
To: Aaron Mills; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] same job keeps hanging.
So, are you trying to back up a filesystem with lots and lots of small files? If so, remember that NetBackup will try to enumerate all of the files that you are trying to back up. We had a similar situation where we were trying to back up a filesystem with 3.5 million files in 50,000 directories. It took hours to build a file list of all of that, and consequently the job timed out.
Symantec told us the best solution for that particular directory was NDMP (since the timeouts are much longer).
Or, I suppose, you could raise the timeout value to more than 3600 seconds and see what happens.
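Before raising a timeout, it can help to confirm which timeout values are actually set. The sketch below parses bp.conf-style `KEY = value` lines and lists the entries whose names end in `_TIMEOUT`. The sample text is hypothetical and stands in for the real file; on the server in this thread the file would be /usr/openv/netbackup/bp.conf, and the 10000 value mirrors the figure mentioned at the top of the thread.

```python
def parse_bp_conf(text):
    """Parse 'KEY = value' lines into a dict.

    Repeated keys (bp.conf may list SERVER more than once) keep only
    the last value, which is fine for inspecting timeouts.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def timeouts(settings):
    """Return only the *_TIMEOUT entries, converted to integer seconds."""
    return {
        key: int(value)
        for key, value in settings.items()
        if key.endswith("_TIMEOUT") and value.isdigit()
    }


# Hypothetical sample content, not copied from a real server.
SAMPLE = """\
SERVER = foo.bar.com
CLIENT_NAME = foo.bar.com
CLIENT_READ_TIMEOUT = 10000
"""

if __name__ == "__main__":
    for key, seconds in timeouts(parse_bp_conf(SAMPLE)).items():
        print(f"{key} = {seconds} seconds")
```

To check a live system, read the file contents and pass them to `parse_bp_conf` in place of `SAMPLE`.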
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu [mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of Aaron Mills
Sent: Monday, July 09, 2007 9:58 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] same job keeps hanging.
Hi all,
I’m hoping someone’s seen this before. I’m running 5.1MP6 w/ AIT3. I’ve got a ~126GB backup that kicks off weekly but hangs within a few hours every time. The error I get is always “media manager terminated by parent process”, but the logs don’t seem to show anything odd. No other backups hang like this. This is also the only job that runs on the server itself.
bptm gives me:
03:28:45.470 [4999] <2> io_ioctl: command (1)MTFSF 1 from (bptm.c.8307) on drive index 1
03:28:45.530 [4999] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/AK6503, from bptm.c.8310
03:28:45.530 [4999] <2> catch_signal: EXITING with status 82
so I check bpbrm:
02:05:33.882 [4992] <2> bpbrm spawn_child: /usr/openv/netbackup/bin/bptm bptm -w -c foo.bar.com -den 17 -rt 6 -rn 0 -stunit Spectra2 -cl inbound -bt 1183968330 -b foo.bar.com _1183968330 -st 0 -cj 1 -p inbound -hostname foo.bar.com -ru root -rclnt foo.bar.com -rclnthostname foo.bar.com -rl 5 -rp 8035200 -sl ftpif -ct 0 -maxfrag 1048576 -tir -v -Z -mediasvr foo.bar.com -jobid 117926 -jobgrpid 117926 -masterversion 510000 -shm
02:05:33.884 [4992] <2> bpbrm write_continue_backup: wrote CONTINUE BACKUP on COMM_SOCK <4>
02:05:33.884 [4992] <2> bpbrm main: wrote /na270/pub/inbound on COMM_SOCK
02:05:33.884 [4992] <2> bpbrm main: wrote /na270/pub/ftp on COMM_SOCK
02:05:33.884 [4992] <2> bpbrm main: wrote CONTINUE on COMM_SOCK
02:05:33.885 [4992] <2> bpbrm main: ESTIMATE -1 -1 nbu0 foo.bar.com _1183968330
02:09:44.763 [4992] <2> bpbrm mm_sig: received ready signal from media manager
02:09:44.763 [4992] <2> bpbrm readline: retrying partial read from fgets ::
03:27:22.261 [4992] <2> bpbrm sighandler: signal 14 caught by bpbrm
03:27:22.272 [4992] <2> bpbrm sighandler: bpbrm timeout after 3600 seconds
03:27:22.287 [4992] <2> clear_held_signals: clearing signal mask stack, mask_stack_depth = 0
03:27:22.287 [4992] <2> bpbrm kill_child_process: start
03:27:22.287 [4992] <2> bpbrm wait_for_child: start
03:28:48.546 [4992] <2> bpbrm wait_for_child: child exit_status = 82 signal_status = 0
03:28:48.557 [4992] <2> inform_client_of_status: INF - Server status = 41
but I can’t seem to figure out why there was a timeout. I checked all the related logs: bpbkar just shows file writing stopping at 2:42 AM, as if the process simply hangs there, with no errors. Looking right now, the bpbrm and bpbkar processes for this backup are still running, but nothing is happening. The job shows as active and everything is queueing up behind it. I’ve also adjusted the CLIENT_READ_TIMEOUT in /usr/openv/netbackup/bp.conf, to no avail.
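One way to narrow down where a job stalls is to look for long silences between consecutive log entries. The sketch below is a rough illustration, assuming each entry sits on a single line in the raw log file and starts with an `HH:MM:SS.mmm [pid] <level>` prefix like the excerpts above. The sample lines are copied from the bpbrm excerpt; the function names and the five-minute threshold are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches "HH:MM:SS.mmm [pid] <level> message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")


def parse_entries(lines):
    """Return a list of (timestamp, message) for lines that match the prefix."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            entries.append((stamp, match.group(4)))
    return entries


def find_gaps(entries, threshold=timedelta(minutes=5)):
    """Return (gap, message_before, message_after) for each long silence."""
    gaps = []
    for (t0, before), (t1, after) in zip(entries, entries[1:]):
        delta = t1 - t0
        if delta < timedelta(0):
            # The log carries no date, so treat a negative step as midnight rollover.
            delta += timedelta(days=1)
        if delta >= threshold:
            gaps.append((delta, before, after))
    return gaps


SAMPLE = [
    "02:05:33.885 [4992] <2> bpbrm main: ESTIMATE -1 -1 nbu0 foo.bar.com _1183968330",
    "02:09:44.763 [4992] <2> bpbrm mm_sig: received ready signal from media manager",
    "02:09:44.763 [4992] <2> bpbrm readline: retrying partial read from fgets ::",
    "03:27:22.261 [4992] <2> bpbrm sighandler: signal 14 caught by bpbrm",
]

if __name__ == "__main__":
    for gap, before, after in find_gaps(parse_entries(SAMPLE)):
        print(f"{gap} of silence between:\n  {before}\n  {after}")
```

Running the same check over the bpbkar and bptm logs for the same window would show which process went quiet first.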
Can anyone point me in the right direction as to what
I’m missing? I’m guessing there’s something I’m not
seeing in one of the logs.
-Aaron
Aaron Mills
Systems Administrator
Return Path, Inc.
http://www.returnpath.net
aaron.mills AT returnpath DOT net