I have Bacula 5.2.10 installed on a RHEL 6 server. It has been running fine, but we recently ran into a problem. I am backing up our data server, which holds about 26 TB. I started a Full backup of this machine; the backup ran for 6 days and was then killed by the watchdog. Here is the output I got from bconsole:
10-Oct 16:41 lindy-sd JobId 2458: User specified spool size reached.
10-Oct 16:41 lindy-sd JobId 2458: Writing spooled data to Volume. Despooling 966,367,832,548 bytes ...
10-Oct 16:43 lindy-dir JobId 2458: Error: Watchdog sending kill after 518406 secs to thread stalled reading File daemon.
10-Oct 16:43 lindy-dir JobId 2458: Fatal error: Network error with FD during Backup: ERR=Interrupted system call
10-Oct 16:43 lindy-sd JobId 2458: Fatal error: spool.c:301 Fatal append error on device "Drive-1" (/dev/nst0): ERR=
10-Oct 16:43 lindy-dir JobId 2458: Fatal error: No Job status returned from FD.
10-Oct 16:43 lindy-dir JobId 2458: Error: Bacula lindy-dir 5.2.10 (28Jun12):
Build OS: x86_64-unknown-linux-gnu redhat Enterprise release
I read about the "Max Run Time = time" directive that can be set in the Bacula configuration file. I also read that, by default, the watchdog thread will kill any job that has run for more than 6 days, which matches the 518406 seconds (6 days and 6 seconds) reported in the log above. Is it true that the maximum watchdog timeout is independent of Max Run Time and cannot be changed?
I am not sure whether I should set this directive in my Bacula configuration. Has anybody encountered this issue, and if so, how did you solve it?
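For reference, this is roughly where I understand the directive would go: in the Job resource of the Director configuration (bacula-dir.conf). This is only a sketch; the job name and the 10-day value are placeholders I made up, and the other required Job directives are left out. If what I read is correct, this setting alone may not lift the 6-day watchdog limit.

```
# bacula-dir.conf (Director) -- sketch only, not my actual configuration
Job {
  Name = "DataServerFull"     # hypothetical job name
  Type = Backup
  Level = Full
  # Client, FileSet, Storage, Pool, Messages etc. omitted here
  Max Run Time = 10 days      # placeholder value; may not override the watchdog
}
```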
I would appreciate your help.
Thank you.
Uthra