Re: [Bacula-users] Timeout (?) problems with some Full backups

2009-08-12 16:24:59
From: John Lockard <jlockard AT umich DOT edu>
To: Nick Lock <nick.lock AT exa-networks.co DOT uk>
Date: Wed, 12 Aug 2009 16:20:44 -0400
While the job is running, keep an eye on the system that hosts
your MySQL database and make sure it isn't filling up a
partition with temporary data.  I ran into a similar problem
and had to move MySQL's temporary directory (the tmpdir setting
in /etc/my.cnf) to another location.
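
One way to keep an eye on that partition is a small polling script.
This is only a minimal sketch: the helper names free_fraction and
watch, the 10% threshold and the 60-second interval are all
assumptions made for illustration, and the path you pass in should be
whatever directory your database actually uses for temporary data.

```python
import shutil
import time


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def watch(path, threshold=0.10, interval=60, checks=None):
    """Poll free space on `path`; return True once it drops below `threshold`.

    `checks` limits the number of polls (None means poll until interrupted).
    Returns False if the limit is reached without crossing the threshold.
    """
    n = 0
    while checks is None or n < checks:
        frac = free_fraction(path)
        print(f"{time.strftime('%H:%M:%S')} {path}: {frac:.1%} free")
        if frac < threshold:
            print(f"WARNING: {path} is below {threshold:.0%} free")
            return True
        n += 1
        # Sleep only if another poll is still to come.
        if checks is None or n < checks:
            time.sleep(interval)
    return False
```

Calling watch() with the temporary directory's path in a second
terminal while the backup runs would show whether free space is
shrinking during the phase after the data transfer has finished.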

-John

On Wed, Aug 12, 2009 at 05:00:30PM +0100, Nick Lock wrote:
> Hello list!
> 
> Sorry to trouble you with what's probably a simple problem, but I'm now
> looking at the very real possibility of wiping all our backups clean and
> starting from scratch if I can't fix it... :(
> 
> I'm having problems with some Full backups, which run for between 1 and
> 2 hours, appearing to "time out" after the data transfer from the FD to
> the SD. The error message (shown below) shows that the data transfer
> completes, often in about 1hr30min, and then Bacula does nothing until
> the job has been running for 2 hours at which point it gives an FD
> error.
> 
> Other Full backups (which don't take as long) run correctly, and most
> of the time Inc and Diff backups also run correctly. However, a small
> percentage of backups fail at random, also with FD errors but at
> random points during the job... this I have been ascribing to
> network fluctuations! The difference is that re-running these random
> failures succeeds, whilst re-running this particular Full doesn't! ;)
> 
> I've already tried setting a heartbeat interval of 20 minutes in the
> FD/SD and DIR conf files (thinking that the FD -> Dir connection was
> timing out) but this doesn't change anything.
> 
> In the time between the data transfer finishing and the timeout,
> Postgres has an open connection with a "COPY batch FROM STDIN"
> transaction in progress, which at the timeout produces errors in the
> Postgres log that I have also shown below.
> 
> I'm happy to post portions of the conf files if needed, but they're huge
> and might well lead to tl;dr!
> 
> Any suggestions as to how I can troubleshoot this further would be most
> appreciated!
> 
> Nick Lock.
> 
> 
> ---------------------------------------------------------------------
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Start Backup JobId 5514,
> Job=backup_scavenger.2009-08-12_14.18.06.04
> 12-Aug 14:18 exa-bacula-dir JobId 5514: There are no more Jobs
> associated with Volume "scavenger-full-1250". Marking it purged.
> 12-Aug 14:18 exa-bacula-dir JobId 5514: All records pruned from Volume
> "scavenger-full-1250"; marking it "Purged"
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Recycled volume
> "scavenger-full-1250"
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Using Device
> "FileStorageScavenger"
> 12-Aug 14:18 exa-bacula-sd JobId 5514: Recycled volume
> "scavenger-full-1250" on device
> "FileStorageScavenger" (/srv/bacula/volume/web-scavenger), all previous
> data lost.
> 12-Aug 14:18 exa-bacula-dir JobId 5514: Max Volume jobs exceeded.
> Marking Volume "scavenger-full-1250" as Used.
> 12-Aug 15:49 exa-bacula-sd JobId 5514: Job write elapsed time =
> 01:31:41, Transfer rate = 401.4 K bytes/second
> 12-Aug 16:18 exa-bacula-dir JobId 5514: Fatal error: Network error with
> FD during Backup: ERR=Connection reset by peer
> 12-Aug 16:18 exa-bacula-dir JobId 5514: Fatal error: No Job status
> returned from FD.
> 12-Aug 16:18 exa-bacula-dir JobId 5514: Error: Bacula exa-bacula-dir
> 2.4.4 (28Dec08): 12-Aug-2009 16:18:09
>   Build OS:               x86_64-pc-linux-gnu debian lenny/sid
>   JobId:                  5514
>   Job:                    backup_scavenger.2009-08-12_14.18.06.04
>   Backup Level:           Full
>   Client:                 "scavenger" 2.4.4 (28Dec08)
> i486-pc-linux-gnu,debian,5.0
>   FileSet:                "full-scavenger" 2009-04-16 15:58:05
>   Pool:                   "scavenger-full" (From Job FullPool override)
>   Storage:                "FileScavenger" (From Job resource)
>   Scheduled time:         12-Aug-2009 14:18:03
>   Start time:             12-Aug-2009 14:18:09
>   End time:               12-Aug-2009 16:18:09
>   Elapsed time:           2 hours 
>   Priority:               10
>   FD Files Written:       0
>   SD Files Written:       81,883
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       2,208,578,175 (2.208 GB)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Storage Encryption:     no
>   Volume name(s):         scavenger-full-1250
>   Volume Session Id:      5
>   Volume Session Time:    1250080970
>   Last Volume Bytes:      2,212,857,316 (2.212 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  OK
>   Termination:            *** Backup Error ***
> 
> ---------------------------------------------------------------------
> Postgres Log:
> 
> 2009-08-12 16:18:09 BST ERROR:  unexpected message type 0x58 during COPY
> from stdin
> 2009-08-12 16:18:09 BST CONTEXT:  COPY batch, line 81884: ""
> 2009-08-12 16:18:09 BST STATEMENT:  COPY batch FROM STDIN
> 2009-08-12 16:18:09 BST LOG:  could not send data to client: Broken pipe
> 2009-08-12 16:18:09 BST LOG:  could not receive data from client:
> Connection reset by peer
> 2009-08-12 16:18:09 BST LOG:  unexpected EOF on client connection
> 
> 
> 
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
> 
> 
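
The timestamps in the job report quoted above can be checked with a
little arithmetic. The sketch below only restates numbers from that
report (start time, end time, and the SD's "Job write elapsed time");
the variable names are made up for illustration, and it does not say
which component enforces the limit.

```python
from datetime import datetime, timedelta

# Values copied from the quoted job report.
start = datetime(2009, 8, 12, 14, 18, 9)   # Start time
end = datetime(2009, 8, 12, 16, 18, 9)     # End time
write_elapsed = timedelta(hours=1, minutes=31, seconds=41)  # SD write time

job_elapsed = end - start
# Rough idle period between the end of the data transfer and the failure.
idle = job_elapsed - write_elapsed

print("job elapsed:", job_elapsed)
print("idle after transfer:", idle)
```

The job ends exactly 7200 seconds after it started, with roughly 28
minutes of apparent inactivity after the transfer. A round figure like
that suggests a fixed timeout somewhere rather than a random network
drop, though the log alone does not show where it is configured.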

-- 
"Without friction there's no heat, without heat there can't
 be fire, without fire there's no desire, you're making me
 hot-too-hot-too-hot-hot-too-hot-too-hot-OWWwwwww!" - Oingo Boingo
-------------------------------------------------------------------
         John M. Lockard |  U of Michigan - School of Information
 Unix and Security Admin |      1214 SI North - 1075 Beal Ave.
      jlockard AT umich DOT edu |        Ann Arbor, MI  48109-2112
 www.umich.edu/~jlockard |     734-615-8776 | 734-647-8045 FAX
-------------------------------------------------------------------
