
[Bacula-users] Bacula suddenly choking on Full backups with Unknown term code

2010-02-21 12:21:53
Subject: [Bacula-users] Bacula suddenly choking on Full backups with Unknown term code
From: Glen Barber <glen.j.barber AT gmail DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Sun, 21 Feb 2010 12:15:01 -0500
Howdy,

I'm running Bacula 2.4.3 on FreeBSD, which until recently hadn't given me
any issues.

I run daily incrementals, weekly differentials, and monthly fulls on
colo-hosted clients.  One of these client machines began failing to complete
differential and full backups, reporting "Unknown term code" in the email
notification and logging the following:

fd JobId 13934: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
sd JobId 13934: Job client.2010-02-20_17.43.07 marked to be canceled.
sd JobId 13934: Fatal error: append.c:259 Network error on data channel. ERR=Connection reset by peer
sd JobId 13934: Job write elapsed time = 02:58:46, Transfer rate = 1.451 M bytes/second
sd JobId 13934: Error: bsock.c:444 Read error from client:xxx.xxx.xxx.xxx:36643: ERR=Connection reset by peer
dir JobId 13934: Error: Bacula dir 2.4.3 (10Oct08): 20-Feb-2010 21:12:25
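
As a rough cross-check, the elapsed time and average rate on the SD summary line give the approximate volume written before the connection dropped: 02:58:46 is 10,726 seconds, and 10,726 s × 1.451 M bytes/s is about 15.6 GB, assuming Bacula's "M" means 10^6 bytes. The Python sketch below does the same arithmetic; the function name, the regular expression, and the decimal-suffix assumption are illustrative and not part of Bacula.

```python
import re
from decimal import Decimal

# Assumption: rate suffixes are decimal (K = 10**3, M = 10**6, G = 10**9).
SUFFIX = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}

# Hypothetical pattern for the SD "Job write elapsed time" summary line.
LINE_RE = re.compile(
    r"elapsed time = (\d+):(\d{2}):(\d{2}), "
    r"Transfer rate = ([\d.]+) ?([KMG]?) ?bytes/second",
    re.IGNORECASE,
)


def estimate_bytes(line: str) -> tuple[int, int]:
    """Return (elapsed_seconds, approximate_bytes) parsed from an SD summary line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a 'Job write elapsed time' line")
    hours, minutes, seconds = (int(g) for g in m.group(1, 2, 3))
    elapsed = hours * 3600 + minutes * 60 + seconds
    # Decimal avoids binary floating-point rounding in rate * time.
    rate = Decimal(m.group(4)) * SUFFIX[m.group(5).upper()]
    return elapsed, int(rate * elapsed)


if __name__ == "__main__":
    elapsed, approx = estimate_bytes(
        "sd JobId 13934: Job write elapsed time = 02:58:46, "
        "Transfer rate = 1.451 M bytes/second"
    )
    print(elapsed, approx)  # prints: 10726 15563426000
```

Because the reported rate is already an average rounded to three decimals, the result is only good to within a few megabytes, and for a multi-file job it is a position in the overall data stream, not an offset within any single file.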

In November I changed the FileSet for this client, which triggered a full
backup that terminated successfully.  Since that initial full backup, two
more full backups and seven differentials have also terminated
successfully.  Incremental backups are not affected.

I initially suspected the network, but the colo switch shows no errors.
I've already enabled the heartbeat on the client, with intervals as low as
15 seconds, to no effect.  Running the client fd with -d200 to trace the
failures showed that the backup was choking on an mbox file.

I created a new job for this particular file, again with -d200, and it
failed as well.  Since Bacula reports how many bytes were transferred, I was
able to use dd(1) to seek to that point in the file, where I found a
base64-encoded PDF.  After I removed that email, a subsequent backup of the
file succeeded, while a backup of only that one email failed.
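
The dd(1) step can also be scripted. The Python sketch below is a hypothetical helper, not part of Bacula: given a byte offset (for example the count Bacula reported for the single-file job), it returns the boundaries of the mbox message containing that offset. It assumes classic mbox framing, where each message starts on a line beginning "From ", and that the reported byte count corresponds roughly to an offset in the file, which may not hold exactly if the job uses compression or if Bacula counts data other than file contents.

```python
import sys

MSG_SEP = b"\nFrom "  # mbox framing: each message starts on a line beginning "From "


def message_span(data: bytes, offset: int) -> tuple[int, int]:
    """Return (start, end) byte bounds of the mbox message containing `offset`."""
    # Last separator whose leading newline lies before `offset`.
    sep = data.rfind(MSG_SEP, 0, offset + len(MSG_SEP) - 1)
    start = 0 if sep == -1 else sep + 1
    # First separator whose leading newline lies at or after `offset`.
    sep = data.find(MSG_SEP, offset)
    end = len(data) if sep == -1 else sep + 1
    return start, end


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python msgspan.py MBOX OFFSET  (hypothetical script name)
    path, offset = sys.argv[1], int(sys.argv[2])
    with open(path, "rb") as f:
        data = f.read()
    start, end = message_span(data, offset)
    print(f"message spans bytes {start}..{end}")
    # Show up to the first 2000 bytes: the "From " line and the headers.
    sys.stdout.write(data[start:min(end, start + 2000)].decode("ascii", "replace"))
```

Reading the whole file keeps the sketch short; for very large mailboxes, mmap objects offer the same find/rfind methods without loading everything into memory.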

file(1) reports the mbox file as: ASCII mail text, with very long lines
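
To check whether the failing message is where the "very long lines" that file(1) mentions actually sit, a quick scan can report the longest line and its byte offset for comparison with the offset Bacula reported. This is an illustrative Python sketch with a hypothetical function name; it simply splits on "\n", so any "\r" in CRLF files counts toward line length.

```python
import sys


def longest_line(data: bytes) -> tuple[int, int]:
    """Return (length, offset) of the longest line in `data`, excluding the newline."""
    best_len, best_off, off = 0, 0, 0
    for line in data.split(b"\n"):
        if len(line) > best_len:
            best_len, best_off = len(line), off
        off += len(line) + 1  # +1 for the newline removed by split()
    return best_len, best_off


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        length, offset = longest_line(f.read())
    print(f"longest line: {length} bytes at offset {offset}")
```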

I moved the mbox to a location that is not backed up, and then hit the
same failure on another user's mbox, which file(1) identifies the same way.

Have I hit a bug?  I know 2.4.3 is rather old, and an upgrade is planned
for the near future, but I'd hate to upgrade only to find the same problem,
so I'm hoping this is something someone has seen before.

-- 
Glen Barber

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
