Bacula-users

[Bacula-users] SD Bug "User specified spool size reached"

2012-05-11 08:29:26
Subject: [Bacula-users] SD Bug "User specified spool size reached"
From: Tilman Schmidt <t.schmidt AT phoenixsoftware DOT de>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 11 May 2012 14:26:31 +0200
On a Bacula installation with a CentOS 6.2 (64 bit) server
running Bacula 5.2.6 director and SD with an HP LTO3 autoloader
and backing up some twenty-odd machines of various platforms,
all the jobs report "User specified spool size reached"
much too early, currently after 1.45 GB instead of the really
specified size of 50 GB. Example log excerpt:

10-Mai 22:09 bacula-sd JobId 2117: Spooling data ...
10-Mai 22:09 bacula-sd JobId 2117: User specified spool size reached.
10-Mai 22:09 bacula-sd JobId 2117: Writing spooled data to Volume.
Despooling 1,450,176,877 bytes ...
10-Mai 22:10 bacula-sd JobId 2117: Despooling elapsed time = 00:00:22,
Transfer rate = 65.91 M Bytes/second
10-Mai 22:10 bacula-sd JobId 2117: Spooling data again ...
10-Mai 22:10 bacula-sd JobId 2117: User specified spool size reached.
10-Mai 22:10 bacula-sd JobId 2117: Writing spooled data to Volume.
Despooling 1,450,176,900 bytes ...

The relevant device resource from /etc/bacula/bacula-sd.conf is:

Device {
  Name = LTO-3
  Media Type = LTO-3
  Archive Device = /dev/nst0
  AutomaticMount = yes;               # when device opened, read it
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  Maximum File Size = 4GB
  Changer Command = "/usr/libexec/bacula/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/sg4
  AutoChanger = yes
  SpoolDirectory = /home/backup/spool;
  MaximumSpoolSize = 50G;
# CentOS 6 tapeinfo needs generic device
  Alert Command = "sh -c 'tapeinfo -f /dev/sg3 |grep TapeAlert|cat'"
## If you have smartctl, enable this, it has more info than tapeinfo
# Alert Command = "sh -c 'smartctl -H -l error %c'"
}

Scanning the log it looks like the storage daemon decreases its
spool size limit every time a backup job crashes. The last time
it used the full 50 GB was here:

04-Mai 23:36 bacula-sd JobId 2040: Writing spooled data to Volume.
Despooling 53,687,127,810 bytes ...

That server then crashed during the backup (cause undetermined
as yet, but probably hardware related). Bacula reported:

04-Mai 23:36 bacula-sd JobId 2040: User specified spool size reached.
04-Mai 23:36 bacula-sd JobId 2040: Writing spooled data to Volume.
Despooling 53,687,127,810 bytes ...
04-Mai 23:52 bacula-sd JobId 2040: Despooling elapsed time = 00:16:16,
Transferrate = 55.00 M Bytes/second
04-Mai 23:52 bacula-sd JobId 2040: Spooling data again ...
05-Mai 00:46 backup-dir JobId 2040: Fatal error: Network error with FD
during Backup: ERR=Die Verbindung wurde vom Kommunikationspartner
zurückgesetzt
05-Mai 00:46 backup-dir JobId 2040: Fatal error: No Job status returned
from FD.

(The text after ERR= is the German translation of "Connection reset
by peer.") Then it proceeded to the next client, but with the spool
limit reduced to 10 GB:

05-Mai 01:13 bacula-sd JobId 2042: User specified spool size reached.
05-Mai 01:13 bacula-sd JobId 2042: Writing spooled data to Volume.
Despooling 9,902,626,948 bytes ...

(Job 2041 was smaller than 10 GB.)
Similarly, four days later, a manual backup job failed because of
a network error:

09-Mai 10:20 bacula-sd JobId 2075: User specified spool size reached.
09-Mai 10:20 bacula-sd JobId 2075: Writing spooled data to Volume.
Despooling 9,902,627,262 bytes ...
09-Mai 10:26 bacula-sd JobId 2075: Despooling elapsed time = 00:05:14,
Transfer rate = 31.53 M Bytes/second
09-Mai 10:26 bacula-sd JobId 2075: Spooling data again ...
09-Mai 12:27 backup-dir JobId 2075: Fatal error: Network error with FD
during Backup: ERR=Die Verbindung wurde vom Kommunikationspartner
zurückgesetzt
09-Mai 12:27 backup-dir JobId 2075: Fatal error: No Job status returned
from FD.

Subsequent jobs ran with a further reduced spool limit of 2 GB:

09-Mai 22:07 bacula-sd JobId 2097: Spooling data ...
09-Mai 22:08 bacula-sd JobId 2097: Writing spooled data to Volume.
Despooling 1,937,978,333 bytes ...

Later that night, we had rescheduled the backup of the machine
that had crashed on May 4, and it crashed again:

10-Mai 03:55 bacula-sd JobId 2096: User specified spool size reached.
10-Mai 03:55 bacula-sd JobId 2096: Writing spooled data to Volume.
Despooling 1,937,978,308 bytes ...
10-Mai 03:56 bacula-sd JobId 2096: Despooling elapsed time = 00:01:02,
Transfer rate = 31.25 M Bytes/second
10-Mai 03:56 bacula-sd JobId 2096: Spooling data again ...
10-Mai 04:14 backup-dir JobId 2096: Fatal error: Network error with FD
during Backup: ERR=Die Wartezeit für die Verbindung ist abgelaufen
10-Mai 04:14 backup-dir JobId 2096: Fatal error: No Job status returned
from FD.

Since then, the spool size limit used by the SD sits at 1.45 GB:

10-Mai 22:09 bacula-sd JobId 2117: Writing spooled data to Volume.
Despooling 1,450,176,877 bytes ...

The spool directory contained just two files of size zero.
I have removed those, restarted the SD, and am now waiting
for the next scheduled backup.

Looks like a bug to me ...

Best regards,
Tilman


Attachment: signature.asc
Description: OpenPGP digital signature

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users