Bacula-users

[Bacula-users] doubt on tape change with concurrent jobs

2013-01-31 05:05:16
Subject: [Bacula-users] doubt on tape change with concurrent jobs
From: Tilman Schmidt <t.schmidt AT phoenixsoftware DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 31 Jan 2013 11:01:56 +0100
On my main backup server, running Bacula 5.2.12 on CentOS 6.3
and backing up 18 clients to an LTO-3 autochanger, I recently
activated concurrent jobs by setting Maximum Concurrent Jobs
to 10 in the Director and Storage resources. This seemed to
work quite well for a couple of days.

Last night the tape was full, and Bacula dutifully continued
with the next one. The jobs following the one with the tape
change, however, produced somewhat disquieting logs.

- The nightly run started at 22:05 on tape Daily_2.

- The first six backups (in order of completion) went on that tape.

- The seventh job (as it happened, the one backing up the backup
  server itself, which by coincidence got promoted from Incremental
  to Full because of a Fileset modification) reported the tape change:

30-Jan 20:15 bacula-sd JobId 6142: Committing spooled data to Volume
"Daily_2". Despooling 4,855,877,581 bytes ...
30-Jan 20:18 bacula-sd JobId 6142: End of Volume "Daily_2" at 319:3229
on device "LTO-3" (/dev/nst0). Write of 64512 bytes got -1.
30-Jan 20:18 bacula-sd JobId 6142: Re-read of last block succeeded.
30-Jan 20:18 bacula-sd JobId 6142: End of medium on Volume "Daily_2"
Bytes=799,806,099,456 Blocks=12,397,787 at 30-Jan-2013 20:18.
30-Jan 20:18 bacula-sd JobId 6142: 3307 Issuing autochanger "unload slot
3, drive 0" command.
30-Jan 20:19 bacula-sd JobId 6142: 3304 Issuing autochanger "load slot
2, drive 0" command.
30-Jan 20:20 bacula-sd JobId 6142: 3305 Autochanger "load slot 2, drive
0", status is OK.
30-Jan 20:20 bacula-sd JobId 6142: Recycled volume "Daily_1" on device
"LTO-3" (/dev/nst0), all previous data lost.
30-Jan 20:20 bacula-sd JobId 6142: New volume "Daily_1" mounted on
device "LTO-3" (/dev/nst0) at 30-Jan-2013 20:20.
30-Jan 20:21 bacula-sd JobId 6142: Despooling elapsed time = 00:02:51,
Transfer rate = 28.39 M Bytes/second

- The next four jobs (always in order of completion) did however
  still report writing to tape Daily_2, the last one apparently
  even starting to use it well after it had been unloaded:

30-Jan 20:15 bacula-sd JobId 6156: Committing spooled data to Volume
"Daily_2". Despooling 1,602,040,011 bytes ...
30-Jan 20:22 bacula-sd JobId 6156: Despooling elapsed time = 00:00:30,
Transfer rate = 53.40 M Bytes/second

30-Jan 20:16 bacula-sd JobId 6147: Committing spooled data to Volume
"Daily_2". Despooling 2,558,276,008 bytes ...
30-Jan 20:24 bacula-sd JobId 6147: Despooling elapsed time = 00:00:43,
Transfer rate = 59.49 M Bytes/second

30-Jan 20:18 bacula-sd JobId 6154: Committing spooled data to Volume
"Daily_2". Despooling 1,441,633,135 bytes ...
30-Jan 20:25 bacula-sd JobId 6154: Despooling elapsed time = 00:00:24,
Transfer rate = 60.06 M Bytes/second

30-Jan 20:20 bacula-sd JobId 6150: Committing spooled data to Volume
"Daily_2". Despooling 5,609,655,403 bytes ...
30-Jan 20:27 bacula-sd JobId 6150: Despooling elapsed time = 00:01:41,
Transfer rate = 55.54 M Bytes/second

- The job after that went to tape Daily_1 as expected:

30-Jan 20:22 bacula-sd JobId 6155: Committing spooled data to Volume
"Daily_1". Despooling 4,379,708,893 bytes ...
30-Jan 20:33 bacula-sd JobId 6155: Despooling elapsed time = 00:03:09,
Transfer rate = 23.17 M Bytes/second

- Then, most disquieting, the next one reported that tape as
  previously written, and a file count mismatch:

30-Jan 20:28 bacula-sd JobId 6152: Volume "Daily_1" previously written,
moving to end of data.
30-Jan 20:30 bacula-sd JobId 6152: Warning: For Volume "Daily_1":
The number of files mismatch! Volume=4 Catalog=3
Correcting Catalog
30-Jan 20:30 bacula-sd JobId 6152: Spooling data ...
30-Jan 20:32 tourserv-fd JobId 6152:      /var/lib/nfs/rpc_pipefs is a
different filesystem. Will not descend from /var into it.
30-Jan 20:33 bacula-sd JobId 6152: Committing spooled data to Volume
"Daily_1". Despooling 5,469,441,198 bytes ...
30-Jan 20:36 bacula-sd JobId 6152: Despooling elapsed time = 00:01:54,
Transfer rate = 47.97 M Bytes/second

- The remaining jobs completed uneventfully, possibly due to the
  fact that they were considerable delayed by long-running
  ClientRunBeforeJob-s, so the spooling area was drained
  completely before they started sending data.

What has happened here? Should I worry?

aTdHvAaNnKcSe
Tilman

-- 
Tilman Schmidt
Phoenix Software GmbH
Bonn, Germany

Attachment: signature.asc
Description: OpenPGP digital signature

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_jan
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>
  • [Bacula-users] doubt on tape change with concurrent jobs, Tilman Schmidt <=