My heartbeat interval is set to 300 - you should probably increase yours.
I originally had a smaller interval and had some clients that would time
out - particularly if they were Windows clients.
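
A minimal sketch of that change, assuming the resource names from the
configs quoted below. Only the Heartbeat Interval lines differ, and 300 is
simply the value mentioned above, not a tuned recommendation:

```
# bacula-fd.conf on the client (sketch)
FileDaemon {
  Name = mmcnsd4-fd
  # ... existing directives unchanged ...
  Heartbeat Interval = 300    # was 60
}

# bacula-sd.conf on the storage server (sketch)
Storage {
  Name = bkupsvr2-sd
  # ... existing directives unchanged ...
  Heartbeat Interval = 300    # was 60
}
```

The File and Storage daemons have to be restarted before they pick up the
new value.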
Patti Clark
Linux System Administrator
R&D Systems Support Oak Ridge National Laboratory
On 9/3/15, 7:16 PM, "Jeffrey R. Lang" <JRLang AT uwyo DOT edu> wrote:
>First let me say thanks to Kern and all those that have helped make
>bacula a great tool.
>
>My backup environment currently consists of a server, a VTL and a
>tape library connected by a 10 Gb network. Bacula is currently at 5.2.13.
>I plan on upgrading once I've integrated the tape library and things are
>working, which would be a good starting point for an upgrade.
>
>My issue is with jobs destined for the tape library, for which I have
>enabled job spooling: these jobs always time out after the first spooled
>block of data is written to tape. Here's an example:
>
>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record
>found.
>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup
>found in catalog. Doing FULL backup.
>03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570,
>Job=bighorn-home.2015-09-03_11.19.32_03
>03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
>03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot
>94, drive 0" command.
>03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94,
>drive 0", status is OK.
>03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously
>written, moving to end of data.
>03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume
>"000094L5" at file=1724.
>03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
>03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size
>reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
>03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume.
>Despooling 322,122,610,512 bytes ...
>03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error
>sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103:
>ERR=Connection timed out
>03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network
>send error to SD. ERR=Connection timed out
>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD
>dropped.
>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13
>(19Jan13):
> Build OS: x86_64-unknown-linux-gnu redhat Enterprise
>release
> JobId: 12570
> Job: bighorn-home.2015-09-03_11.19.32_03
> Backup Level: Full (upgraded from Incremental)
> Client: "mmonsd4-fd" 5.2.13 (19Jan13)
>x86_64-unknown-linux-gnu,redhat,
> FileSet: "bighorn-home" 2015-06-12 08:21:10
> Pool: "ARCC" (From Job resource)
> Catalog: "MyCatalog" (From Client resource)
> Storage: "NEO4200" (From Pool resource)
> Scheduled time: 03-Sep-2015 11:19:31
> Start time: 03-Sep-2015 11:20:26
> End time: 03-Sep-2015 15:23:28
> Elapsed time: 4 hours 3 mins 2 secs
> Priority: 10
> FD Files Written: 1,805,322
> SD Files Written: 0
> FD Bytes Written: 321,716,097,371 (321.7 GB)
> SD Bytes Written: 0 (0 B)
> Rate: 22062.5 KB/s
> Software Compression: None
> VSS: no
> Encryption: no
> Accurate: yes
> Volume name(s): 000094L5
> Volume Session Id: 1
> Volume Session Time: 1441300753
> Last Volume Bytes: 1,817,725,514,752 (1.817 TB)
> Non-fatal FD errors: 2
> SD Errors: 0
> FD termination status: Error
> SD termination status: Error
> Termination: *** Backup Error ***
>
>If I turn off job spooling then the job will complete as expected.
>
>I have enabled "heartbeats" on the client, storage daemon and director,
>but that didn't help.
>
>My current client configuration is this:
>
>FileDaemon { # this is me
> Name = mmcnsd4-fd
> FDport = 9102 # where we listen for the director
> WorkingDirectory = /usr/local/bacula/working
> Pid Directory = /usr/local/bacula/working
> Maximum Concurrent Jobs = 20
> Maximum Network Buffer Size = 262144
> Heartbeat Interval = 60
>}
>
>My storage daemon configuration is:
>Storage { # definition of myself
> Name = bkupsvr2-sd
> SDPort = 9103 # Director's port
> WorkingDirectory = "/usr/local/bacula/working"
> Pid Directory = "/usr/local/bacula/working"
> Maximum Concurrent Jobs = 20
> Heartbeat Interval = 60
>}
>
>The only thing I can see is that with spooling turned off, data is
>constantly flowing over the network connection. With the spooling
>turned on there is a quiet period on the network connection.
>
>I've talked with my network engineer about this and he says there's
>nothing in the network that would cause the application to close the
>connection.
>
>So has anyone seen this problem before?
>Any ideas on what to look at to figure this out?
>
>jeff
>
>