One more suggestion: reducing the spool size has also helped. I have a
mix of clients, and initially everyone was using a 500GB spool size. My
10 Gb connected clients were fine, but my 1 Gb clients would time out when
things were busy. I dropped the spool size to 50GB for the 1 Gb clients
and now rarely see a timeout.
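
As a rough sketch of the change described above (not the actual
configuration from either site), a per-job spool limit can be set with
the Spool Size directive in the Director's Job resource, while Maximum
Spool Size in the Storage daemon's Device resource caps the device as a
whole. The job, client and device names and the spool directory below
are placeholders, and the other required directives are left out.

```conf
# bacula-dir.conf (Director): smaller spool for a job on a 1 Gb client
Job {
  Name = "slow-client-backup"     # hypothetical job name
  Client = slow-client-fd         # hypothetical 1 Gb client
  Spool Data = yes                # spool to disk before writing to tape
  Spool Size = 50G                # per-job limit; overrides the Device default
  # remaining Job directives (Type, FileSet, Pool, Storage, ...) unchanged
}

# bacula-sd.conf (Storage daemon): overall ceiling for the tape device
Device {
  Name = "Drive-0"                      # hypothetical device name
  Spool Directory = /var/spool/bacula   # hypothetical path
  Maximum Spool Size = 500G             # total spool space for this device
  # remaining Device directives (Media Type, Archive Device, ...) unchanged
}
```

With a smaller per-job limit, each despool pass to tape is shorter, so
the client spends less time waiting with nothing moving on its
connection to the Storage daemon.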
Patti Clark
Linux System Administrator
R&D Systems Support Oak Ridge National Laboratory
On 9/4/15, 7:08 AM, "Clark, Patti" <clarkpa AT ornl DOT gov> wrote:
>My heartbeat interval is set to 300 - you should probably increase yours.
>I originally had a smaller interval and had some clients that would time
>out - particularly if they were Windows clients.
>
>Patti Clark
>Linux System Administrator
>R&D Systems Support Oak Ridge National Laboratory
>
>
>
>On 9/3/15, 7:16 PM, "Jeffrey R. Lang" <JRLang AT uwyo DOT edu> wrote:
>
>>First let me say thanks to Kern and all those that have helped make
>>bacula a great tool.
>>
>>My backup environment currently consists of a server, a VTL and a
>>tape library connected by a 10 Gb network. Bacula is currently at 5.2.13.
>>I plan on upgrading once I've integrated the tape library and things are
>>working; that would be a good starting point for an upgrade.
>>
>>My issue is that when jobs are destined for the tape library I have
>>enabled job spooling, but these jobs always time out after the first
>>spooled block of data is written to tape. Here's an example:
>>
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record
>>found.
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup
>>found in catalog. Doing FULL backup.
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570,
>>Job=bighorn-home.2015-09-03_11.19.32_03
>>03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
>>03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot
>>94, drive 0" command.
>>03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94,
>>drive 0", status is OK.
>>03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously
>>written, moving to end of data.
>>03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume
>>"000094L5" at file=1724.
>>03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
>>03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size
>>reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
>>03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume.
>>Despooling 322,122,610,512 bytes ...
>>03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error
>>sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103:
>>ERR=Connection timed out
>>03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network
>>send error to SD. ERR=Connection timed out
>>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD
>>dropped.
>>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13
>>(19Jan13):
>> Build OS: x86_64-unknown-linux-gnu redhat Enterprise
>>release
>> JobId: 12570
>> Job: bighorn-home.2015-09-03_11.19.32_03
>> Backup Level: Full (upgraded from Incremental)
>> Client: "mmonsd4-fd" 5.2.13 (19Jan13)
>>x86_64-unknown-linux-gnu,redhat,
>> FileSet: "bighorn-home" 2015-06-12 08:21:10
>> Pool: "ARCC" (From Job resource)
>> Catalog: "MyCatalog" (From Client resource)
>> Storage: "NEO4200" (From Pool resource)
>> Scheduled time: 03-Sep-2015 11:19:31
>> Start time: 03-Sep-2015 11:20:26
>> End time: 03-Sep-2015 15:23:28
>> Elapsed time: 4 hours 3 mins 2 secs
>> Priority: 10
>> FD Files Written: 1,805,322
>> SD Files Written: 0
>> FD Bytes Written: 321,716,097,371 (321.7 GB)
>> SD Bytes Written: 0 (0 B)
>> Rate: 22062.5 KB/s
>> Software Compression: None
>> VSS: no
>> Encryption: no
>> Accurate: yes
>> Volume name(s): 000094L5
>> Volume Session Id: 1
>> Volume Session Time: 1441300753
>> Last Volume Bytes: 1,817,725,514,752 (1.817 TB)
>> Non-fatal FD errors: 2
>> SD Errors: 0
>> FD termination status: Error
>> SD termination status: Error
>> Termination: *** Backup Error ***
>>
>>If I turn off job spooling then the job will complete as expected.
>>
>>I have enabled "heartbeats" on the client, storage daemon and director,
>>but that didn't help.
>>
>>My current client configuration is this:
>>
>>FileDaemon { # this is me
>> Name = mmcnsd4-fd
>> FDport = 9102 # where we listen for the director
>> WorkingDirectory = /usr/local/bacula/working
>> Pid Directory = /usr/local/bacula/working
>> Maximum Concurrent Jobs = 20
>> Maximum Network Buffer Size = 262144
>> Heartbeat Interval = 60
>>}
>>
>>My storage daemon configuration is:
>>Storage { # definition of myself
>> Name = bkupsvr2-sd
>> SDPort = 9103 # Director's port
>> WorkingDirectory = "/usr/local/bacula/working"
>> Pid Directory = "/usr/local/bacula/working"
>> Maximum Concurrent Jobs = 20
>> Heartbeat Interval = 60
>>}
>>
>>The only thing I can see is that with spooling turned off, data is
>>constantly flowing over the network connection. With the spooling
>>turned on there is a quiet period on the network connection.
>>
>>I've talked with my network engineer about this and he says there's
>>nothing in the network that would cause the application to close the
>>connection.
>>
>>So has anyone seen this problem before?
>>Any ideas on what to look at to figure this out?
>>
>>jeff
>>
>>
>
>
>--------------------------------------------------------------------------
>----
>_______________________________________________
>Bacula-users mailing list
>Bacula-users AT lists.sourceforge DOT net
>https://lists.sourceforge.net/lists/listinfo/bacula-users
>