Re: [Bacula-users] Timeout problem with spooled jobs

2015-09-04 07:13:57
Subject: Re: [Bacula-users] Timeout problem with spooled jobs
From: "Clark, Patti" <clarkpa AT ornl DOT gov>
To: "Jeffrey R. Lang" <JRLang AT uwyo DOT edu>, bacula Users <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 4 Sep 2015 11:08:23 +0000
My Heartbeat Interval is set to 300 seconds; you should probably increase
yours. I originally had a smaller interval and some clients would time
out, particularly Windows clients.
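
Applied to the FileDaemon and Storage resources quoted below, that change would look roughly like this. It is only a sketch: the other directives are copied unchanged from Jeff's configuration, the file names are the Bacula defaults (assumed), and whether the Director resource in bacula-dir.conf needs the same value is left open here.

```
# bacula-fd.conf on the client (default file name assumed)
FileDaemon {
  Name = mmcnsd4-fd
  FDport = 9102                          # port the FD listens on for the Director
  WorkingDirectory = /usr/local/bacula/working
  Pid Directory = /usr/local/bacula/working
  Maximum Concurrent Jobs = 20
  Maximum Network Buffer Size = 262144
  Heartbeat Interval = 300               # raised from 60; value is in seconds
}

# bacula-sd.conf on the storage server (default file name assumed)
Storage {
  Name = bkupsvr2-sd
  SDPort = 9103                          # port the SD listens on
  WorkingDirectory = "/usr/local/bacula/working"
  Pid Directory = "/usr/local/bacula/working"
  Maximum Concurrent Jobs = 20
  Heartbeat Interval = 300               # same change on the SD side
}
```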

Patti Clark
Linux System Administrator
R&D Systems Support Oak Ridge National Laboratory



On 9/3/15, 7:16 PM, "Jeffrey R. Lang" <JRLang AT uwyo DOT edu> wrote:

>First, let me say thanks to Kern and all those who have helped make
>Bacula a great tool.
>
>My current backup environment consists of a server, a VTL and a
>tape library connected by a 10 Gigabit network. Bacula is currently at
>5.2.13. I plan on upgrading once I've integrated the tape library and
>things are working; that will be a good starting point for an upgrade.
>
>My issue is that I have enabled job spooling for jobs destined for the
>tape library, but these jobs always time out once the first spool fills
>and its data is written to tape. Here's an example:
>
>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record found.
>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup found in catalog. Doing FULL backup.
>03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570, Job=bighorn-home.2015-09-03_11.19.32_03
>03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
>03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot 94, drive 0" command.
>03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94, drive 0", status is OK.
>03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously written, moving to end of data.
>03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume "000094L5" at file=1724.
>03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
>03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
>03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume. Despooling 322,122,610,512 bytes ...
>03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103: ERR=Connection timed out
>03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network send error to SD. ERR=Connection timed out
>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD dropped.
>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13 (19Jan13):
>  Build OS:               x86_64-unknown-linux-gnu redhat Enterprise release
>  JobId:                  12570
>  Job:                    bighorn-home.2015-09-03_11.19.32_03
>  Backup Level:           Full (upgraded from Incremental)
>  Client:                 "mmonsd4-fd" 5.2.13 (19Jan13) x86_64-unknown-linux-gnu,redhat,
>  FileSet:                "bighorn-home" 2015-06-12 08:21:10
>  Pool:                   "ARCC" (From Job resource)
>  Catalog:                "MyCatalog" (From Client resource)
>  Storage:                "NEO4200" (From Pool resource)
>  Scheduled time:         03-Sep-2015 11:19:31
>  Start time:             03-Sep-2015 11:20:26
>  End time:               03-Sep-2015 15:23:28
>  Elapsed time:           4 hours 3 mins 2 secs
>  Priority:               10
>  FD Files Written:       1,805,322
>  SD Files Written:       0
>  FD Bytes Written:       321,716,097,371 (321.7 GB)
>  SD Bytes Written:       0 (0 B)
>  Rate:                   22062.5 KB/s
>  Software Compression:   None
>  VSS:                    no
>  Encryption:             no
>  Accurate:               yes
>  Volume name(s):         000094L5
>  Volume Session Id:      1
>  Volume Session Time:    1441300753
>  Last Volume Bytes:      1,817,725,514,752 (1.817 TB)
>  Non-fatal FD errors:    2
>  SD Errors:              0
>  FD termination status:  Error
>  SD termination status:  Error
>  Termination:            *** Backup Error ***
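
The numbers in the log above can be checked with a few lines of Python. This is a minimal sketch: the 140 MB/s tape rate is an assumed LTO-5 native figure, not something measured on this system. It shows that the spool limit is exactly 300 GiB, that the FD connection failed about 16 minutes after despooling began, and that writing the spool at the assumed rate would take well over half an hour, during which the FD connection carries no backup data.

```python
from datetime import datetime

# Values copied from the job log above.
MAX_DEV_SPOOL_SIZE = 322_122_547_200   # MaxDevSpoolSize, in bytes
DESPOOL_BYTES = 322_122_610_512        # bytes the SD started despooling at 15:07

GIB = 1024 ** 3

def minutes_between(start: str, end: str) -> float:
    """Minutes between two 'HH:MM' log timestamps on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

# Despooling started at 15:07; the FD reported the timeout at 15:23.
quiet_minutes = minutes_between("15:07", "15:23")

# ASSUMPTION: sustained tape write rate of 140 MB/s (LTO-5 native), for illustration only.
ASSUMED_TAPE_RATE = 140 * 10**6  # bytes per second
despool_minutes = DESPOOL_BYTES / ASSUMED_TAPE_RATE / 60

print(f"Spool limit: {MAX_DEV_SPOOL_SIZE / GIB:.0f} GiB")
print(f"FD connection failed {quiet_minutes:.0f} minutes into despooling")
print(f"Estimated despool time at the assumed rate: {despool_minutes:.0f} minutes")
```
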
>
>If I turn off job spooling, the job completes as expected.
>
>I have enabled heartbeats on the client, storage daemon and director,
>but that didn't help.
>
>My current client configuration is this:
>
>FileDaemon {                          # this is me
>  Name = mmcnsd4-fd
>  FDport = 9102                  # where we listen for the director
>  WorkingDirectory = /usr/local/bacula/working
>  Pid Directory = /usr/local/bacula/working
>  Maximum Concurrent Jobs = 20
>  Maximum Network Buffer Size = 262144
>  Heartbeat Interval = 60
>}
>
>My storage daemon configuration is:
>Storage {                             # definition of myself
>  Name = bkupsvr2-sd
>  SDPort = 9103                  # Director's port
>  WorkingDirectory = "/usr/local/bacula/working"
>  Pid Directory = "/usr/local/bacula/working"
>  Maximum Concurrent Jobs = 20
>  Heartbeat Interval = 60
>}
>
>The only difference I can see is that with spooling turned off, data
>flows constantly over the network connection. With spooling turned on,
>there is a quiet period on the connection while the spooled data is
>written to tape.
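
One way to picture what a heartbeat is meant to do during that quiet period is TCP keepalive on the socket. The Python sketch below enables SO_KEEPALIVE and, where the platform defines it, sets TCP_KEEPIDLE; treating Bacula's Heartbeat Interval as equivalent to this idle time is an assumption, not a description of Bacula's code, and `enable_keepalive` is a hypothetical helper. As far as I know, the kernel only sends keepalive probes while no data is queued on the socket, so a sender blocked on a full send buffer may not be covered by keepalives at all.

```python
import socket

def enable_keepalive(sock: socket.socket, idle_seconds: int) -> None:
    """Hypothetical helper: turn on TCP keepalive for an otherwise idle connection.

    idle_seconds is how long the connection may sit silent before the kernel
    sends its first keepalive probe (the role a heartbeat is assumed to play).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is defined on Linux; some platforms use other option names.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, 60)  # 60 s, the Heartbeat Interval in the configs quoted here
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```
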
>
>I've talked with my network engineer about this, and he says there's
>nothing in the network that would cause the application to close the
>connection.
>
>So, has anyone seen this problem before?
>Any ideas on what to look at to figure this out?
>
>jeff
>
>


------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users