2015-09-04 07:53:40
Subject: Re: [Bacula-users] Timeout problem with spooled jobs
From: "Clark, Patti" <clarkpa AT ornl DOT gov>
To: "Jeffrey R. Lang" <JRLang AT uwyo DOT edu>, bacula Users <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 4 Sep 2015 11:50:56 +0000
One more suggestion: reducing the spool size has also helped. I have a
mix of clients, and initially all of them used a 500GB spool size. My
10Gb-connected clients were fine, but my 1Gb clients would time out when
things were busy. I dropped the spool size to 50GB for the 1Gb clients,
and they now rarely time out.
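A rough illustration of why a smaller spool can help: as we understand Bacula's data spooling, once a job's spool fills, the storage daemon stops accepting data from that job's FD while it despools to tape, so the FD's connection sits quiet for roughly the time it takes to write one spool file. The sketch below estimates that window. It assumes a fixed sustained tape write rate of 140 MB/s (LTO-5 native, uncompressed speed) and takes GB as 10^9 bytes; real throughput depends on the drive, compression and data. The helper name `despool_idle_seconds` is hypothetical.

```python
"""Rough estimate of how long a job's FD connection sits idle while the SD despools.

Assumption (not from Bacula): the tape drive sustains a fixed write rate during
despooling. 140 MB/s is LTO-5's native (uncompressed) speed; real rates vary.
"""

LTO5_NATIVE_BYTES_PER_SEC = 140 * 10**6  # assumed sustained despool rate


def despool_idle_seconds(spool_bytes: int,
                         tape_bytes_per_sec: int = LTO5_NATIVE_BYTES_PER_SEC) -> float:
    """Seconds needed to write one full spool file to tape (hypothetical helper)."""
    return spool_bytes / tape_bytes_per_sec


if __name__ == "__main__":
    sizes = {
        "MaxDevSpoolSize from the quoted log (300 GiB)": 322_122_547_200,
        "500 GB spool": 500 * 10**9,
        "50 GB spool": 50 * 10**9,
    }
    for label, size in sizes.items():
        minutes = despool_idle_seconds(size) / 60
        print(f"{label}: ~{minutes:.0f} min idle while despooling")
```

Under these assumptions the 300 GiB spool implies a quiet period of roughly 38 minutes per despool, versus about 6 minutes for a 50 GB spool, which gives any idle-connection timeout far less room to trigger.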

Patti Clark
Linux System Administrator
R&D Systems Support Oak Ridge National Laboratory



On 9/4/15, 7:08 AM, "Clark, Patti" <clarkpa AT ornl DOT gov> wrote:

>My heartbeat interval is set to 300 seconds; you should probably increase
>yours. I originally had a smaller interval and had some clients that would
>time out, particularly Windows clients.
>
>Patti Clark
>Linux System Administrator
>R&D Systems Support Oak Ridge National Laboratory
>
>
>
>On 9/3/15, 7:16 PM, "Jeffrey R. Lang" <JRLang AT uwyo DOT edu> wrote:
>
>>First let me say thanks to Kern and all those that have helped make
>>bacula a great tool.
>>
>>My current backup environment consists of a server, a VTL and a tape
>>library connected by a 10 Gig network. Bacula is currently at 5.2.13;
>>I plan on upgrading once I've integrated the tape library and things are
>>working, which seems like a good starting point for an upgrade.
>>
>>My issue is that I have enabled job spooling for jobs destined for the
>>tape library, but these jobs always time out after the first spooled
>>block of data is written to tape. Here's an example:
>>
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior Full backup Job record found.
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: No prior or suitable Full backup found in catalog. Doing FULL backup.
>>03-Sep 11:19 bkupsvr2-dir JobId 12570: Start Backup JobId 12570, Job=bighorn-home.2015-09-03_11.19.32_03
>>03-Sep 11:20 bkupsvr2-dir JobId 12570: Using Device "LTO5-0" to write.
>>03-Sep 11:21 bkupsvr2-sd JobId 12570: 3304 Issuing autochanger "load slot 94, drive 0" command.
>>03-Sep 11:22 bkupsvr2-sd JobId 12570: 3305 Autochanger "load slot 94, drive 0", status is OK.
>>03-Sep 11:22 bkupsvr2-sd JobId 12570: Volume "000094L5" previously written, moving to end of data.
>>03-Sep 11:23 bkupsvr2-sd JobId 12570: Ready to append to end of Volume "000094L5" at file=1724.
>>03-Sep 11:23 bkupsvr2-sd JobId 12570: Spooling data ...
>>03-Sep 15:07 bkupsvr2-sd JobId 12570: User specified Device spool size reached: DevSpoolSize=322,122,610,512 MaxDevSpoolSize=322,122,547,200
>>03-Sep 15:07 bkupsvr2-sd JobId 12570: Writing spooled data to Volume. Despooling 322,122,610,512 bytes ...
>>03-Sep 15:23 mmcnsd4-fd JobId 12570: Error: bsock.c:429 Write error sending 253977 bytes to Storage daemon:bkupsvr2.gg.uwyo.edu:9103: ERR=Connection timed out
>>03-Sep 15:23 mmcnsd4-fd JobId 12570: Fatal error: backup.c:1200 Network send error to SD. ERR=Connection timed out
>>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Director's comm line to SD dropped.
>>03-Sep 15:23 bkupsvr2-dir JobId 12570: Error: Bacula bkupsvr2-dir 5.2.13 (19Jan13):
>>  Build OS:               x86_64-unknown-linux-gnu redhat Enterprise release
>>  JobId:                  12570
>>  Job:                    bighorn-home.2015-09-03_11.19.32_03
>>  Backup Level:           Full (upgraded from Incremental)
>>  Client:                 "mmonsd4-fd" 5.2.13 (19Jan13) x86_64-unknown-linux-gnu,redhat,
>>  FileSet:                "bighorn-home" 2015-06-12 08:21:10
>>  Pool:                   "ARCC" (From Job resource)
>>  Catalog:                "MyCatalog" (From Client resource)
>>  Storage:                "NEO4200" (From Pool resource)
>>  Scheduled time:         03-Sep-2015 11:19:31
>>  Start time:             03-Sep-2015 11:20:26
>>  End time:               03-Sep-2015 15:23:28
>>  Elapsed time:           4 hours 3 mins 2 secs
>>  Priority:               10
>>  FD Files Written:       1,805,322
>>  SD Files Written:       0
>>  FD Bytes Written:       321,716,097,371 (321.7 GB)
>>  SD Bytes Written:       0 (0 B)
>>  Rate:                   22062.5 KB/s
>>  Software Compression:   None
>>  VSS:                    no
>>  Encryption:             no
>>  Accurate:               yes
>>  Volume name(s):         000094L5
>>  Volume Session Id:      1
>>  Volume Session Time:    1441300753
>>  Last Volume Bytes:      1,817,725,514,752 (1.817 TB)
>>  Non-fatal FD errors:    2
>>  SD Errors:              0
>>  FD termination status:  Error
>>  SD termination status:  Error
>>  Termination:            *** Backup Error ***
>>
>>If I turn off job spooling then the job will complete as expected.
>>
>>I have enabled heartbeats on the client, storage daemon and director,
>>but that didn't help.
>>
>>My current client configuration is this:
>>
>>FileDaemon {                          # this is me
>>  Name = mmcnsd4-fd
>>  FDport = 9102                  # where we listen for the director
>>  WorkingDirectory = /usr/local/bacula/working
>>  Pid Directory = /usr/local/bacula/working
>>  Maximum Concurrent Jobs = 20
>>  Maximum Network Buffer Size = 262144
>>  Heartbeat Interval = 60
>>}
>>
>>My storage daemon configuration is:
>>Storage {                             # definition of myself
>>  Name = bkupsvr2-sd
>>  SDPort = 9103                  # Director's port
>>  WorkingDirectory = "/usr/local/bacula/working"
>>  Pid Directory = "/usr/local/bacula/working"
>>  Maximum Concurrent Jobs = 20
>>  Heartbeat Interval = 60
>>}
>>
>>The only difference I can see is that with spooling turned off, data is
>>constantly flowing over the network connection, whereas with spooling
>>turned on there is a quiet period on the connection.
>>
>>I've talked with my network engineer about this and he says there's
>>nothing in the network that would cause the application to close the
>>connection.
>>
>>So, has anyone seen this problem before?
>>Any ideas on what to look at to figure this out?
>>
>>jeff
>>
>>
>
>
>------------------------------------------------------------------------------
>_______________________________________________
>Bacula-users mailing list
>Bacula-users AT lists.sourceforge DOT net
>https://lists.sourceforge.net/lists/listinfo/bacula-users
>
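The figures in Jeffrey's job report can be cross-checked with a little arithmetic. The sketch below, in plain Python with hypothetical names, confirms that the reported Rate is FD Bytes Written divided by Elapsed time (with KB taken as 1000 bytes), and measures the spooling phase and the gap between the start of despooling and the FD's "Connection timed out" from the log timestamps. The log only gives minute-level times, so those intervals are approximate.

```python
"""Cross-check the throughput figures in the quoted job report (JobId 12570)."""

from datetime import datetime

# Figures copied from the job report.
FD_BYTES_WRITTEN = 321_716_097_371
ELAPSED_SECONDS = 4 * 3600 + 3 * 60 + 2  # "4 hours 3 mins 2 secs"

# Minute-resolution timestamps from the log; the year comes from "Scheduled time".
spool_start = datetime(2015, 9, 3, 11, 23)    # "Spooling data ..."
despool_start = datetime(2015, 9, 3, 15, 7)   # "Writing spooled data to Volume."
fd_timeout = datetime(2015, 9, 3, 15, 23)     # FD "ERR=Connection timed out"


def rate_kb_per_s(nbytes: int, seconds: float) -> float:
    """Average rate in KB/s with KB = 1000 bytes, as in the report's Rate line."""
    return nbytes / seconds / 1000


if __name__ == "__main__":
    print(f"Overall rate: {rate_kb_per_s(FD_BYTES_WRITTEN, ELAPSED_SECONDS):.1f} KB/s")
    spool_secs = (despool_start - spool_start).total_seconds()
    print(f"Spooling phase: {spool_secs / 60:.0f} min")
    idle_secs = (fd_timeout - despool_start).total_seconds()
    print(f"Despool start to FD timeout: {idle_secs / 60:.0f} min")
```

By these numbers the FD spent about 224 minutes spooling and then failed about 16 minutes into the despool. At LTO-5's native 140 MB/s, writing 300 GiB would take roughly 38 minutes, so the connection gave up well before the despool could have finished.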