You could try the patch in http://bugs.bacula.org/view.php?id=2083 (I recently
attached a fix for 7.4.4 to this bug).
>>>>> On Sun, 12 Feb 2017 12:09:21 +0100, Kern Sibbald said:
>
> Hello Dan,
>
> Well, then we are back to my theory that FreeBSD wants a
> particular length in some of the system calls.
>
> In general, Bacula uses lengths that work for both IPv4 and IPv6,
> which means that if you are using IPv4, the packet lengths in some
> calls are longer than they really need to be. This may or may not
> be true in your case, but is a path to look at.
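A minimal C sketch of the idea Kern describes. It assumes, as he suspects, that FreeBSD's getnameinfo(3) rejects a `salen` that does not exactly match the address family, so it passes the family-specific length instead of `sizeof(struct sockaddr_storage)`. The helpers `exact_sockaddr_len`, `make_ipv4` and `numeric_host` are hypothetical illustrations, not Bacula functions.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: exact sockaddr length for the family stored in ss,
 * or 0 for anything other than IPv4/IPv6. */
static socklen_t exact_sockaddr_len(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return (socklen_t)sizeof(struct sockaddr_in);   /* typically 16 bytes */
    case AF_INET6:
        return (socklen_t)sizeof(struct sockaddr_in6);  /* typically 28 bytes */
    default:
        return 0;
    }
}

/* Hypothetical helper: fill ss with an IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a dotted-quad address. */
static int make_ipv4(const char *ip, unsigned short port,
                     struct sockaddr_storage *ss)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;

    memset(ss, 0, sizeof *ss);
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &sin->sin_addr) == 1 ? 0 : -1;
}

/* Numeric lookup that hands getnameinfo(3) the exact length rather than
 * sizeof(struct sockaddr_storage) (typically 128 bytes).
 * Returns 0 on success or an EAI_* error code. */
static int numeric_host(const struct sockaddr_storage *ss,
                        char *host, size_t hostlen)
{
    socklen_t len = exact_sockaddr_len(ss);

    if (len == 0)
        return EAI_FAMILY;
    return getnameinfo((const struct sockaddr *)ss, len,
                       host, (socklen_t)hostlen, NULL, 0, NI_NUMERICHOST);
}
```

When the address comes from getaddrinfo(3), its `ai_addrlen` field already carries the exact length for that family and can be passed through unchanged.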
>
> Best regards,
>
> Kern
>
> On 02/11/2017 11:08 PM, Dan Langille wrote:
>
> On Feb 11, 2017, at 2:11 AM, Kern Sibbald <kern AT sibbald DOT com> wrote:
>
> Hello Dan,
>
> Well, I am happy to know that it is not Bacula related
> :-)
>
> Well, it's not solved yet. This just in.
>
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: fd_cmds.c:142 Read data not accepted
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
>
> 9103? That must be an SD for which someone
> cannot figure out the hostname based on the IP address.
>
> Would it not be easier to list the IP address
> when the hostname cannot be found? That would give a
> starting point.
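Dan's suggestion can be sketched in C: always obtain the numeric address, and add the name only when a PTR record exists, so a failed reverse lookup still leaves a starting point in the log. `describe_peer` is a hypothetical helper, not part of Bacula, and its output format is an assumption loosely modelled on the `client:Hostname not found:9103` messages above.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <netinet/in.h>   /* struct sockaddr_in, for callers */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not Bacula's API): describe a peer as
 * "name (ip):port" when a PTR record exists, otherwise "ip:port",
 * so a failed reverse lookup still leaves the address in the log. */
static void describe_peer(const struct sockaddr *sa, socklen_t salen,
                          char *out, size_t outlen)
{
    char ip[64], port[16], name[256];

    /* The numeric form needs no DNS, so try it first. */
    if (getnameinfo(sa, salen, ip, sizeof ip, port, sizeof port,
                    NI_NUMERICHOST | NI_NUMERICSERV) != 0) {
        snprintf(out, outlen, "unknown address");
        return;
    }
    /* NI_NAMEREQD makes getnameinfo fail, rather than silently return the
     * numeric string, when the address has no usable PTR record. */
    if (getnameinfo(sa, salen, name, sizeof name, NULL, 0, NI_NAMEREQD) == 0)
        snprintf(out, outlen, "%s (%s):%s", name, ip, port);
    else
        snprintf(out, outlen, "%s:%s", ip, port);
}
```

For a `sockaddr_in` holding 127.0.0.1 and port 9103, this would produce something like `localhost (127.0.0.1):9103`, or `127.0.0.1:9103` if no PTR record is found. An IPv6 address would read better in brackets, which this sketch does not attempt.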
>
> Let's go to the SD in question: bacula-sd-01-sd, which is at
> bacula-sd-01.int.unixathome.org (the Read Storage listed below). From
> there, I try to resolve the write SD (tape01 at
> tape01.int.unixathome.org), which is what this message must pertain to.
>
> $ dig +short tape01.int.unixathome.org
> 10.55.0.103
>
> $ dig +short -x 10.55.0.103
> tape01.int.unixathome.org.
>
> That looks OK. Perhaps I misunderstand which
> hostname it's trying to provide.
>
> Is there something else I should verify with
> respect to DNS in this instance?
>
> I looked at the code: The message comes from
> line 374 of src/lib/bsock.c
>
> The 'Hostname not found' comes from line 646
> of src/lib/address_conf.c
>
> The code invokes getnameinfo(3), and I looked up its man page:
> http://www.unix.com/man-page/FreeBSD/3/getnameinfo/
>
> I'm happy to patch with debugging code for this,
> but I'm out of time this weekend.
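If debugging code is added, one option is to keep the existing message but append the reason reported by gai_strerror(3), together with the address family and the length that were passed; that would test Kern's length theory directly. The function below is a hypothetical sketch, not the code in src/lib/address_conf.c. The flags Bacula actually passes are not shown in this thread, so they are left as a parameter.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical debugging variant of a "Hostname not found" fallback:
 * keep the original text but append gai_strerror(3), the address family
 * and the salen that was passed, so a length mismatch is visible in the
 * job log. flags are passed straight through to getnameinfo(3). */
static int lookup_host_verbose(const struct sockaddr *sa, socklen_t salen,
                               char *host, size_t hostlen, int flags)
{
    int rc = getnameinfo(sa, salen, host, (socklen_t)hostlen, NULL, 0, flags);

    if (rc != 0)
        snprintf(host, hostlen, "Hostname not found (%s, family=%d, salen=%u)",
                 gai_strerror(rc), (int)sa->sa_family, (unsigned)salen);
    return rc;
}
```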
>
> The full job output is here:
>
> 11-Feb 17:50 bacula-dir JobId 256374: Warning: FileSet MD5 digest not found.
> 11-Feb 17:50 bacula-dir JobId 256374: Copying using JobId=255999 Job=knew_jail_snapshots.2017-02-05_03.04.02_47
> 11-Feb 17:50 bacula-dir JobId 256374: Bootstrap records written to /usr/local/bacula/working/bacula-dir.restore.10.bsr
> 11-Feb 17:50 bacula-dir JobId 256374: Start Copying JobId 256374, Job=CopyToTape-Full-LTO4.2017-02-11_17.50.17_22
> 11-Feb 17:50 bacula-dir JobId 256374: Using Device "vDrive-6" to read.
> 11-Feb 17:50 bacula-sd-01-sd JobId 256374: Ready to read from volume "FullAuto-1377" on file device "vDrive-6" (/usr/local/bacula/volumes).
>
> lots of reading skipped.....
>
> 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Ready to read from volume "FullAuto-1514" on file device "vDrive-6" (/usr/local/bacula/volumes).
> 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Forward spacing Volume "FullAuto-1514" to file:block 0:214.
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:453 Wrote 65540 bytes to client:Hostname not found:9103, but only 32768 accepted.
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: read.c:284 Error sending to File daemon. ERR=Broken pipe
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: fd_cmds.c:142 Read data not accepted
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
> 11-Feb 21:11 bacula-dir JobId 256374: Error: Bacula bacula-dir 7.4.4 (28Sep16):
>   Build OS: amd64-portbld-freebsd11.0 freebsd 11.0-RELEASE-p6
>   Prev Backup JobId: 255999
>   Prev Backup Job: knew_jail_snapshots.2017-02-05_03.04.02_47
>   New Backup JobId: 256375
>   Current JobId: 256374
>   Current Job: CopyToTape-Full-LTO4.2017-02-11_17.50.17_22
>   Backup Level: Full
>   Client: crey-fd
>   FileSet: "EmptyCopyToTape" 2011-02-20 20:53:31
>   Read Pool: "FullFile" (From Command input)
>   Read Storage: "bacula-sd-01-file" (From Pool resource)
>   Write Pool: "FullsLTO4" (From Command input)
>   Write Storage: "tape01" (From Command input)
>   Catalog: "MyCatalog" (From Client resource)
>   Start time: 11-Feb-2017 17:50:50
>   End time: 11-Feb-2017 21:11:14
>   Elapsed time: 3 hours 20 mins 24 secs
>   Priority: 10
>   SD Files Written: 404,305
>   SD Bytes Written: 356,347,133,351 (356.3 GB)
>   Rate: 29636.3 KB/s
>   Volume name(s): 000013L4
>   Volume Session Id: 143
>   Volume Session Time: 1486562541
>   Last Volume Bytes: 1,187,871,455,232 (1.187 TB)
>   SD Errors: 3
>   SD termination status: Error
>   Termination: *** Copying Error ***
>
> --
>
> Dan Langille - BSDCan / PGCon
>
> dan AT langille DOT org
>
> Kern
>
> On 02/10/2017 09:46 PM, Dan Langille wrote:
>
> On Feb 10, 2017, at 12:55 PM, Dan Langille <dan AT langille DOT org> wrote:
>
> On Feb 10, 2017, at 12:20 PM, Kern Sibbald <kern AT sibbald DOT com> wrote:
>
> Hello,
>
> I suspect that this is a problem with the FreeBSD networking
> implementation. If I remember right, on FreeBSD, when doing name
> lookups, if the packet size is not *exactly* what FreeBSD wants, it
> fails the call. On Linux and other machines (Solaris, Mac), as long
> as the packet size is equal to or greater than what is needed, the OS
> call succeeds. If I am not mistaken, Bacula allocates space for the
> larger of IPv4 and IPv6 (which is always IPv6), and so if you are
> using an IPv4 network, Bacula may send OS calls with a packet size
> larger than actually required.
>
> I just spoke with a FreeBSD
> developer. They are unaware of anything
> special in the FreeBSD ports tree for
> patching FreeBSD when it comes to doing name
> lookups. Specifically, gethostby*(),
> getipnodeby*() just work...
>
> If you can reproduce/encounter a
> situation which fails, we will look at it
> and fix it. In short, I do not think this
> is an issue with the FreeBSD networking
> implementation.
>
> I suspect it's a local DNS misconfiguration on one of my hosts. Which
> one, I don't know yet. Your first post mentioned the SDs, so I checked
> them. They seem OK now. I will verify them again if the errors recur.
>
> For testing purposes, I will use dig +short and verify that all of
> these FQDNs resolve on the FD, the SD, the director, and from a
> fourth host:
>
> The FQDN of the FD (even though
> it is not used in a Copy job)
>
> The FQDN of the read SD
>
> The FQDN of the write SD
>
> The FQDN of the Director
>
> I will also check that the PTR record for each A record resolves back
> to the FQDN.
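The checklist above (forward lookup, then confirming that the PTR record points back to the same name) can also be run through the system resolver. Unlike dig, getaddrinfo(3) and getnameinfo(3) also consult /etc/hosts and the nsswitch configuration, which is closer to what a program calling those functions sees. A minimal sketch, assuming a POSIX system; `same_dns_name` and `check_forward_reverse` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>

/* DNS names compare case-insensitively; dig prints a trailing dot
 * ("tape01.int.unixathome.org."), so ignore one if present. */
static int same_dns_name(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    if (la > 0 && a[la - 1] == '.')
        la--;
    if (lb > 0 && b[lb - 1] == '.')
        lb--;
    return la == lb && strncasecmp(a, b, la) == 0;
}

/* Forward-confirmed reverse lookup: resolve fqdn, reverse-resolve each
 * address returned, and require the PTR name to match fqdn.
 * Returns the number of problem addresses, or -1 if fqdn does not resolve. */
static int check_forward_reverse(const char *fqdn)
{
    struct addrinfo hints, *res, *ai;
    int bad = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* both A and AAAA records */
    hints.ai_socktype = SOCK_STREAM;  /* one result per address */
    if (getaddrinfo(fqdn, NULL, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        char ip[64] = "?", name[256];
        int rc;

        getnameinfo(ai->ai_addr, ai->ai_addrlen, ip, sizeof ip,
                    NULL, 0, NI_NUMERICHOST);
        /* ai_addrlen is already the exact length for this family. */
        rc = getnameinfo(ai->ai_addr, ai->ai_addrlen, name, sizeof name,
                         NULL, 0, NI_NAMEREQD);
        if (rc != 0) {
            printf("%s -> %s: no PTR (%s)\n", fqdn, ip, gai_strerror(rc));
            bad++;
        } else if (!same_dns_name(name, fqdn)) {
            printf("%s -> %s -> %s: PTR mismatch\n", fqdn, ip, name);
            bad++;
        }
    }
    freeaddrinfo(res);
    return bad;
}
```

Calling `check_forward_reverse("tape01.int.unixathome.org")` on each of the four hosts would print one line per problem address and return the count. The answer depends on which name server the resolver asks, so querying each server directly with `dig @server`, as done further down, is still worthwhile.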
>
> If this is
> the case, I would consider it a
> FreeBSD bug. For me to fix it is a
> bit complicated, because I need to
> know exactly what call is failing and
> the values that FreeBSD wants. By the
> way, it is possible this is already
> fixed in the Enterprise version where
> FreeBSD is supported too. If that is
> the case, in my next round of
> backporting to start next week, it
> will get fixed.
>
> If this were the case, I'd expect apps on
> FreeBSD to be failing everywhere. They
> aren't. I've never patched anything for DNS
> issues either.
>
> I suspect it's more likely to be
> an issue on one of the Bacula nodes in
> question (SD or FD) where there is a local DNS
> issue.
>
> For documentation: I found that one of my three name servers was not
> correctly resolving PTR records. This query returned nothing:
>
> $ dig +short -x 10.52.0.1 @10.55.0.1
>
> The other two name servers returned the expected name:
>
> $ dig +short -x 10.52.0.1 @10.55.0.13
> bast.int.unixathome.org.
>
> $ dig +short -x 10.52.0.1 @10.55.0.73
> bast.int.unixathome.org.
>
> That issue has since been fixed. It took a while to track down.
> That name server is running pfSense, configured through its GUI.
> The issue is right there on the page:
>
> Note:
> IN-ADDR.ARPA will be automaticaly included in
> config files when reverse zone option is
> checked.
>
> The checkbox is
> four controls farther down the page.
>
> The zone I expected, 0.55.10.in-addr.arpa, had actually been
> created as 0.55.10.in-addr.arpa.in-addr.arpa.
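To make the misnamed zone concrete, here is a small C sketch of how IPv4 reverse names are formed. The octets are reversed and `in-addr.arpa` is appended, so PTR names for 10.55.0.x addresses fall under `0.55.10.in-addr.arpa`, and a zone named `0.55.10.in-addr.arpa.in-addr.arpa` does not contain them. The function names are hypothetical, and the /24 split is an assumption that matches the zone named above.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* PTR owner name for an IPv4 address: the octets reversed plus the
 * in-addr.arpa suffix, e.g. 10.52.0.1 -> "1.0.52.10.in-addr.arpa".
 * Returns 0 on success, -1 on bad input or a too-small buffer. */
static int ptr_name(const char *ipv4, char *out, size_t outlen)
{
    struct in_addr a;
    const unsigned char *o;
    int n;

    if (inet_pton(AF_INET, ipv4, &a) != 1)
        return -1;
    o = (const unsigned char *)&a.s_addr;   /* octets in network order */
    n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa",
                 (unsigned)o[3], (unsigned)o[2], (unsigned)o[1], (unsigned)o[0]);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Reverse zone for the /24 containing ipv4: the PTR name without its
 * first label, e.g. 10.55.0.103 -> "0.55.10.in-addr.arpa". */
static int reverse_zone_24(const char *ipv4, char *out, size_t outlen)
{
    struct in_addr a;
    const unsigned char *o;
    int n;

    if (inet_pton(AF_INET, ipv4, &a) != 1)
        return -1;
    o = (const unsigned char *)&a.s_addr;
    n = snprintf(out, outlen, "%u.%u.%u.in-addr.arpa",
                 (unsigned)o[2], (unsigned)o[1], (unsigned)o[0]);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Flags the symptom seen on the pfSense box: the suffix appended twice. */
static int has_doubled_suffix(const char *zone)
{
    static const char twice[] = ".in-addr.arpa.in-addr.arpa";
    size_t lz = strlen(zone), lt = sizeof twice - 1;

    return lz >= lt && strcmp(zone + lz - lt, twice) == 0;
}
```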
>
> I'll keep an eye
> on the Bacula logs to see if this shows up
> again.
>
> Thank you.
>
>
>
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users