Re: [Bacula-users] Hostname not found

2017-02-13 09:16:55
Subject: Re: [Bacula-users] Hostname not found
From: Kern Sibbald <kern AT sibbald DOT com>
To: Martin Simmons <martin AT lispworks DOT com>
Date: Mon, 13 Feb 2017 15:15:35 +0100
Hello Martin,

Thanks for bringing this up.  I had forgotten about this patch. It was
probably that patch that clued me in to the idea that FreeBSD
(incorrectly in my opinion) requires an exact match on the size, while
no other platform I know of does.

I have not applied your patch, because the current code works as it
should (except possibly on FreeBSD), and I am not 100% sure what
problems changing it could cause, because I am not sure of all the code
paths that can be taken.  For me, a more acceptable solution would be to
make the change only for FreeBSD.

This is particularly true because the Linux man page reads:

        The sa argument is a pointer to a generic socket address structure
        (of type sockaddr_in or sockaddr_in6) of size salen that holds the
        input IP address and port number.

Note that the FreeBSD man page is similar, but it is not quite as
specific about this being a generic interface.

The first two arguments are supposed to be a generic socket address
structure and its length.  The man pages specifically claim that the
function works "in a protocol-independent manner".  The FreeBSD
implementer must have overlooked this point, since here he forces us to
pass the length in a protocol-dependent manner.
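Here is a minimal Python sketch that makes the length question concrete. It uses ctypes structures as stand-ins for the C structs and contrasts two conventions. The first is the exact, protocol-dependent salen that FreeBSD appears to require. The second is the "larger of IPv4 and IPv6" length that Kern describes Bacula passing. The helper names are hypothetical, and so is the FreeBSD-only policy in salen_for_getnameinfo(), which only illustrates the preference stated above. This is not the patch attached to bug 2083, and Bacula's real code is C++, not Python.

```python
import ctypes
import socket
import sys


# Stand-ins for the C structs. Field layout follows Linux; FreeBSD starts with
# one-byte sin_len/sin_family fields instead, but the totals (16 and 28 bytes)
# are the same on both.
class SockaddrIn(ctypes.Structure):
    _fields_ = [
        ("sin_family", ctypes.c_ushort),
        ("sin_port", ctypes.c_ushort),
        ("sin_addr", ctypes.c_ubyte * 4),
        ("sin_zero", ctypes.c_ubyte * 8),
    ]


class SockaddrIn6(ctypes.Structure):
    _fields_ = [
        ("sin6_family", ctypes.c_ushort),
        ("sin6_port", ctypes.c_ushort),
        ("sin6_flowinfo", ctypes.c_uint32),
        ("sin6_addr", ctypes.c_ubyte * 16),
        ("sin6_scope_id", ctypes.c_uint32),
    ]


def exact_salen(family):
    """Protocol-dependent length: the size of this family's own struct."""
    if family == socket.AF_INET:
        return ctypes.sizeof(SockaddrIn)
    if family == socket.AF_INET6:
        return ctypes.sizeof(SockaddrIn6)
    raise ValueError(f"unsupported address family: {family}")


def generic_salen():
    """'Larger of IPv4 and IPv6' length, i.e. big enough for either family."""
    return max(ctypes.sizeof(SockaddrIn), ctypes.sizeof(SockaddrIn6))


def salen_for_getnameinfo(family, platform=sys.platform):
    """Hypothetical policy: exact length on FreeBSD only, unchanged elsewhere."""
    if platform.startswith("freebsd"):
        return exact_salen(family)
    return generic_salen()
```

In C, the same idea would amount to passing sizeof(struct sockaddr_in) instead of the IPv6-sized length when sa->sa_family == AF_INET. Limiting that to FreeBSD leaves the Linux code path as it is.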

That said, it would probably be easier to modify Bacula than to get
FreeBSD modified.

Thus my reluctance to change the code on Linux.

Best regards,
Kern


On 02/13/2017 12:23 PM, Martin Simmons wrote:
> You could try the patch in http://bugs.bacula.org/view.php?id=2083 (I recently
> attached a fix for 7.4.4 to this bug).
>
>
>>>>>> On Sun, 12 Feb 2017 12:09:21 +0100, Kern Sibbald said:
>> Hello Dan,
>>
>> Well, then we are back to my theory that FreeBSD wants a
>> particular length in some of the system calls.
>>
>> In general, Bacula uses lengths that work for both IPv4 and IPv6,
>> which means that if you are using IPv4, the packet lengths in some
>> calls are longer than they really need to be.  This may or may not
>> be true in your case, but it is a path to look at.
>>
>> Best regards,
>>
>> Kern
>>
>> On 02/11/2017 11:08 PM, Dan Langille wrote:
>>
>> On Feb 11, 2017, at 2:11 AM, Kern Sibbald <kern AT sibbald DOT com> wrote:
>>
>> Hello Dan,
>>
>> Well, I am happy to know that it is not Bacula related :-)
>>
>> Well, it's not solved yet.  This just in.
>>
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: fd_cmds.c:142 Read data not accepted
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
>>
>> 9103?  That must be an SD for which someone cannot figure out the
>> hostname based on the IP address.
>>
>> Would it not be easier to list the IP address when the hostname
>> cannot be found? That would give a starting point.
>>
>> Let's go to the SD in question: bacula-sd-01-sd, which is at
>> bacula-sd-01.int.unixathome.org (the Read Storage listed below).
>> From there, I try to resolve the write SD (tape01 at
>> tape01.int.unixathome.org), which is what this message must
>> pertain to.
>>
>> $ dig +short tape01.int.unixathome.org
>> 10.55.0.103
>>
>> $ dig +short -x 10.55.0.103
>> tape01.int.unixathome.org.
>>
>> That looks OK.  Perhaps I misunderstand which hostname it's trying
>> to provide.
>>
>> Is there something else I should verify with respect to DNS in this
>> instance?
>>
>> I looked at the code: the message comes from line 374 of
>> src/lib/bsock.c
>>
>> The 'Hostname not found' comes from line 646 of
>> src/lib/address_conf.c
>>
>> The code invokes getnameinfo(3), and I looked up that info:
>> http://www.unix.com/man-page/FreeBSD/3/getnameinfo/
>>
>> I'm happy to patch with debugging code for this, but I'm out of
>> time this weekend.
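Dan suggests logging the IP address when the hostname cannot be found. That could look roughly like the minimal Python sketch below. It is not Bacula code: describe_peer() is a hypothetical helper, and the resolver parameter exists only so the failure path can be tested without a misbehaving nameserver. The detail that matters is the NI_NAMEREQD flag. It tells getnameinfo() to report an error when no name is found, instead of quietly returning the address in numeric form.

```python
import socket


def describe_peer(ip, port, resolver=socket.getnameinfo):
    """Return "host:port" for a log message, using the IP if there is no PTR record.

    NI_NAMEREQD makes getnameinfo() fail instead of quietly returning the
    numeric address, so a missing reverse mapping can be detected.
    """
    flags = socket.NI_NAMEREQD | socket.NI_NUMERICSERV
    try:
        host, _service = resolver((ip, port), flags)
    except socket.gaierror:
        # Reverse lookup failed: log the address itself rather than a
        # placeholder, so the reader knows which peer to investigate.
        host = ip
    return f"{host}:{port}"
```

With a missing PTR record, a message like the ones above would then name the peer, for example 10.55.0.103:9103, instead of Hostname not found:9103.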
>>
>> The full job output is here:
>>
>> 11-Feb 17:50 bacula-dir JobId 256374: Warning: FileSet MD5 digest not found.
>> 11-Feb 17:50 bacula-dir JobId 256374: Copying using JobId=255999 Job=knew_jail_snapshots.2017-02-05_03.04.02_47
>> 11-Feb 17:50 bacula-dir JobId 256374: Bootstrap records written to /usr/local/bacula/working/bacula-dir.restore.10.bsr
>> 11-Feb 17:50 bacula-dir JobId 256374: Start Copying JobId 256374, Job=CopyToTape-Full-LTO4.2017-02-11_17.50.17_22
>> 11-Feb 17:50 bacula-dir JobId 256374: Using Device "vDrive-6" to read.
>> 11-Feb 17:50 bacula-sd-01-sd JobId 256374: Ready to read from volume "FullAuto-1377" on file device "vDrive-6" (/usr/local/bacula/volumes).
>>
>> lots of reading skipped.....
>>
>> 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Ready to read from volume "FullAuto-1514" on file device "vDrive-6" (/usr/local/bacula/volumes).
>> 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Forward spacing Volume "FullAuto-1514" to file:block 0:214.
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:453 Wrote 65540 bytes to client:Hostname not found:9103, but only 32768 accepted.
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: read.c:284 Error sending to File daemon. ERR=Broken pipe
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: fd_cmds.c:142 Read data not accepted
>> 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
>> 11-Feb 21:11 bacula-dir JobId 256374: Error: Bacula bacula-dir 7.4.4 (28Sep16):
>>   Build OS:               amd64-portbld-freebsd11.0 freebsd 11.0-RELEASE-p6
>>   Prev Backup JobId:      255999
>>   Prev Backup Job:        knew_jail_snapshots.2017-02-05_03.04.02_47
>>   New Backup JobId:       256375
>>   Current JobId:          256374
>>   Current Job:            CopyToTape-Full-LTO4.2017-02-11_17.50.17_22
>>   Backup Level:           Full
>>   Client:                 crey-fd
>>   FileSet:                "EmptyCopyToTape" 2011-02-20 20:53:31
>>   Read Pool:              "FullFile" (From Command input)
>>   Read Storage:           "bacula-sd-01-file" (From Pool resource)
>>   Write Pool:             "FullsLTO4" (From Command input)
>>   Write Storage:          "tape01" (From Command input)
>>   Catalog:                "MyCatalog" (From Client resource)
>>   Start time:             11-Feb-2017 17:50:50
>>   End time:               11-Feb-2017 21:11:14
>>   Elapsed time:           3 hours 20 mins 24 secs
>>   Priority:               10
>>   SD Files Written:       404,305
>>   SD Bytes Written:       356,347,133,351 (356.3 GB)
>>   Rate:                   29636.3 KB/s
>>   Volume name(s):         000013L4
>>   Volume Session Id:      143
>>   Volume Session Time:    1486562541
>>   Last Volume Bytes:      1,187,871,455,232 (1.187 TB)
>>   SD Errors:              3
>>   SD termination status:  Error
>>   Termination:            *** Copying Error ***
>>
>> --
>>
>> Dan Langille - BSDCan / PGCon
>>
>> dan AT langille DOT org
>>
>> Kern
>>
>> On 02/10/2017 09:46 PM, Dan Langille wrote:
>>
>> On Feb 10, 2017, at 12:55 PM, Dan Langille <dan AT langille DOT org> wrote:
>>
>> On Feb 10, 2017, at 12:20 PM, Kern Sibbald <kern AT sibbald DOT com> wrote:
>>
>> Hello,
>>
>> I suspect that this is a problem with the FreeBSD networking
>> implementation.  If I remember right, on FreeBSD, when doing name
>> lookups, if the packet size is not *exactly* what FreeBSD wants, it
>> fails the call.  On Linux and other machines (Solaris, Mac), as long
>> as the packet size is equal to or greater than what is needed, the OS
>> call succeeds.  If I am not mistaken, Bacula allocates space for the
>> larger of IPv4 and IPv6 (which is always IPv6), and so if you are
>> using an IPv4 network, Bacula may send OS calls with a packet size
>> larger than actually required.
>>
>> I just spoke with a FreeBSD developer.  They are unaware of anything
>> special in the FreeBSD ports tree for patching FreeBSD when it comes
>> to doing name lookups.  Specifically, gethostby*(), getipnodeby*()
>> just work...
>>
>> If you can reproduce/encounter a situation which fails, we will look
>> at it and fix it.  In short, I do not think this is an issue with the
>> FreeBSD networking implementation.
>>
>> I suspect it's a local DNS misconfiguration on one of my hosts.
>> Which ones, I don't know yet.  Your first post mentioned the SDs, so
>> I checked them.  They seem OK now.  I will verify them again if the
>> error shows up again.
>>
>> For testing purposes, I will use dig +short and verify that all of
>> these FQDNs resolve on the FD, the SD, the director, and from a
>> fourth host:
>>
>> The FQDN of the FD (even though it is not used in a Copy job)
>>
>> The FQDN of the read SD
>>
>> The FQDN of the write SD
>>
>> The FQDN of the Director
>>
>> I will also check that the PTR record for each A record resolves
>> back to the FQDN.
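Dan's checklist can be scripted along these lines: look up each FQDN, then confirm that the PTR record for the resulting address points back to it. This is a minimal Python sketch, not a tool from the thread, and check_forward_reverse() is a hypothetical name. It goes through the system resolver (including /etc/hosts), so unlike dig @SERVER it only tests whichever nameserver the host actually queries. The forward and reverse parameters exist so the function can be tested with fake lookups.

```python
import socket


def check_forward_reverse(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for one host; an empty list means it looks OK.

    Equivalent in spirit to `dig +short NAME` followed by `dig +short -x ADDR`.
    """
    try:
        addr = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"{fqdn}: forward lookup failed ({exc})"]
    try:
        ptr_name, _aliases, _addrs = reverse(addr)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return [f"{fqdn}: reverse lookup of {addr} failed ({exc})"]
    # dig prints names with a trailing dot; compare without it, ignoring case.
    if ptr_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
        return [f"{fqdn}: {addr} maps back to {ptr_name}, not to the FQDN"]
    return []
```

Run on each of the four hosts, calling it for every FQDN in the list above (for example tape01.int.unixathome.org and bacula-sd-01.int.unixathome.org), it prints nothing useful only when every name round-trips cleanly.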
>>
>> If this is the case, I would consider it a FreeBSD bug.  For me to
>> fix it is a bit complicated, because I need to know exactly what call
>> is failing and the values that FreeBSD wants.  By the way, it is
>> possible this is already fixed in the Enterprise version, where
>> FreeBSD is supported too.  If that is the case, it will get fixed in
>> my next round of backporting, starting next week.
>>
>> If this was the case, I'd expect apps on FreeBSD to be failing
>> everywhere.  They aren't.  I've never patched anything for DNS issues
>> either.
>>
>> I suspect it's more likely to be an issue on one of the Bacula nodes
>> in question (SD or FD) where there is a local DNS issue.
>>
>> For documentation: I found that one of my three name servers was not
>> correctly resolving PTR records.  That is:
>>
>> $ dig +short -x 10.52.0.1 @10.55.0.1
>>
>> Whereas the other two nameservers:
>>
>> $ dig +short -x 10.52.0.1 @10.55.0.13
>> bast.int.unixathome.org.
>>
>> $ dig +short -x 10.52.0.1 @10.55.0.73
>> bast.int.unixathome.org.
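The failure only showed up on one of three nameservers, so it helps to compare answers per server. Python's standard-library resolver cannot send a query to a particular nameserver. This minimal sketch therefore takes the answers as input, for example collected with dig +short -x ADDR @SERVER as above, and reports the servers that disagree with the majority. The function names are hypothetical, and the sample data is the dig output quoted in this message.

```python
import ipaddress
from collections import Counter


def ptr_query_name(ip):
    """Name that `dig -x` looks up, e.g. 10.52.0.1 -> 1.0.52.10.in-addr.arpa."""
    return ipaddress.ip_address(ip).reverse_pointer


def inconsistent_servers(answers):
    """Given {nameserver: PTR answer, '' if none}, list servers disagreeing with the majority.

    With a tie, Counter.most_common() picks whichever answer was counted first,
    so the result is only meaningful when most servers agree.
    """
    majority, _count = Counter(answers.values()).most_common(1)[0]
    return sorted(server for server, answer in answers.items() if answer != majority)


answers = {
    "10.55.0.1": "",  # empty answer, as in the first dig above
    "10.55.0.13": "bast.int.unixathome.org.",
    "10.55.0.73": "bast.int.unixathome.org.",
}
print(ptr_query_name("10.52.0.1"), inconsistent_servers(answers))
# prints: 1.0.52.10.in-addr.arpa ['10.55.0.1']
```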
>>
>> That issue has since been fixed.  It took a while to track down.
>> That nameserver is running pfSense, configured through its GUI.  The
>> issue is right there on the page:
>>
>> Note: IN-ADDR.ARPA will be automaticaly included in config files when
>> reverse zone option is checked.
>>
>> The checkbox is four controls farther down the page.
>>
>> The expected zone 0.55.10.in-addr.arpa had actually been created as
>> 0.55.10.in-addr.arpa.in-addr.arpa.
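The doubled zone name can be reproduced with a toy model of the GUI note quoted above, in which the suffix is appended whenever the reverse zone option is checked. This minimal Python sketch is an assumption about the behaviour, inferred only from that note; it is not pfSense code. reverse_zone(), zone_as_configured() and has_doubled_suffix() are hypothetical helpers.

```python
import ipaddress

SUFFIX = ".in-addr.arpa"


def reverse_zone(network):
    """Reverse zone for an octet-aligned IPv4 network, e.g. 10.55.0.0/24 -> 0.55.10.in-addr.arpa."""
    net = ipaddress.IPv4Network(network)
    if net.prefixlen not in (8, 16, 24):
        raise ValueError("only octet-aligned prefixes (/8, /16, /24) are handled here")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + SUFFIX


def zone_as_configured(entered, reverse_checked):
    """Assumed GUI behaviour: the suffix is appended when the reverse option is checked."""
    return entered + SUFFIX if reverse_checked else entered


def has_doubled_suffix(zone):
    """Detect the misconfiguration described above."""
    return zone.lower().endswith(SUFFIX * 2)
```

Under this model, entering only 0.55.10 with the box checked gives the expected zone. Entering the full 0.55.10.in-addr.arpa gives the doubled name Dan found.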
>>
>> I'll keep an eye on the Bacula logs to see if this shows up again.
>>
>> Thank you.
>>
>> --
>>
>> Dan Langille - BSDCan / PGCon
>>
>> dan AT langille DOT org
>>
>>


_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
