Bacula-users

Re: [Bacula-users] Hostname not found

2017-02-13 06:46:18
Subject: Re: [Bacula-users] Hostname not found
From: Martin Simmons <martin AT lispworks DOT com>
To: Kern Sibbald <kern AT sibbald DOT com>
Date: Mon, 13 Feb 2017 11:23:24 GMT
You could try the patch in http://bugs.bacula.org/view.php?id=2083 (I recently
attached a fix for 7.4.4 to this bug).


>>>>> On Sun, 12 Feb 2017 12:09:21 +0100, Kern Sibbald said:
> 
> Hello Dan,
> 
>       Well, then we are back to my theory that FreeBSD wants a
>       particular length in some of the system calls.
> 
>       In general, Bacula uses lengths that work for both IPv4 and IPv6,
>       which means that if you are using IPv4, the packet lengths in some
>       calls are longer than the really need to be.  This may or may not
>       be true in your case, but is a path to look at.
> 
>       Best regards,
> 
>       Kern
> 
>       On 02/11/2017 11:08 PM, Dan Langille wrote:
> 
> On Feb 11, 2017, at 2:11 AM, Kern Sibbald <kern AT sibbald DOT com> wrote:
> 
> Hello Dan,
> 
>                 Well, I am happy to know that it is not Bacula related
>                 :-)
> 
> Well, it's not solved yet.  This just in.
> 
> 11-Feb 21:11 bacula-sd-01-sd JobId 256374:
>             Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error:
>             bsock.c:375 Socket has errors=1 on call to client:Hostname
>             not found:9103
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error:
>             fd_cmds.c:142 Read data not accepted
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error:
>             bsock.c:375 Socket has errors=1 on call to client:Hostname
>             not found:9103
> 
> 9103?  That must be an SD for which someone
>             cannot figure out the hostname based on the IP address.
> 
> Would it not be easier to list the IP address
>             when the hostname cannot be found? That would give a
>             starting point.
> 
> Let's go to the SD in question: bacula-sd-01-sd
>             which is at bacula-sd-01.int.unixathome.org (the
>             Read Storage listed below). From there,
> 
> I try to resolve the write SD (tape01 at tape01.int.unixathome.org),
>             which is what this message must pertain to.
> 
> $ dig +short tape01.int.unixathome.org
>             10.55.0.103
> 
> $ dig +short -x 10.55.0.103
>             tape01.int.unixathome.org.
> 
> That looks OK.  Perhaps I misunderstand which
>             hostname it's trying to provide.
> 
> Is there something else I should verify with
>               respect to DNS in this instance?
> 
> I looked at the code: The message comes from
>             line 374 of src/lib/bsock.c
> 
> The 'Hostname not found' comes from line 646
>             of src/lib/address_conf.c
> 
> The code invokes getnameinfo(3) and I looked up
>             that info .http://www.unix.com/man-page/FreeBSD/3/getnameinfo/
> 
> I'm happy to patch with debugging code for this,
>             but I'm out of time this weekend.
> 
> The full job output is here:
> 
> 11-Feb 17:50 bacula-dir JobId 256374: Warning:
>             FileSet MD5 digest not found.
>             11-Feb 17:50 bacula-dir JobId 256374: Copying using
>             JobId=255999 Job=knew_jail_snapshots.2017-02-05_03.04.02_47
>             11-Feb 17:50 bacula-dir JobId 256374: Bootstrap records
>             written to
>             /usr/local/bacula/working/bacula-dir.restore.10.bsr
>             11-Feb 17:50 bacula-dir JobId 256374: Start Copying JobId
>             256374, Job=CopyToTape-Full-LTO4.2017-02-11_17.50.17_22
>             11-Feb 17:50 bacula-dir JobId 256374: Using Device
>             "vDrive-6" to read.
>             11-Feb 17:50 bacula-sd-01-sd JobId 256374: Ready to read
>             from volume "FullAuto-1377" on file device "vDrive-6"
>             (/usr/local/bacula/volumes).
> 
>             lots of reading skipped.....
> 
>             11-Feb 21:00 bacula-sd-01-sd JobId 256374: Ready to read
>             from volume "FullAuto-1514" on file device "vDrive-6"
>             (/usr/local/bacula/volumes).
>             11-Feb 21:00 bacula-sd-01-sd JobId 256374: Forward spacing
>             Volume "FullAuto-1514" to file:block 0:214.
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error:
>             bsock.c:453 Wrote 65540 bytes to client:Hostname not
>             found:9103, but only 32768 accepted.
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error:
>             read.c:284 Error sending to File daemon. ERR=Broken pipe
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed
>             time=03:20:24, Transfer rate=29.63 M Bytes/second
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error:
>             bsock.c:375 Socket has errors=1 on call to client:Hostname
>             not found:9103
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error:
>             fd_cmds.c:142 Read data not accepted
>             11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error:
>             bsock.c:375 Socket has errors=1 on call to client:Hostname
>             not found:9103
>             11-Feb 21:11 bacula-dir JobId 256374: Error: Bacula
>             bacula-dir 7.4.4 (28Sep16):
>              Build OS:               amd64-portbld-freebsd11.0 freebsd
>             11.0-RELEASE-p6
>              Prev Backup JobId:      255999
>              Prev Backup Job:
>                    knew_jail_snapshots.2017-02-05_03.04.02_47
>              New Backup JobId:       256375
>              Current JobId:          256374
>              Current Job:
>                        CopyToTape-Full-LTO4.2017-02-11_17.50.17_22
>              Backup Level:           Full
>              Client:                 crey-fd
>              FileSet:                "EmptyCopyToTape" 2011-02-20
>             20:53:31
>              Read Pool:              "FullFile" (From Command input)
>              Read Storage:           "bacula-sd-01-file" (From Pool
>             resource)
>              Write Pool:             "FullsLTO4" (From Command input)
>              Write Storage:          "tape01" (From Command input)
>              Catalog:                "MyCatalog" (From Client resource)
>              Start time:             11-Feb-2017 17:50:50
>              End time:               11-Feb-2017 21:11:14
>              Elapsed time:           3 hours 20 mins 24 secs
>              Priority:               10
>              SD Files Written:       404,305
>              SD Bytes Written:       356,347,133,351 (356.3 GB)
>              Rate:                   29636.3 KB/s
>              Volume name(s):         000013L4
>              Volume Session Id:      143
>              Volume Session Time:    1486562541
>              Last Volume Bytes:      1,187,871,455,232 (1.187 TB)
>              SD Errors:              3
>              SD termination status:  Error
>              Termination:            *** Copying Error ***
> 
> --
> 
> Dan Langille - BSDCan / PGCon
> 
> dan AT langille DOT org
> 
>                 Kern
> 
>                 On 02/10/2017 09:46 PM, Dan Langille wrote:
> 
> On Feb 10, 2017, at 12:55 PM, Dan
>                       Langille <dan AT langille DOT org>
>                       wrote:
> 
> On Feb 10, 2017, at 12:20 PM,
>                               Kern Sibbald <kern AT sibbald DOT com>
>                               wrote:
> 
> Hello,
> 
>                                   I suspect that this is a problem with
>                                   the FreeBSD networking
>                                   implementation.  If I remember right
>                                   on FreeBSD, when doing name lookups,
>                                   if the packet size is not *exactly*
>                                   what FreeBSD wants, it fails the
>                                   call.  On Linux and other machines
>                                   (Solaris, Mac), as long as the packet
>                                   size is equal or greater than what is
>                                   needed the OS call succeeds.  If I am
>                                   not mistaken, Bacula allocates space
>                                   for the larger of IPv4 and IPv6 (which
>                                   is always IPv6), and so if you are
>                                   using an IPv4 network, Bacula may send
>                                   OS calls with a packet size larger
>                                   than actually required.
> 
> I just spoke with a FreeBSD
>                             developer.  They are unaware of anything
>                             special in the FreeBSD ports tree for
>                             patching FreeBSD when it comes to doing name
>                             lookups.  Specifically, gethostby*(),
>                             getipnodeby*() just work...
> 
> If you can reproduce/encounter a
>                             situation which fails, we will look at it
>                             and fix it.  In short, I do not think this
>                             is an issue with the FreeBSD networking
>                             implementation.
> 
> I suspect it's a local DNS
>                             misconfiguration on one of my hosts.  Which
>                             ones, I don't know yet.  Your first post
>                             mentioned the SDs, so I checked them. They
>                             seem OK now.  I will verify them again if I
>                             see them again.
> 
> For testing purposes, I will use
>                             dig +short and verify that all of these FQDN
>                             resolve on the FD, the SD, the director, and
>                             from from a fourth host:
> 
> The FQDN of the FD (even though
>                             it is not used in a Copy job)
> 
> The FQDN of the read SD
> 
> The FQDN of the write SD
> 
> The FQDN of the Director
> 
> I will also check that the PTR
>                             record for the A record also resolves back
>                             to the FQDN.
> 
> If this is
>                                   the case, I would consider it a
>                                   FreeBSD bug.  For me to fix it is a
>                                   bit complicated, because I need to
>                                   know exactly what call is failing and
>                                   the values that FreeBSD wants.  By the
>                                   way, it is possible this is already
>                                   fixed in the Enterprise version where
>                                   FreeBSD is supported too.  If that is
>                                   the case, in my next round of
>                                   backporting to start next week, it
>                                   will get fixed.
> 
>                           If this was the case, I'd expect apps on
>                           FreeBSD to be failing everywhere.  They
>                           aren't.  I've never patched anything for DNS
>                           issues either.
> 
> I suspect it's more likely to be
>                           an issue on one of the Bacula nodes in
>                           question (SD or FD) where there is a local DNS
>                           issue.
> 
> For documentation: I found that one of my
>                   three name servers was not correctly resolving PTR
>                   records.  That is:
> 
>  $ dig +short -x 10.52.0.1 @10.55.0.1
> 
> Whereas the other two nameservers:
> 
> $ dig +short -x 10.52.0.1 @10.55.0.13
>                     bast.int.unixathome.org.
> 
> $ dig +short -x
>                           10.52.0.1 @10.55.0.73
>                           bast.int.unixathome.org.
> 
> That issue since
>                           been fixed.  It took a while to track down.
>                            That nameserver is running pfSense, using
>                           their GUI.  The issue is right there on the
>                           page:
> 
> Note:
>                           IN-ADDR.ARPA will be automaticaly included in
>                           config files when reverse zone option is
>                           checked.
> 
> The checkbox is
>                           four controls farther down the page.
> 
> The expected zone
>                           0.55.10.in-addr.arpa was actually zone
>                           0.55.10.in-addr.arpa.in-addr.arpa
> 
> I'll keep an eye
>                           on the Bacula logs to see if this shows up
>                           again.
> 
> Thank you.
> 
> --
> 
> Dan Langille - BSDCan / PGCon
> 
> dan AT langille DOT org
> 
> 

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users