Re: [Bacula-users] Hostname not found
2017-02-11 17:09:55
Hello Dan,
Well, I am happy to know that it is not Bacula related :-)
Well, it's not solved yet. This just in.
11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: fd_cmds.c:142 Read data not accepted 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103
9103? That must be an SD for which someone cannot figure out the hostname based on the IP address.
Would it not be easier to list the IP address when the hostname cannot be found? That would give a starting point.
That looks OK. Perhaps I misunderstand which hostname it's trying to provide.
Is there something else I should verify with respect to DNS in this instance?
I looked at the code: The message comes from line 374 of src/lib/bsock.c
The 'Hostname not found' comes from line 646 of src/lib/address_conf.c
I'm happy to patch with debugging code for this, but I'm out of time this weekend.
The full job output is here:
11-Feb 17:50 bacula-dir JobId 256374: Warning: FileSet MD5 digest not found. 11-Feb 17:50 bacula-dir JobId 256374: Copying using JobId=255999 Job=knew_jail_snapshots.2017-02-05_03.04.02_47 11-Feb 17:50 bacula-dir JobId 256374: Bootstrap records written to /usr/local/bacula/working/bacula-dir.restore.10.bsr 11-Feb 17:50 bacula-dir JobId 256374: Start Copying JobId 256374, Job=CopyToTape-Full-LTO4.2017-02-11_17.50.17_22 11-Feb 17:50 bacula-dir JobId 256374: Using Device "vDrive-6" to read. 11-Feb 17:50 bacula-sd-01-sd JobId 256374: Ready to read from volume "FullAuto-1377" on file device "vDrive-6" (/usr/local/bacula/volumes).
lots of reading skipped.....
11-Feb 21:00 bacula-sd-01-sd JobId 256374: Ready to read from volume "FullAuto-1514" on file device "vDrive-6" (/usr/local/bacula/volumes). 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Forward spacing Volume "FullAuto-1514" to file:block 0:214. 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:453 Wrote 65540 bytes to client:Hostname not found:9103, but only 32768 accepted. 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: read.c:284 Error sending to File daemon. ERR=Broken pipe 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: fd_cmds.c:142 Read data not accepted 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: bsock.c:375 Socket has errors=1 on call to client:Hostname not found:9103 11-Feb 21:11 bacula-dir JobId 256374: Error: Bacula bacula-dir 7.4.4 (28Sep16): Build OS: amd64-portbld-freebsd11.0 freebsd 11.0-RELEASE-p6 Prev Backup JobId: 255999 Prev Backup Job: knew_jail_snapshots.2017-02-05_03.04.02_47 New Backup JobId: 256375 Current JobId: 256374 Current Job: CopyToTape-Full-LTO4.2017-02-11_17.50.17_22 Backup Level: Full Client: crey-fd FileSet: "EmptyCopyToTape" 2011-02-20 20:53:31 Read Pool: "FullFile" (From Command input) Read Storage: "bacula-sd-01-file" (From Pool resource) Write Pool: "FullsLTO4" (From Command input) Write Storage: "tape01" (From Command input) Catalog: "MyCatalog" (From Client resource) Start time: 11-Feb-2017 17:50:50 End time: 11-Feb-2017 21:11:14 Elapsed time: 3 hours 20 mins 24 secs Priority: 10 SD Files Written: 404,305 SD Bytes Written: 356,347,133,351 (356.3 GB) Rate: 29636.3 KB/s Volume name(s): 000013L4 Volume Session Id: 143 Volume Session Time: 1486562541 Last Volume Bytes: 1,187,871,455,232 (1.187 TB) SD Errors: 3 SD termination status: Error Termination: *** Copying Error ***
-- Dan Langille - BSDCan / PGCon
Kern
On 02/10/2017 09:46 PM, Dan Langille wrote:
Hello,
I suspect that this is a problem with the
FreeBSD networking implementation. If I
remember right on FreeBSD, when doing name
lookups, if the packet size is not *exactly*
what FreeBSD wants, it fails the call. On Linux
and other machines (Solaris, Mac), as long as
the packet size is equal or greater than what is
needed the OS call succeeds. If I am not
mistaken, Bacula allocates space for the larger
of IPv4 and IPv6 (which is always IPv6), and so
if you are using an IPv4 network, Bacula may
send OS calls with a packet size larger than
actually required.
I just spoke with a FreeBSD developer.
They are unaware of anything special in the FreeBSD
ports tree for patching FreeBSD when it comes to doing
name lookups. Specifically, gethostby*(),
getipnodeby*() just work...
If you can reproduce/encounter a situation
which fails, we will look at it and fix it. In short,
I do not think this is an issue with the FreeBSD
networking implementation.
I suspect it's a local DNS
misconfiguration on one of my hosts. Which ones, I
don't know yet. Your first post mentioned the SDs, so
I checked them. They seem OK now. I will verify them
again if I see them again.
For testing purposes, I will use dig
+short and verify that all of these FQDN resolve on
the FD, the SD, the director, and from from a fourth
host:
The FQDN of the FD (even though it is not
used in a Copy job)
The FQDN of the read SD
The FQDN of the write SD
The FQDN of the Director
I will also check that the PTR record for
the A record also resolves back to the FQDN.
If this is the case,
I would consider it a FreeBSD bug. For me to
fix it is a bit complicated, because I need to
know exactly what call is failing and the values
that FreeBSD wants. By the way, it is possible
this is already fixed in the Enterprise version
where FreeBSD is supported too. If that is the
case, in my next round of backporting to start
next week, it will get fixed.
If this was the case, I'd expect apps on FreeBSD to be
failing everywhere. They aren't. I've never patched
anything for DNS issues either.
I suspect it's more likely to be an issue on
one of the Bacula nodes in question (SD or FD) where
there is a local DNS issue.
For documentation: I found that one of my three name servers
was not correctly resolving PTR records. That is:
$ dig +short -x 10.52.0.1 @10.55.0.1
Whereas the other two nameservers:
$ dig +short -x 10.52.0.1 @10.55.0.13
bast.int.unixathome.org.
That
issue since been fixed. It took a while to track down.
That nameserver is running pfSense, using their GUI.
The issue is right there on the page:
Note:
IN-ADDR.ARPA will be automaticaly included in config
files when reverse zone option is checked.
The
checkbox is four controls farther down the page.
The
expected zone 0.55.10.in-addr.arpa was actually zone
0.55.10.in-addr.arpa.in-addr.arpa
I'll
keep an eye on the Bacula logs to see if this shows up
again.
Thank
you.
--
Dan Langille - BSDCan / PGCon
|
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
|
|
|