Amanda-Users

Problem with a multi-homed client: amcheck: timeout waiting, for ACK: Receive packet from unknown source

2008-05-13 03:34:44
Subject: Problem with a multi-homed client: amcheck: timeout waiting, for ACK: Receive packet from unknown source
From: "Christopher Brooks" <cxh AT eecs.berkeley DOT edu>
To: amanda-users AT amanda DOT org
Date: Mon, 12 May 2008 23:27:37 -0700
I have a Solaris 10 client that has multiple virtual ethernet
interfaces.  In other words, it has one physical wire, but there
are multiple virtual interfaces assigned to that wire.

The problem is that amcheck to this multi-homed client fails with:

WARNING: AA.BB.CC.edu: selfcheck request failed: timeout waiting for ACK

Interestingly, this worked in Amanda-2.4.5, but now fails in
Amanda-2.6.0.

BTW - The primary reason I'm upgrading is that under Amanda-2.4.5,
killpgrp is periodically killing extra processes and hosing one of my
machines.  Unfortunately, when the machine goes down, it erases
/tmp/amanda, so I can't get at the Amanda logs.  With 2.6.0, I'm
putting the Amanda files in a separate directory.  I have pretty good
evidence of this bug because with process accounting, I can see that
when killpgrp runs, it kills a heartbeat script I hacked in which runs
every 10 seconds.

Anyway, back to the multi-homed problem.

I have a tapeserver machine and a separate machine with
four IP addresses.  XX.YY.ZZ.30 is the primary address;
XX.YY.ZZ.31 is a virtual address.

The output of ifconfig -a looks like:
--start--
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet XX.YY.ZZ.30 netmask ffffff00 broadcast XX.YY.ZZ.255
        ether 0:3:ba:c6:e7:a9
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet XX.YY.ZZ.31 netmask ffffff00 broadcast XX.YY.ZZ.255
bge0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet XX.YY.ZZ.32 netmask ffffff00 broadcast XX.YY.ZZ.255
bge0:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet XX.YY.ZZ.34 netmask ffffff00 broadcast XX.YY.ZZ.255
--end--


What happens is that amcheck traffic from the tapeserver goes to
the .30 address, but the traffic returns from the .31 address.
This causes amcheck to fail.


I can see this in the debug log file on the tape server:

--start--
1210631008.444594: amcheck: pid 13583 ruid 201 euid 201: start at Mon May 12 15:23:28 2008
1210631008.451774: amcheck: pid 13583 ruid 201 euid 201: rename at Mon May 12 15:23:28 2008
1210631008.462034: amcheck-clients: security_getdriver(name=BSD) returns ff2db3cc
1210631008.462315: amcheck-clients: security_handleinit(handle=37d48, driver=ff2db3cc (BSD))
1210631008.471146: amcheck-clients: bind_portrange2: Try  port 813: Available - Success
1210631008.471258: amcheck-clients: dgram_bind: socket 3 bound to 0.0.0.0.813
1210631008.471808: amcheck-clients: dgram_send_addr(addr=37d68, dgram=ff2e0cd4)
1210631008.471833: amcheck-clients: (sockaddr_in *)37d68 = { 2, 10080, XX.YY.ZZ.30 }



The last lines above send out to XX.YY.ZZ.30 


1210631008.471849: amcheck-clients: dgram_send_addr: ff2e0cd4->socket = 3
1210631008.534349: amcheck-clients: dgram_recv(dgram=ff2e0cd4, timeout=0, fromaddr=ff2f0cc0)
1210631008.534414: amcheck-clients: (sockaddr_in *)ff2f0cc0 = { 2, 10080, 128.32.244.31 }
1210631008.534520: amcheck-clients: Receive packet from unknown source
1210631009.560074: amcheck-clients: dgram_recv(dgram=ff2e0cd4, timeout=0, fromaddr=ff2f0cc0)

--end--

However, as the log shows, the tapehost is receiving from the .31
address.

Running snoop on the tapehost, I can see it sends packets to
the .30 address:

tapehost -> XX.YY.ZZ.30 UDP D=10080 S=813 LEN=133
tapehost -> XX.YY.ZZ.30 UDP D=10080 S=813 LEN=133
tapehost -> XX.YY.ZZ.30 UDP D=10080 S=813 LEN=133

However, running snoop on the multi-homed machine, I can see that
it receives on .30 and sends on .31:

tapehost -> XX.YY.ZZ.30 UDP D=10080 S=813 LEN=133
XX.YY.ZZ.31 -> tapehost UDP D=813 S=10080 LEN=99
XX.YY.ZZ.31 -> tapehost UDP D=813 S=10080 LEN=99
tapehost -> XX.YY.ZZ.30 UDP D=10080 S=813 LEN=133
XX.YY.ZZ.31 -> tapehost UDP D=813 S=10080 LEN=99
XX.YY.ZZ.31 -> tapehost UDP D=813 S=10080 LEN=99
XX.YY.ZZ.31 -> tapehost UDP D=813 S=10080 LEN=99
XX.YY.ZZ.31 -> tapehost UDP D=813 S=10080 LEN=99

The "Receive packet from unknown source" message comes from
common-src/security-util.c udp_netfd_read_callback(): 

    /*
     * If we didn't find a handle, then check for a new incoming packet.
     * If no accept handler was setup, then just return.
     */
    if (udp->accept_fn == NULL) {
        dbprintf(_("Receive packet from unknown source"));
        return;
    }
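
To make the failure mode concrete, here is a minimal C sketch of the
kind of peer comparison that would reject the reply.  This is not
Amanda's code: make_addr and reply_matches_request are hypothetical
helpers, and the sketch assumes the tape server matches an incoming
datagram to an outstanding request by the peer's address and port,
which is what the log above suggests.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: build an IPv4 socket address from dotted-quad
 * text and a port.  Returns false if the text is not valid IPv4. */
bool make_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1;
}

/* Hypothetical check (an assumption, not Amanda's implementation):
 * accept a reply only if it comes from the same address and port
 * that the request was sent to. */
bool reply_matches_request(const struct sockaddr_in *sent_to,
                           const struct sockaddr_in *reply_from)
{
    assert(sent_to != NULL && reply_from != NULL);
    return sent_to->sin_family == reply_from->sin_family
        && sent_to->sin_port == reply_from->sin_port
        && sent_to->sin_addr.s_addr == reply_from->sin_addr.s_addr;
}
```

With 192.0.2.30 and 192.0.2.31 standing in for XX.YY.ZZ.30 and
XX.YY.ZZ.31, a request sent to 192.0.2.30 port 10080 matches a reply
from 192.0.2.30 port 10080 but not one from 192.0.2.31 port 10080,
which is the situation in the snoop output above.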

I think the problem is that the kernel is sending the ack on whatever
interface it wants to, which is probably permitted.  An explanation
can be found at
"Socket Binding on a Multihomed Host"
http://blogs.msdn.com/zhengpei/archive/2007/04/25/socket-binding-on-a-multihomed-host.aspx
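
For illustration, here is a sketch of the general sockets technique
involved.  A UDP socket bound to the wildcard address leaves the choice
of source address for outgoing datagrams to the kernel, whereas a
socket bound to one specific local address sends from that address.
This is plain POSIX sockets code, not Amanda code; the function names
are hypothetical, and whether the Amanda client can be made to bind
this way has not been verified here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to one specific local
 * IPv4 address.  Datagrams sent from it carry that address as their
 * source.  Port 0 lets the kernel pick a free port.
 * Returns the descriptor, or -1 on failure. */
int udp_socket_bound_to(const char *local_ip, unsigned short port)
{
    struct sockaddr_in local;
    int fd;

    assert(local_ip != NULL);
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write the socket's bound local address into buf as text.
 * Returns 0 on success, -1 on failure. */
int udp_local_ip(int fd, char *buf, socklen_t buflen)
{
    struct sockaddr_in local;
    socklen_t len = sizeof(local);

    assert(buf != NULL);
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    return inet_ntop(AF_INET, &local.sin_addr, buf, buflen) ? 0 : -1;
}
```

Calling udp_socket_bound_to("127.0.0.1", 0) and then udp_local_ip on
the result reports 127.0.0.1; on the multi-homed client the equivalent
would be binding to the primary address rather than to 0.0.0.0.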

See also:

* Re: Binding amanda to specific interface (2003)
  (http://www.mail-archive.com/amanda-users AT amanda DOT org/msg18070.html)
* Fwd: amrcover and nslookups that resolve hostname to more than
  one IP_ADDR (mult-homed servers) (2006)
  (http://www.adsm.org/lists/html/Amanda-Users/2006-03/msg00354.html)

Chapter 18. Using Amanda (http://www.amanda.org/docs/using.html)
says:

     If the tape server has multiple network connections, an amanda.conf
     interface section may be set up for each one and clients allocated
     to a particular interface with field five of the disklist.
     Individual interfaces take precedence over the general netusage
     bandwidth limit and follow the same guidelines described above in
     "Configuring Amanda": the limit is imposed when deciding whether to
     start a dump, but once a dump starts, Amanda lets underlying
     network components do any throttling. Individual Amanda interface
     definitions do not control which physical connection is used. That
     is left up to the operating system network software. While it's
     common to give an Amanda interface definition the same name as a
     physical connection, e.g. le0, it might be better to use logical
     names such as back-door-atm to avoid confusion.

However, the problem here is that I have a _client_ that is
multihomed, not a tape server.

Status Summary for June 2007 
(http://tech.groups.yahoo.com/group/amanda-users/message/63330)
says

     * Ladislav Michnovic alerted amanda-hackers to a regression in SUSE
     Linux, reported by Matthias Andree, relating to multihomed
     machines. The issue has stalled without any feedback from Matthias.

which refers to
"some regressions in 2.5.2p1,"
(http://groups.yahoo.com/group/amanda-hackers/message/5462)
which mentions
"Bug 233098 - kernel or amanda regression: wrong source IP addr (was OK in SL 10.0)"
(https://bugzilla.novell.com/show_bug.cgi?id=233098)
which mentions
"ACK timeout on multi-homed clients Re: ACK timeout issue I wrote about"
(http://www.mail-archive.com/amanda-users AT amanda DOT org/msg36873.html)


My workaround is to change the disklist so that I refer to the
multi-homed machine by the name with which it responds.  This is a
hack, but it gets me by.

One clue is that netstat -rn shows traffic on the .31 interface,
but not really on the .30 interface.  I'm not sure about this,
though; I just noticed it:

--start--
Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              XX.YY.ZZ.1           UG        1      10552 bge0
XX.YY.ZZ.0           XX.YY.ZZ.31          U         1       6793 bge0:1
XX.YY.ZZ.0           XX.YY.ZZ.32          U         1          0 bge0:2
XX.YY.ZZ.0           XX.YY.ZZ.34          U         1          0 bge0:3
XX.YY.ZZ.0           XX.YY.ZZ.30          U         1          0 bge0
224.0.0.0            XX.YY.ZZ.30          U         1          0 bge0
127.0.0.1            127.0.0.1            UH      114      52225 lo0
--end--

I realize this is a somewhat degenerate case, but does anyone have
any ideas?

_Christopher

Christopher Brooks (cxh at eecs berkeley edu) University of California
Chess Executive Director                      US Mail: 337 Cory Hall #1774
Programmer/Analyst Chess/Ptolemy/Trust        Berkeley, CA 94720-1774
ph: 510.643.9841 fax:510.642.2718             (office: 400A Cory)
home: (F-Tu) 707.665.0131 (W-F) 510.655.5480  
