Amanda-Users

Amanada Data Timeout Problem with CygWin

2003-04-04 10:22:56
Subject: Amanada Data Timeout Problem with CygWin
From: Brian White <bcwhite AT precidia DOT com>
To: Amanda Users <amanda-users AT amanda DOT org>
Date: Fri, 04 Apr 2003 08:58:49 -0500
I might be making progress; I'm not sure.

For those who haven't been following...  The amanda client under cygwin will
reset the data connection without sending the final "goodbye" message and
thus the server ignores all the backup data.

To fix this, I removed the "aclose" of the various sockets from the
"start_backup" functions and added them to after the "parse_backup_messages"
in the main "sendbackup.c" file.  The theory is that by not doing a close
from within the main program, then there would be no RST packet going
out.  This has worked!  However...

When the client runs, the server gets the following:

/-- watertown. /home/watertown lev 1 FAILED [data timeout]
sendbackup: start [watertown.ott.precidia.com:/home/watertown level 1]
sendbackup: info BACKUP=/usr/bin/tar
sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
| Total bytes written: 10240 (10kB, ?B/s)
\--------


The client on watertown things it ran just fine:

sendbackup: debug 1 pid 312 ruid 0 euid 0: start at Thu Apr  3 20:24:58 2003
/usr/local/lib/amanda/sendbackup: version 2.4.3
  parsed request as: program `GNUTAR'
                     disk `/home/watertown'
                     device `/home/watertown'
                     level 1
                     since 2003:4:2:1:20:34
                     options 
`|;bsd-auth;compress-fast;index;exclude-file=^(#|Backup.of)|.(bak|tmp|fla|fst|hex|lib|lst|m
51|wav|mp3|mpg|mpeg|avi|mov|obj|omf|prn|swp)$|~$;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.014: stream_server: waiting for connection: 0.0.0.0.546
sendbackup: time 0.016: stream_server: waiting for connection: 0.0.0.0.547
sendbackup: time 0.016: stream_server: waiting for connection: 0.0.0.0.548
sendbackup: time 0.018: waiting for connect on 546, then 547, then 548
sendbackup: time 0.030: stream_accept: connection from 10.0.1.2.4285
sendbackup: time 0.032: stream_accept: connection from 10.0.1.2.4286
sendbackup: time 0.034: stream_accept: connection from 10.0.1.2.4287
sendbackup: time 0.035: got all connections
sendbackup: time 0.036: spawning /usr/bin/gzip in pipeline
sendbackup: argument list: /usr/bin/gzip --fast
sendbackup-gnutar: time 0.085: pid 3644: /usr/bin/gzip --fast
sendbackup-gnutar: time 0.090: doing level 1 dump as listed-incremental from 
/var/lib/amanda/gnutar-lists/watertown.ott.
precidia.com_home_watertown_0 to 
/var/lib/amanda/gnutar-lists/watertown.ott.precidia.com_home_watertown_1.new
sendbackup-gnutar: time 0.104: doing level 1 dump from date: 2003-04-01  
1:25:14 GMT
sendbackup: time 0.191: spawning /usr/local/lib/amanda/runtar in pipeline
sendbackup: argument list: gtar --create --file - --directory /home/watertown 
--listed-incremental /var/lib/amanda/gnuta
r-lists/watertown.ott.precidia.com_home_watertown_1.new --sparse 
--ignore-failed-read --totals --exclude-from /tmp/amand
a/sendbackup._home_watertown.20030403202458.exclude .
sendbackup: time 0.271: started index creator: "/usr/bin/tar -tf - 2>/dev/null 
| sed -e 's/^\.//'"
sendbackup-gnutar: time 0.284: /usr/local/lib/amanda/runtar: pid 700
sendbackup: time 0.673:  53:    size(|): Total bytes written: 10240 (10kB, ?B/s)
sendbackup: time 0.746: index created successfully

Well, somewhat fine, anyway.  I was expecting more log messages at the end.


I captured the entire thing with tcpdump (server=10.0.1.2; watertown=10.0.3.10):

20:07:04.028278 10.0.1.2.861 > 10.0.3.10.10080:  udp 450 (DF)
20:07:04.272743 10.0.3.10.10080 > 10.0.1.2.861:  udp 50
20:07:05.387130 10.0.3.10.10080 > 10.0.1.2.861:  udp 111
20:07:05.401699 10.0.1.2.861 > 10.0.3.10.10080:  udp 50 (DF)
20:25:57.682163 10.0.1.2.869 > 10.0.3.10.10080:  udp 325 (DF)
20:25:57.922821 10.0.3.10.10080 > 10.0.1.2.869:  udp 50
20:25:58.238240 10.0.3.10.10080 > 10.0.1.2.869:  udp 232
20:25:58.238493 10.0.1.2.869 > 10.0.3.10.10080:  udp 50 (DF)
20:25:58.239760 10.0.1.2.4285 > 10.0.3.10.546: S 1481763151:1481763151(0) win 
5840 <mss 1460,sackOK,timestamp 263338811 
20:25:58.240392 10.0.3.10.546 > 10.0.1.2.4285: S 3382986306:3382986306(0) ack 
1481763152 win 17520 <mss 1460,nop,wscale 
20:25:58.240469 10.0.1.2.4285 > 10.0.3.10.546: . ack 1 win 5840 
<nop,nop,timestamp 263338811 0> (DF)
20:25:58.241478 10.0.1.2.4286 > 10.0.3.10.547: S 1484517405:1484517405(0) win 
5840 <mss 1460,sackOK,timestamp 263338811 
20:25:58.242113 10.0.3.10.547 > 10.0.1.2.4286: S 3383044113:3383044113(0) ack 
1484517406 win 17520 <mss 1460,nop,wscale 
20:25:58.242204 10.0.1.2.4286 > 10.0.3.10.547: . ack 1 win 5840 
<nop,nop,timestamp 263338811 0> (DF)
20:25:58.243212 10.0.1.2.4287 > 10.0.3.10.548: S 1477361282:1477361282(0) win 
5840 <mss 1460,sackOK,timestamp 263338811 
20:25:58.243850 10.0.3.10.548 > 10.0.1.2.4287: S 3383096864:3383096864(0) ack 
1477361283 win 17520 <mss 1460,nop,wscale 
20:25:58.243949 10.0.1.2.4287 > 10.0.3.10.548: . ack 1 win 5840 
<nop,nop,timestamp 263338811 0> (DF)
20:25:58.251751 10.0.3.10.547 > 10.0.1.2.4286: P 1:72(71) ack 1 win 17520 
<nop,nop,timestamp 20225587 263338811> (DF)
20:25:58.251865 10.0.1.2.4286 > 10.0.3.10.547: . ack 72 win 5840 
<nop,nop,timestamp 263338812 20225587> (DF)
20:25:58.337071 10.0.3.10.547 > 10.0.1.2.4286: P 72:109(37) ack 1 win 17520 
<nop,nop,timestamp 20225588 263338812> (DF)
20:25:58.337158 10.0.1.2.4286 > 10.0.3.10.547: . ack 109 win 5840 
<nop,nop,timestamp 263338821 20225588> (DF)
20:25:58.340375 10.0.3.10.547 > 10.0.1.2.4286: P 109:236(127) ack 1 win 17520 
<nop,nop,timestamp 20225588 263338821> (DF
20:25:58.340471 10.0.1.2.4286 > 10.0.3.10.547: . ack 236 win 5840 
<nop,nop,timestamp 263338821 20225588> (DF)
20:25:58.889364 10.0.3.10.547 > 10.0.1.2.4286: P 236:278(42) ack 1 win 17520 
<nop,nop,timestamp 20225594 263338821> (DF)
20:25:58.889495 10.0.1.2.4286 > 10.0.3.10.547: . ack 278 win 5840 
<nop,nop,timestamp 263338876 20225594> (DF)
20:25:58.953837 10.0.3.10.548 > 10.0.1.2.4287: P 1:110(109) ack 1 win 17520 
<nop,nop,timestamp 20225594 263338811> (DF)
20:25:58.953935 10.0.1.2.4287 > 10.0.3.10.548: . ack 110 win 5840 
<nop,nop,timestamp 263338882 20225594> (DF)
20:25:58.980882 10.0.3.10.546 > 10.0.1.2.4285: P 1:472(471) ack 1 win 17520 
<nop,nop,timestamp 20225595 263338811> (DF)
20:25:58.981018 10.0.1.2.4285 > 10.0.3.10.546: . ack 472 win 6432 
<nop,nop,timestamp 263338885 20225595> (DF)
20:55:59.061126 10.0.1.2.4287 > 10.0.3.10.548: F 1:1(0) ack 110 win 5840 
<nop,nop,timestamp 263518893 20225594> (DF)
20:55:59.064360 10.0.3.10.548 > 10.0.1.2.4287: . ack 2 win 17520 
<nop,nop,timestamp 20243596 263518893> (DF)
20:55:59.121886 10.0.1.2.4286 > 10.0.3.10.547: F 1:1(0) ack 278 win 5840 
<nop,nop,timestamp 263518899 20225594> (DF)
20:55:59.122003 10.0.1.2.4285 > 10.0.3.10.546: F 1:1(0) ack 472 win 6432 
<nop,nop,timestamp 263518899 20225595> (DF)
20:55:59.122494 10.0.3.10.547 > 10.0.1.2.4286: . ack 2 win 17520 
<nop,nop,timestamp 20243596 263518899> (DF)
20:55:59.123000 10.0.3.10.546 > 10.0.1.2.4285: . ack 2 win 17520 
<nop,nop,timestamp 20243596 263518899> (DF)


So...  Why does the server close (FIN) all three connections only 1 second
after opening them?  I can find no messages in the server logs as to why
this might have happened.

                                          Brian
                                 ( bcwhite AT precidia DOT com )

-------------------------------------------------------------------------------
                    No man dies except he who has not lived.

<Prev in Thread] Current Thread [Next in Thread>