Amanda-Users

Re: Amanda TCP Reset Problem with CygWin

2003-04-03 12:51:51
Subject: Re: Amanda TCP Reset Problem with CygWin
From: Brian White <bcwhite AT precidia DOT com>
To: Amanda Users <amanda-users AT amanda DOT org>
Date: Thu, 03 Apr 2003 10:35:22 -0500
> /-- watertown. /home/watertown lev 0 STRANGE
> sendbackup: start [watertown.ott.precidia.com:/home/watertown level 0]
> sendbackup: info BACKUP=/usr/bin/tar
> sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/bin/tar -f... -
> sendbackup: info COMPRESS_SUFFIX=.gz
> sendbackup: info end
> | Total bytes written: 20480 (20kB, ?B/s)
> ?
> ? gzip: stdout: Cannot send after transport endpoint shutdown
> ??error [compress returned 1]? dumper: strange [missing size line from 
> sendbackup]
> ? dumper: strange [missing end line from sendbackup]
> \--------
> 
> The same error message is on the client (naturally).  So, it looks like gzip
> is still trying to send to the socket when sendbackup is closing the socket.
> 
> My guess is that the path was closed after it had been dup'd and so an
> extra close wouldn't break things but a shutdown would.  I'm going to try
> to figure out if that is the case.

So...  I went in to sendbackup-gnutar.c and #if'd out some lines:

    /* close the write ends of the pipes */

    aclose(dumpin);
    aclose(dumpout);
#if 0 // moved to sendbackup.c
    aclose(dataf);
    aclose(mesgf);
    if (options->createindex)
        aclose(indexf);
#endif

Then I went in to sendbackup.c and added the same there:

    program->start_backup(g_options->hostname, disk, amdevice, level, dumpdate,
                          dataf, mesgpipe[1], indexf);
    parse_backup_messages(mesgpipe[0]);

#if 1 // moved from sendbackup-*.c
    dbprintf(("%s: closing paths 
(%d,%d,%d)\n",debug_prefix(NULL),indexf,mesgf,dataf));
    shutdown(dataf,2);
    aclose(dataf);
    shutdown(mesgf,2);
    aclose(mesgf);
    if (!interactive && options->createindex) {
        shutdown(indexf,2);
        aclose(indexf);
    }
#endif

This way, there is no longer a local "aclose(dataf)" and "aclose(mesgf)"
command while there are external processes still writing to that socket.
Things have improved, but still don't look correct.  Here's the message
from the server:

/-- watertown. /home/watertown lev 1 FAILED [data timeout]
sendbackup: start [watertown.ott.precidia.com:/home/watertown level 1]
sendbackup: info BACKUP=/usr/bin/tar
sendbackup: info RECOVER_CMD=/usr/bin/gzip -dc |/usr/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
| Total bytes written: 10240 (10kB, ?B/s)
\--------

Unfortunately, I didn't have tcpdump running to learn more about exactly
what happened.  Tonight, I guess.  What perplexes me, though, is that
the socket shutdown code never seemed to be run becuase there was nothing
in the sendbackup log with the "closing paths (%d,%d,%d)" message I added:

sendbackup: debug 1 pid 1332 ruid 0 euid 0: start at Wed Apr  2 20:23:45 2003
/usr/local/lib/amanda/sendbackup: version 2.4.3
  parsed request as: program `GNUTAR'
                     disk `/home/watertown'
                     device `/home/watertown'
                     level 1
                     since 2003:4:2:1:20:34
                     options 
`|;bsd-auth;compress-fast;index;exclude-file=^(#|Backup.of)|.(bak|tmp|fla|fst|hex|lib|lst|m
51|wav|mp3|mpg|mpeg|avi|mov|obj|omf|prn|swp)$|~$;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.015: stream_server: waiting for connection: 0.0.0.0.597
sendbackup: time 0.016: stream_server: waiting for connection: 0.0.0.0.598
sendbackup: time 0.017: stream_server: waiting for connection: 0.0.0.0.599
sendbackup: time 0.018: waiting for connect on 597, then 598, then 599
sendbackup: got data path #0
sendbackup: time 0.035: stream_accept: connection from 10.0.1.2.3442
sendbackup: time 0.039: stream_accept: connection from 10.0.1.2.3443
sendbackup: got mesg path #7
sendbackup: time 0.043: stream_accept: connection from 10.0.1.2.3444
sendbackup: time 0.043: got all connections
sendbackup: time 0.044: spawning /usr/bin/gzip in pipeline
sendbackup: argument list: /usr/bin/gzip --fast
sendbackup-gnutar: time 0.094: pid 3564: /usr/bin/gzip --fast
sendbackup-gnutar: time 0.100: doing level 1 dump as listed-incremental from 
/var/lib/amanda/gnutar-lists/watertown.ott.
precidia.com_home_watertown_0 to 
/var/lib/amanda/gnutar-lists/watertown.ott.precidia.com_home_watertown_1.new
sendbackup-gnutar: time 0.114: doing level 1 dump from date: 2003-04-01  
1:25:14 GMT
sendbackup: time 0.199: spawning /usr/local/lib/amanda/runtar in pipeline
sendbackup: argument list: gtar --create --file - --directory /home/watertown 
--listed-incremental /var/lib/amanda/gnuta
r-lists/watertown.ott.precidia.com_home_watertown_1.new --sparse 
--ignore-failed-read --totals --exclude-from /tmp/amand
a/sendbackup._home_watertown.20030402202345.exclude .
sendbackup: time 0.290: started index creator: "/usr/bin/tar -tf - 2>/dev/null 
| sed -e 's/^\.//'"
sendbackup-gnutar: time 0.296: /usr/local/lib/amanda/runtar: pid 3604
sendbackup: time 0.693:  53:    size(|): Total bytes written: 10240 (10kB, ?B/s)
sendbackup: time 0.826: index created successfully

Why was that message not there?  Did the program exit?  The only thing between
that log and the preceeding "start_backup" call is a call to the function
"parse_backup_messages" and I can't see how that can exit without adding a
message in the log file.

                                          Brian
                                 ( bcwhite AT precidia DOT com )

-------------------------------------------------------------------------------
                    No man dies except he who has not lived.

<Prev in Thread] Current Thread [Next in Thread>