I have been looking at the enormous amount of debug files, and what I
can see is that always there are "fail" lines in the dumper.*.debug
files of the type:
1207905536.305211: dumper: connect_portrange: connect to
127.0.0.2.10080 failed: Cannot assign requested address
"
However, that seems to be normal, because even in normal cases, these
lines are present.
In the case there are these "cannot read header" problems, the
chunker.*.debug file report an error. For example, today the same
DLE /home/charles/pictures failed in the chuncker:
"
1207647127.918807: chunker: pid 8024 ruid 1002 euid 1002: start at Tue Apr 8
11:32:07 2008
1207647127.919828: chunker: pid 8024 ruid 1002 euid 1002: rename at Tue Apr 8
11:32:07 2008
1207647127.919889: chunker: getcmd: START 20080408110501
1207647127.919912: chunker: getcmd: PORT-WRITE 03-00013
/work-1/amanda/dumps/20080408110501/fiume.localnet._home_charles_pictures.1
fiume.localnet ffff
ffff9ffeffffffff00 /home/charles/pictures 1 2008:4:4:9:7:56 1048576 GNUTAR 928
|;auth=bsdtcp;index;
1207647127.920075: chunker: try_socksize: receive buffer size is 65536
1207647127.924152: chunker: bind_portrange2: Try port 11028: Available -
Success
1207647127.924186: chunker: stream_server: waiting for connection: 0.0.0.0.11028
1207647127.924193: chunker: putresult: 23 PORT
1207647127.928237: chunker: stream_accept: connection from 127.0.0.1.11002
1207647127.928247: chunker: try_socksize: receive buffer size is 65536
1207647137.985112: chunker: putresult: 10 FAILED
1207647137.985202: chunker: pid 8024 finish time Tue Apr 8 11:32:17 2008
"
I think the reason for the "cannot read header" message is because the
file left in the dumps disk is zero length:
"
ll /work-1/amanda/dumps/200804*/*
-rw------- 1 amandabackup disk 0 Apr 6 11:22
/work-1/amanda/dumps/20080406110501/fiume.localnet._home_charles_pictures.1
-rw------- 1 amandabackup disk 0 Apr 8 11:32
/work-1/amanda/dumps/20080408110501/fiume.localnet._home_charles_pictures.1
-rw------- 1 amandabackup disk 0 Apr 11
11:18
/work-1/amanda/dumps/20080411110501/fiume.localnet._home_charles_pictures.1
"
If there are problems, it is always this /home/charles/pictures DLE which
fails and sometimes also another, but often the other recovers, as today:
"
fiume.localnet /usr lev 1 FAILED [cannot read header: got 0 bytes
instead of 32768]fiume.localnet /usr lev 1 was successfully retried
"
Thanks for looking.
Charles
On Thu, 10 Apr 2008 10:18:10 -0400
"Dustin J. Mitchell" <dustin AT zmanda DOT com> wrote:
> On Tue, Apr 8, 2008 at 10:38 AM, Charles Stroom
> <charles AT stremen.xs4all DOT nl> wrote:
> > FAILURE DUMP SUMMARY:
> > fiume.localnet /home/charles/pictures lev 1 FAILED [cannot read
> > header: got 0 bytes instead of 32768]
> > fiume.localnet /home/charles/pictures lev 1 FAILED [too many
> > dumper retry: "[request failed: Resource temporarily unavailable]"]
> > fiume.localnet /home/charles/pictures lev 1 FAILED [cannot read
> > header: got 0 bytes instead of 32768]
>
> My apologies -- now that the 2.6.0 release has been out, I've all been
> excitedly hacking on new work for 2.6.1. A side-effect is that I've
> just been "skimming" amanda-users, and I missed this email.
>
> There could be a number of problems underlying this, unfortunately.
> Basically, the header didn't arrive on the server, but there's no
> indication of what caused that. The next step is to look around in
> the debug logs on both the server and the client for anything unusual
> about those runs (unusual = different from the runs that succeeded).
> On the server, have a look at dumper.*. On the client, have a look at
> amandad.* and sendbackup*.
>
> "Resource temporarily unavailable" corresponds to EAGAIN, and is
> usually seen with non-blocking IO, which Amanda doesn't use, so I'm
> not sure what to make of that information. The dumper debug log may
> clarify that.
>
> Dustin
>
--
Charles Stroom
email: charles at no-spam.stremen.xs4all.nl (remove the "no-spam.")
|