Amanda-Users

Re: "got partial result" and "seek failed: Illegal seek"

2007-04-15 01:13:46
Subject: Re: "got partial result" and "seek failed: Illegal seek"
From: Charles Sprickman <spork AT bway DOT net>
To: Jean-Louis Martineau <martineau AT zmanda DOT com>
Date: Sun, 15 Apr 2007 00:56:24 -0400 (EDT)
On Wed, 11 Apr 2007, Jean-Louis Martineau wrote:

When it say "was successfully retried", that's because the dump succeeded on the second try.

Yep.

It still give the error of the first try "FAILED [cannot read header: got 0 instead of 32768]"

You should be able to find more about it in the amdump.1 log file and the dumper.*.debug and chunker.*.debug on the server and amandad.*.debug and sendbackup.*.debug in the client.

Relevant logs from the server side:

dumper.20070413031001001.debug:

This looks like the successful dump of /spool on h10.

dumper: connected to 209.123.46.110.10080
dumper: our side is 0.0.0.0.585
dumper: try_socksize: send buffer size is zu
dumper: try_socksize: receive buffer size is zu
security_streaminit(stream=0x806b000, driver=0x480c1380 (BSDTCP))
security_streaminit(stream=0x807b000, driver=0x480c1380 (BSDTCP))
security_streaminit(stream=0x8084000, driver=0x480c1380 (BSDTCP))
security_close(handle=0x805d100, driver=0x480c1380 (BSDTCP))
security_stream_close(0x8062000)
dumper: Building type 4 (FILE) header of size 32768 using:
dumper: Contents of *(dumpfile_t *)0x8050200:
dumper:     type             = 4 (FILE)
dumper:     datestamp        = '20070413031001'
dumper:     dumplevel        = 5
dumper:     compressed       = 1
dumper:     encrypted        = 0
dumper:     comp_suffix      = '.gz'
dumper:     encrypt_suffix   = 'N'
dumper:     name             = 'h10.biglist.com'
dumper:     disk             = '/spool'
dumper:     program          = '/sbin/dump'
dumper:     srvcompprog      = ''
dumper:     clntcompprog     = ''
dumper:     srv_encrypt      = ''
dumper:     clnt_encrypt     = ''
dumper:     recover_cmd      = '/usr/bin/gzip -dc |/sbin/restore -f - ...'
dumper:     uncompress_cmd   = ''
dumper:     encrypt_cmd      = ''
dumper:     decrypt_cmd      = ''
dumper:     srv_decrypt_opt  = ''
dumper:     clnt_decrypt_opt = ''
dumper:     cont_filename    = ''
dumper:     is_partial       = 0
dumper:     partnum          = 0
dumper:     totalparts       = 0
dumper:     blocksize        = 32768
security_stream_seterr(0x8084000, EOF)
security_stream_close(0x8084000)
security_stream_seterr(0x807b000, EOF)
security_stream_close(0x807b000)
security_stream_seterr(0x806b000, EOF)
security_stream_close(0x806b000)

I poked around a bit more in that file and found this snippet:

security_getdriver(name=bsdtcp) returns 0x480c1380
security_handleinit(handle=0x805d100, driver=0x480c1380 (BSDTCP))
security_streaminit(stream=0x8060000, driver=0x480c1380 (BSDTCP))
dumper: connect_port: Skip port 512: Owned by exec.
dumper: connect_port: Skip port 513: Owned by login.
dumper: connect_port: Skip port 514: Owned by shell.
[...]
dumper: connect_port: Try  port 585: Available   -
-->>dumper: connect_portrange: connect from 0.0.0.0.585 failed: Operation timed out -->>dumper: connect_portrange: connect to 209.123.46.110.10080 failed: Operation timed out
dumper: stream_client: Could not bind to port in range 512-1023.
dumper: connect_port: Try  port 1026: Available   -
dumper: connected to 209.123.46.110.10080
dumper: our side is 0.0.0.0.1026
dumper: try_socksize: send buffer size is zu
dumper: try_socksize: receive buffer size is zu
-->>security_seterror(handle=0x805d100, driver=0x480c1380 (BSDTCP) error=did not get a reserved port: 1026)
security_close(handle=0x805d100, driver=0x480c1380 (BSDTCP))
security_stream_close(0x8060000)

I'm guessing that's the problem, although nothing in the log necessarily confirms that's h10:/spool.

So what is that telling me? It looks like it steps through /etc/services and finds most stuff used and then uses an unpriviliged port even though it can't use an unprivileged port.

Why does just this one host have that problem?

On the client, amandad has this:

security_handleinit(handle=0x8052000, driver=0x28094900 (BSDTCP))
security_streaminit(stream=0x8058000, driver=0x28094900 (BSDTCP))
security_seterror(handle=0x8052000, driver=0x28094900 (BSDTCP) error=host h-74-0-xx-xx.phlapafg.covad.net: port 1026 not secure) amandad: time 0.012: accept error: host h-74-0-xx-xx.phlapafg.covad.net: port 1026 not secure
amandad: time 0.013: sending NAK pkt:
<<<<<
ERROR host h-74-0-xx-xx.phlapafg.covad.net: port 1026 not secure

security_close(handle=0x8052000, driver=0x28094900 (BSDTCP))
security_stream_close(0x8058000)

No errors in the sendbackup debugs.

So that's it?  Some problem with the range of ports being used.

What are the rules here with what ports can and can't be used?

Why would this impact just one host?

Thanks,

Charles

Jean-louis

Charles Sprickman wrote:
On Tue, 10 Apr 2007, Jean-Louis Martineau wrote:

The "seek failed: Illegal seek" are normal error.
The "got partial result" lines are expected.

You have no problem with the estimate phase, don't try to search for bug there.

Just for my own curiosity, what are those errors telling me? The partial result error does show up for every host.

I'm down to one host that has recurring errors. The dumps eventually complete, but I'd like to track down what's happening here. Looking through the older amreport output, I see one host consistently getting this error and then retrying:

h10.biglist.com / lev 3 FAILED [cannot read header: got 0 instead of 32768]
  h10.biglist.com     /                lev 3  the dump "


h10.biglist.com /spool lev 4 FAILED [cannot read header: got 0 instead of 32768]
  h10.biglist.com     /spool         lev 4  was successfully retried


h10.biglist.com / lev 2 FAILED [cannot read header:
got 0 instead of 32768]
  h10.biglist.com     /                   lev 2  was successfully retried

The only errors of note I could find in either the server or client logs were what I had posted below. This host is in the same location and running the same OS version as other hosts that dump without errors.

What should I be looking for?

I'm not seeing any errors in the sendbackup debug logs.

Thanks,

Charles

Jean-Louis


Charles Sprickman wrote:
Perhaps the root of the problems I'm having, perhaps not.

I keep saying "I'm done with this" and then I go back and correlate more server/client logs.

This is my latest finding.

While backups seem to be mostly working (even though I'm having some amrecover issues in another thread), I have not seen anyone comment on the insane amount of errors in amdump about "partial result":

[devel2]/var/log/amanda # grep "partial result" amdump.1 | wc -l
    2468

Lines look basically like this, both for tar AND dump, and on EVERY client:

planner: time 20.742: got partial result for host h16.biglist.com disk /usr/local: 0 -> 130180K, 1 -> 68K, -1 -> -2K planner: time 20.742: got partial result for host h16.biglist.com disk /home: 0 -> 65074K, 2 -> 44K, -1 -> -2K planner: time 20.742: got partial result for host h16.biglist.com disk /: 0 -> 45968K, 1 -> 66K, -1 -> -2K planner: time 20.815: got partial result for host h20.biglist.com disk /var/qmail/bin: 0 -> -2K, 1 -> -2K, -1 -> -2K

And then on a client when it's sending the size estimate:

root@h07[/tmp/amanda/client/DailySet1]# grep "seek failed" sendsize.20070409* | wc -l
      13

And both for gtar and dump types:

dump_calc_estimates: warning - seek failed: Illegal seek
gnutar_calc_estimates: warning - seek failed: Illegal seek

And the errors in context:

(dump)
sendsize: debug 1 pid 56571 ruid 2 euid 2: rename at Mon Apr 9 03:10:02 2007
sendsize[56571]: time 0.014: waiting for any estimate child: 1 running
sendsize[56573]: time 0.014: calculating for amname /, dirname /, spindle -1
sendsize[56573]: time 0.015: getting size via dump for / level 0
sendsize[56573]: time 0.015: calculating for device /dev/da0s1a with ufs
sendsize[56573]: time 0.016: running "/sbin/dump 0Shsf 0 1048576 - /dev/da0s1a"
sendsize[56573]: time 0.017: running /usr/local/libexec/amanda/killpgrp
sendsize[56573]: time 0.022: DUMP: Date of this level 0 dump: Mon Apr 9 03:10:02 2007
sendsize[56573]: time 0.023:   DUMP: Date of last level 0 dump: the epoch
sendsize[56573]: time 0.024: DUMP: Dumping /dev/da0s1a (/) to standard output
sendsize[56573]: time 0.077:   DUMP: mapping (Pass I) [regular files]
sendsize[56573]: time 0.648:   DUMP: mapping (Pass II) [directories]
sendsize[56573]: time 0.649:   DUMP: estimated 43991 tape blocks.
sendsize[56573]: time 0.650: .....
sendsize[56573]: estimate time for / level 0: 0.633
sendsize[56573]: estimate size for / level 0: 43991 KB
sendsize[56573]: time 0.650: asking killpgrp to terminate
dump_calc_estimates: warning - seek failed: Illegal seek
--> Do we start over here??
sendsize[56573]: time 1.660: getting size via dump for / level 1
sendsize[56573]: time 1.660: calculating for device /dev/da0s1a with ufs
sendsize[56573]: time 1.660: running "/sbin/dump 1Shsf 0 1048576 - /dev/da0s1a"
sendsize[56573]: time 1.661: running /usr/local/libexec/amanda/killpgrp
sendsize[56573]: time 1.667: DUMP: Date of this level 1 dump: Mon Apr 9 03:10:04 2007 sendsize[56573]: time 1.668: DUMP: Date of last level 0 dump: Sat Apr 7 04:03:16 2007 sendsize[56573]: time 1.668: DUMP: Dumping /dev/da0s1a (/) to standard output
sendsize[56573]: time 1.710:   DUMP: mapping (Pass I) [regular files]
sendsize[56573]: time 2.204:   DUMP: mapping (Pass II) [directories]
sendsize[56573]: time 2.347:   DUMP: estimated 76 tape blocks.
sendsize[56573]: time 2.347: .....
sendsize[56573]: estimate time for / level 1: 0.686
sendsize[56573]: estimate size for / level 1: 76 KB
sendsize[56573]: time 2.347: asking killpgrp to terminate
dump_calc_estimates: warning - seek failed: Illegal seek
sendsize[56573]: time 3.350: done with amname / dirname / spindle -1
sendsize[56571]: time 3.351: child 56573 terminated normally
sendsize[56571]: time 3.352: waiting for any estimate child: 1 running

(gtar)

sendsize[56571]: time 13.583: waiting for any estimate child: 1 running
sendsize[56623]: time 13.583: calculating for amname /var/db/pkg, dirname /var/db/pkg, spindle -1 sendsize[56623]: time 13.583: getting size via gnutar for /var/db/pkg level 0 sendsize[56623]: time 13.585: spawning /usr/local/libexec/amanda/runtar in pipeline sendsize[56623]: argument list: runtar DailySet1 /usr/local/bin/gtar --create -- file /dev/null --directory /var/db/pkg --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/h07.biglist.com_var_db_pkg_0.new --sparse --ignore-failed-read --totals . sendsize[56623]: time 13.675: Total bytes written: 2816000 (2.7MiB, 117MiB/s)
sendsize[56623]: time 13.676: .....
sendsize[56623]: estimate time for /var/db/pkg level 0: 0.091
sendsize[56623]: estimate size for /var/db/pkg level 0: 2750 KB
sendsize[56623]: time 13.676: waiting for runtar "/var/db/pkg" child
sendsize[56623]: time 13.677: after runtar /var/db/pkg wait
gnutar_calc_estimates: warning - seek failed: Illegal seek
--> Again, looks like we repeat
sendsize[56623]: time 13.678: getting size via gnutar for /var/db/pkg level 1 sendsize[56623]: time 13.679: spawning /usr/local/libexec/amanda/runtar in pipeline sendsize[56623]: argument list: runtar DailySet1 /usr/local/bin/gtar --create --file /dev/null --directory /var/db/pkg --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/h07.biglist.com_var_db_pkg_1.new --sparse --ignore-failed-read --totals . sendsize[56623]: time 13.759: Total bytes written: 71680 (70KiB, 5.3MiB/s)
sendsize[56623]: time 13.760: .....
sendsize[56623]: estimate time for /var/db/pkg level 1: 0.080
sendsize[56623]: estimate size for /var/db/pkg level 1: 70 KB
sendsize[56623]: time 13.760: waiting for runtar "/var/db/pkg" child
sendsize[56623]: time 13.761: after runtar /var/db/pkg wait
gnutar_calc_estimates: warning - seek failed: Illegal seek
sendsize[56623]: time 13.761: getting size via gnutar for /var/db/pkg level 2 sendsize[56623]: time 13.763: spawning /usr/local/libexec/amanda/runtar in pipeline sendsize[56623]: argument list: runtar DailySet1 /usr/local/bin/gtar --create --file /dev/null --directory /var/db/pkg --one-file-system --listed-incremental /usr/local/var/amanda/gnutar-lists/h07.biglist.com_var_db_pkg_2.new --sparse --ignore-failed-read --totals . sendsize[56623]: time 13.843: Total bytes written: 71680 (70KiB, 5.3MiB/s)
sendsize[56623]: time 13.844: .....
sendsize[56623]: estimate time for /var/db/pkg level 2: 0.081
sendsize[56623]: estimate size for /var/db/pkg level 2: 70 KB
sendsize[56623]: time 13.844: waiting for runtar "/var/db/pkg" child
sendsize[56623]: time 13.845: after runtar /var/db/pkg wait
gnutar_calc_estimates: warning - seek failed: Illegal seek
sendsize[56623]: time 13.846: done with amname /var/db/pkg dirname /var/db/pkg spindle -1
sendsize[56571]: time 13.846: child 56623 terminated normally
sendsize[56571]: time 13.847: waiting for any estimate child: 1 running

Am I correct in assuming these sendsize problems are causing the "partial result" messages in the amdump log?

If so, I wonder if I'm really getting proper size estimates into amanda...

Any thoughts?

Thanks,

Charles