Re: Will retry behavior again

dobryanskaya AT adelphia DOT net wrote:
...

taper: wrote label `weekly3' date `20050901'
dumper: kill index command
driver: result time 39992.417 from dumper0: FAILED 00-00029 ["data write: Connection reset by peer"]
driver: result time 39992.417 from taper: TRY-AGAIN 00-00029 [writing file: No space left on device]
driver: error time 39992.429 serial gen mismatch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--------------------------
So AMANDA advanced to the next tape, and the taper's request to retry was
actually made, but a "serial gen mismatch" happened.


Driver keeps a table of which dumper is handling what.

When it receives a command, it also checks that table, which maps each
dumper number to the filesystem it is currently handling.

The "00029"  is the "generation" number.
Fine.

First it receives a "FAILED" from dumper0, and thus it frees the table
entry, effectively setting generation number to 0, to indicate it's
doing nothing.

Less than a millisecond later, it receives a "TRY-AGAIN" from taper,
which is referring to the same generation number, but which was
freed just before.  So amanda says that it received a command for
which the generation number did not match.
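The bookkeeping described above can be sketched in a few lines of Python. This is only a toy model, not amanda's actual driver code: the class and method names are made up for illustration, the filesystem name is hypothetical, and I am assuming that the "00" in front of the generation number is the index of the table slot.

```python
class SerialTable:
    """Toy model of a driver-side table of in-flight dumps (hypothetical names)."""

    def __init__(self, slots=4, first_gen=1):
        self.gen = [0] * slots        # generation 0 marks a free slot
        self.disk = [None] * slots    # filesystem each slot is handling
        self.next_gen = first_gen

    def allocate(self, disk):
        """Claim a free slot and return a serial such as '00-00029'."""
        slot = self.gen.index(0)      # raises ValueError when no slot is free
        self.gen[slot] = self.next_gen
        self.disk[slot] = disk
        self.next_gen += 1
        return f"{slot:02d}-{self.gen[slot]:05d}"

    def lookup(self, serial):
        """Return the filesystem for a serial, or fail if the generation is stale."""
        slot_text, gen_text = serial.split("-")
        slot, gen = int(slot_text), int(gen_text)
        if self.gen[slot] != gen:
            raise LookupError("serial gen mismatch")
        return self.disk[slot]

    def free(self, serial):
        """Release the slot by resetting its generation to 0."""
        self.lookup(serial)           # only a live serial may be freed
        slot = int(serial.split("-")[0])
        self.gen[slot] = 0
        self.disk[slot] = None


table = SerialTable(first_gen=29)
serial = table.allocate("hercules:/example")   # hypothetical filesystem name
table.free(serial)                 # FAILED from dumper0 frees the entry
try:
    table.lookup(serial)           # TRY-AGAIN from taper arrives afterwards
    outcome = "matched"
except LookupError as err:
    outcome = str(err)
print(serial, outcome)
```

In this model the second message fails the lookup simply because the slot's generation is back at 0 by the time it arrives, whichever process sent it.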

OK, that explains the error message and what it means.

The strange thing above seems to be the order of the events.
When bumping into EOT, I would expect the sequence:
- First taper bumps into end of tape:
  taper:  TRY-AGAIN 00-00029 [...No space left on device]
- then driver says to port-dumper:
  kill whatever you're doing
  driver: ABORT 00-00029      !!!! This command is missing above!!!
  dumper: kill index command

But the "kill index" comes in first, followed by dumper0 reporting
"it failed here", then followed by taper saying "tape is full".
From this sequence, it seems amanda made the correct decision to
not try again what taper instructed, because dumper signalled a
fatal error first.
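To check the ordering in other runs, a small script along these lines can pull the "result" lines for one serial out of the log. It is a rough sketch: the regular expression is modelled only on the two result lines quoted above (with the mail line wrap undone), and the function name is my own invention.

```python
import re

# Pattern modelled on the "driver: result time ..." lines quoted in this thread.
RESULT_LINE = re.compile(
    r"driver: result time (?P<time>\d+\.\d+) from (?P<source>\w+): "
    r"(?P<result>\S+) (?P<serial>\d+-\d+)"
)


def results_for(serial, lines):
    """Return (time, source, result) tuples for one serial, in log order."""
    events = []
    for line in lines:
        match = RESULT_LINE.search(line)
        if match and match.group("serial") == serial:
            events.append(
                (float(match.group("time")), match.group("source"), match.group("result"))
            )
    return events


log = [
    "dumper: kill index command",
    'driver: result time 39992.417 from dumper0: FAILED 00-00029 '
    '["data write: Connection reset by peer"]',
    "driver: result time 39992.417 from taper: TRY-AGAIN 00-00029 "
    "[writing file: No space left on device]",
    "driver: error time 39992.429 serial gen mismatch",
]

events = results_for("00-00029", log)
for event in events:
    print(event)
```

Both results carry the same timestamp here, so the script keeps the order in which the lines appear in the log rather than sorting by time.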

Why would that happen???  I don't know.

I searched Google for the phrase and did not find anything helpful
about this error. Does anybody know what this error means and how to
deal with it?
Also, I'm positive that we had enough space on the holding disk to hold
this particular FS. Why did it start "directly to tape" (the log's very
first line)?


Shot in the dark: maybe a "holdingdisk no" in the dumptype?
See the output of "amadmin weekly disklist hercules".

Another shot in the dark: what version of amanda is the server?
Older versions also had a notion of "negative chunksize": dumps larger
than the absolute value of chunksize were port-dumped too.  Maybe you
have a negative chunksize?  The same older versions of amanda could
also port-dump when chunksize was omitted.  This is all from memory;
I don't even have an old man page around (except in my archive backups :-)
Amanda 2.4.2 already has no more support for negative chunksizes
(and warns if you use them).

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************