Jean-Louis,
Thank you for replying.
OK ... but I'm beginning to think there are network issues.
Last night this machine failed during the auth phase. First
time that's happened. 'amcheck -c' later succeeded.
Would the amandad.*.debug REQ packet received be an entry
like this?
<<<<<
SERVICE noop
OPTIONS features=ffffffff9ffeffffffff00;
Increasing rep_tries? I'm grasping for a solution, and
presuming (from the context of REP in the *debug logs)
that REP indicates an attempt to reply to the server.
More tries = greater chance of success? Evidently not.
There are spare NICs on both server and client; I'm going
to try a backup over a cross-over cable (i.e. take the
existing cables / ports / switches out of the picture ...).
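For picking the packet blocks out of an amandad.*.debug file, here is a
rough Python sketch. It assumes packets are bracketed by lines consisting
of '<<<<<' and '>>>>>', as in the amandad excerpt quoted below, and that a
rejected request carries the text 'FORMAT ERROR IN REQUEST PACKET'. The
function names and the SAMPLE text are made up for illustration; this is
not an Amanda tool.

```python
# Hypothetical helper: split amandad debug text into packet blocks.
# Assumption: each packet sits between a '<<<<<' line and a '>>>>>' line.

FORMAT_ERROR = "FORMAT ERROR IN REQUEST PACKET"


def extract_packets(debug_text):
    """Return a list of packet bodies found between the marker lines."""
    packets = []
    current = None  # None means we are outside a packet block
    for line in debug_text.splitlines():
        stripped = line.strip()
        if stripped == "<<<<<":
            current = []
        elif stripped == ">>>>>":
            if current is not None:
                packets.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(stripped)
    return packets


def format_error_packets(packets):
    """Return only the packets that contain the format-error text."""
    return [p for p in packets if FORMAT_ERROR in p]


# Illustrative input assembled from lines seen in this thread; a real
# debug file will contain much more around these blocks.
SAMPLE = """\
<<<<<
SERVICE noop
OPTIONS features=ffffffff9ffeffffffff00;
>>>>>
<<<<<
OPTIONS features=ffffffff9ffeffffffff00;
FORMAT ERROR IN REQUEST PACKET
>>>>>
amandad: udpbsd_sendpkt: enter
"""

if __name__ == "__main__":
    for number, packet in enumerate(extract_packets(SAMPLE), start=1):
        print("packet", number)
        print(packet)
        print()
```

Reading a real file into a string and passing it to extract_packets()
would list every packet in order, which makes it easier to compare the
request that precedes each format error.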
Bryan
On Fri, Jul 24, 2009 at 07:10:53AM -0400, Jean-Louis Martineau wrote:
> You should look at the REQ packet in the amandad.*.debug file to find
> what is bogus in it.
> I don't understand why you want to increase rep_tries?
>
> Jean-Louis
>
> Bryan wrote:
> >Looking for some guidance on a perplexing problem, plus a summary
> >update on my earlier note (below):
> >
> >Answers to my earlier questions were 'all works with 2.5.1p3'.
> >However, 2.5.1 is problematic for a client with ~750 ZFS file
> >systems; on test and production runs, variously 10% to 50% of
> >the client's ZFS file systems failed to back up due to missing
> >ACKs on sendsize.
> >
> >Building 2.6.x on a SUNWCreq installation on Solaris 9/10 with
> >SUNWspro 12 and/or SUNWgccfss was NOT successful (after current
> >glib / pkg-config were installed). It builds on a SUNWCprog
> >cluster installation. Will investigate this further later.
> >
> >Current problem: Moving from 2.5.1 to 2.5.2 ...
> >
> >Solaris 9 server, Solaris 10 client, both running 2.5.2p1, both
> >built (no problems noted) with SUNWspro 12. Tape loads OK, bsd
> >auth, client selfcheck succeeds, client sendsize fails,
> >apparently due to:
> >
> > sendsize: debug 1 pid 15227 ruid 50 euid 50: start at Thu Jul 23
> > 12:01:15 2009
> > sendsize: version 2.5.2p1
> > Reading conf file "/etc/amanda/amanda-client.conf".
> > Could not open conf file "/etc/amanda/DAILY/amanda-client.conf": No such
> > file or directory
> > sendsize: debug 1 pid 15227 ruid 50 euid 50: rename at Thu Jul 23
> > 12:01:15 2009
> >* sendsize[15227]: time 0.541: REQ packet is bogus: no dumpdate
> > sendsize: time 0.541: pid 15227 finish time Thu Jul 23 12:01:15 2009
> >
> >amandad agrees:
> >
> > <<<<<
> > OPTIONS features=ffffffff9ffeffffffff00;
> > FORMAT ERROR IN REQUEST PACKET
> > >>>>>
> > amandad: udpbsd_sendpkt: enter
> > amandad: time 8.705: bsd: pkthdr2str handle '000-00000001'
> > amandad: time 8.705: sec: udpbsd_sendpkt: PREP (2) pkt_t (len 72)
> > contains:
> >
> > "OPTIONS features=ffffffff9ffeffffffff00;
> > FORMAT ERROR IN REQUEST PACKET
> > "
> >
> >sendsize functioned (imperfectly) on 2.5.1. The effort to move
> >from 2.5.1 to 2.5.2 is an attempt to gain 'rep_tries' in
> >amanda-client.conf. The client conf file says:
> >
> > connect_tries 10
> > rep_tries 50
> > debug_amandad 1
> > debug_amidxtaped 1
> > debug_amindexd 1
> > debug_amrecover 1
> > debug_auth 1
> > debug_event 1
> > debug_holding 1
> > debug_protocol 1
> > debug_selfcheck 1
> > debug_sendsize 1
> > debug_sendbackup 1
> >
> >disklist entries for the client all use spindle -1. The client
> >has its own dumptype entry that says 'maxdumps 6', and otherwise
> >inherits the usual gtar parameters.
> >
> >The problem is consistently reproducible, before and after
> >'amadmin delete' of the Solaris 10 client.
> >
> >Hoping that someone can provide some guidance or suggestions.
> >
> >Bryan
> >
> >On Tue, Jul 14, 2009 at 10:49:36AM -0400, Bryan wrote:
> >
> >>Folks,
> >>
> >>We've been running 2.4.x (presently 2.4.4p1) on Solaris 9 since
> >>2002 (and earlier versions back to about 1997 ..). The 2.4.4
> >>release has been enormously successful, and has saved our tail
> >>feathers any number of times. Clients are Solaris and Linux of
> >>various vintages (including some Fedora that should be retired.)
> >>
> >>We're bringing up ZFS file systems; looks like it's time to move
> >>to 2.6.1. I've read the UPGRADING file in the distribution, and
> >>am seeking guidance on some finer points.
> >>
> >>We keep daily backups for 4 weeks, weeklies for 4 months, and
> >>monthlies for 1 year. Can I expect that 2.6.1 amanda will be
> >>able to 'amadmin find' and recover tapes written by 2.4.4? (My
> >>guess is 'yes'.) Is there compelling merit to doing an upgrade to
> >>2.5.1 as a transitional step? (Guess = no).
> >>
> >>Updating all of the clients in one fell swoop would present
> >>challenges.
> >>
> >>A number of client machines (but not all) are presently running
> >>2.5.1, the result of a prior but incomplete attempt to upgrade.
> >>The 2.5.1 clients work fine with the 2.4.4 server. Can I expect
> >>the older clients to work with a 2.6.1 server? (Guess = I hope
> >>so.) Or should I upgrade the clients first and the server
> >>second? (Guess = no.)
> >>
> >>My guesses are just that, and have no basis. Some guidance (even
> >>if only more informed guesses) would be appreciated.
> >>
> >>Thank you.
> >>
> >>Bryan
> >>