Amanda-Users

Re: Service looping/terminating...

2003-09-04 14:01:40
From: Eric Siegerman <erics AT telepres DOT com>
To: "'amanda-users AT amanda DOT org'" <amanda-users AT amanda DOT org>
Date: Thu, 4 Sep 2003 13:55:21 -0400
On Thu, Sep 04, 2003 at 10:50:52AM -0400, Ripley, Scott wrote:
> I have ten Solaris8 clients running Amanda just fine, however one of them
> has been holding out on me for months and I can't figure it out.
> [...]
> It compiles fine, it installs fine, and it even seems to run fine (If I do
> /usr/local/libexec/amandad, it runs for a minute, then stops, like all the
> others do).
> [...]
> Sep  4 08:24:27 local@host inetd[137]: [ID 858011 daemon.warning] /usr/local/libexec/amandad: Killed
> Sep  4 08:24:27 local@host inetd[137]: [ID 667328 daemon.error] amanda/udp server failing (looping), service terminated

Other people have explained the "failing (looping)".  The
"Killed" is presumably a clue to the root cause, i.e. to the
failure that inetd then detects and complains about.

So, is something really sending amandad SIGKILL?  Hard to imagine
what, unless it's the ld.so machinery.  I seem to recall
processes getting killed when a shared lib can't be loaded.  You
say it's fine from the command line.  But inetd isn't the command
line -- no LD_LIBRARY_PATH for one thing, and other differences
too.  Try changing the command in the inetd.conf entry to refer
to a little shell script (or C program, if inetd doesn't like
scripts) that does:
        ldd /usr/local/libexec/amandad >somewhere

Then run amcheck and see if anything interesting turns up in
"somewhere".

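A minimal sketch of the wrapper script described above. The output path /tmp/amandad-ldd.out stands in for "somewhere", and the AMANDAD and OUT variable names are made up for illustration. Besides the ldd output it also records LD_LIBRARY_PATH and PATH as inetd sets them, since that environment is the suspected difference from the command line.

```shell
#!/bin/sh
# Hypothetical wrapper to name in the inetd.conf entry in place of amandad.
# Both paths are assumptions; adjust them for your install.
AMANDAD=${AMANDAD:-/usr/local/libexec/amandad}
OUT=${OUT:-/tmp/amandad-ldd.out}

# Write the environment this process was given, then ldd's view of the
# binary, to the file named by $2. ldd's own error messages go there too.
record_ldd() {
    {
        echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH-<unset>}"
        echo "PATH=$PATH"
        ldd "$1"
    } >"$2" 2>&1
}

# Ignore ldd's exit status here; the output file has the details.
record_ldd "$AMANDAD" "$OUT" || :
```

The wrapper never answers the request, so amcheck will still fail and inetd may well complain about looping again. That is fine for a one-off diagnostic run; put the original inetd.conf entry back afterwards.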

I'm also curious (don't know why, but they're a couple more data
points):
  - Is there only one "Killed" message, or are there a bunch
    followed by a single "failing (looping)"?  (I suspect the
    latter)

  - How long does it take for the "Killed" message(s) to appear?
    Is it as soon as you run amcheck, or after some delay?  (I
    can see that there's no delay between "Killed" and "failing
    (looping)", but that's not particularly interesting; it just
    says inetd is on the ball...)

  - Have you diff'ed the .so's that amandad needs, against the
    copies on a system where Amanda works?
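
For the first two questions, something like the following would pull the relevant inetd lines, with their timestamps, out of the system log. This is only a sketch: it assumes the messages end up in /var/adm/messages (check /etc/syslog.conf if yours go elsewhere), and the LOG variable and amanda_events function are names invented here.

```shell
#!/bin/sh
# Assumed log location; override with LOG=/path/to/log if yours differs.
LOG=${LOG:-/var/adm/messages}

# Read syslog lines on stdin; print only the inetd lines that mention
# amandad being killed or the amanda/udp service being terminated.
amanda_events() {
    awk '/amandad: Killed/ || /amanda\/udp server failing/'
}

# Show the most recent matching lines, if the log is readable.
if [ -r "$LOG" ]; then
    amanda_events <"$LOG" | tail -20
fi
```

Comparing the timestamps on those lines with the time amcheck was started should show both how many "Killed" messages there are and how quickly they appear.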

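One way to do the .so comparison without copying libraries around is to checksum, on each host, every library that ldd says amandad resolves to, and then diff the two lists. A rough sketch, assuming ldd prints lines of the form "name => /path"; the function names are hypothetical.

```shell
#!/bin/sh
# Assumed install path; adjust as needed.
AMANDAD=${AMANDAD:-/usr/local/libexec/amandad}

# Read ldd output on stdin; print the resolved library paths, one per
# line. Entries ldd could not resolve have no absolute path and are skipped.
lib_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Print cksum's "checksum size path" line for each library the binary loads.
lib_sums() {
    ldd "$1" 2>/dev/null | lib_paths | sort | while read lib; do
        cksum "$lib"
    done
}

lib_sums "$AMANDAD"
```

Run it on the problem host and on a working one, redirect each run to a file, and diff the two files. Any line that differs points at a library worth a closer look. Because unresolved libraries are skipped, a library missing from one list entirely is also a finding.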

Failing all of that, another long shot:  are you building Amanda
on the host in question, or do you build it once and then install
it on all 11 machines?  Whichever way you've been doing it ...
try the other way.  I'm not recommending one approach over the
other; either one could in theory lead to weirdness:
  - If you've been building it locally, perhaps something in that
    machine's compiler toolchain has gone bad

  - If you've been installing from a remote build, perhaps
    there's something on the build system that's not quite right
    for the target in question


Oh, yeah, readline.  (That implies a local build, since it is in
fact a difference :-)  Try building without it.  I don't know why
it would be a problem, but it's one more variable to try
eliminating.

--

|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        erics AT telepres DOT com
|  |  /
When I came back around from the dark side, there in front of me would
be the landing area where the crew was, and the Earth, all in the view
of my window. I couldn't help but think that there in front of me was
all of humanity, except me.
        - Michael Collins, Apollo 11 Command Module Pilot