Amanda-Users

Re: Problem with amflush

2004-02-05 11:03:17
Subject: Re: Problem with amflush
From: Paul Bijnens <paul.bijnens AT xplanation DOT com>
To: Rohit <rohit AT genetechindia DOT com>
Date: Thu, 05 Feb 2004 16:58:54 +0100
Rohit wrote:

> No core dump was generated.

Maybe because the default "ulimit" for core files is 0 (do not
generate core files).  Try "ulimit -c" in bash.
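
A quick way to check this in the shell you start amflush from.  This is
only a small sketch for bash; the messages it prints are made up for
illustration:

```shell
# Ask the shell for the current core file size limit.
limit=$(ulimit -c)
case "$limit" in
    0) echo "core files are disabled in this shell" ;;
    *) echo "core files are allowed, size limit: $limit" ;;
esac
```

A limit of 0 means nothing is written, however badly the program crashes.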

When a program does something illegal, unix dumps its memory
into a file, for the programmers to do some postmortem analysis
on the problem.  The core file is generated in the current
working directory.  For amanda that could be: the directory
where you started the amflush, or /tmp/amanda, or ~amanda/TheConfig
directory (where the amflush.1 file is).  I'm not sure which
one of the three it is.
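
On Linux you can ask the kernel which directory a running process is
in, which is where a plain "core" file would land.  A small sketch,
assuming /proc is mounted and that you run it as the owner of the process
or as root; $$ (the current shell) is only a stand-in for the pid of
amflush:

```shell
pid=$$                              # stand-in; use the pid of amflush here
cwd=$(readlink "/proc/$pid/cwd")    # symlink to the working directory
echo "pid $pid is running in: $cwd"
```

If /proc/sys/kernel/core_pattern exists on your kernel and has been changed
from the default, the core file may be written somewhere else.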


Try:  strace -p pid-of-the-process

"strace" (run as root or as amanda) shows you, live, the system calls a
program is making.  That is a strong indication of what it is doing
(sleeping? reading disk? waiting for input on the tty?).


(for amflush and driver)
Which one is taking CPU?  I guess amflush.


amflush


So I would like to have a core file of amflush.  We can force
a crash and core dump of a program by sending it signal number 3.
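
Signal number 3 is SIGQUIT, the same signal the terminal sends when you
type ^\.  If you want to check the number-to-name mapping on your own
system, the shell's kill builtin can translate it (a small sketch; the
variable name is just for illustration):

```shell
signame=$(kill -l 3)    # translate signal number 3 to its name
echo "signal 3 is SIG$signame"
```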

First we have to enable the maximum size of a core dump (in this
session only, and all the children started by this session).
We do this by using the bash builtin command below.

Try to examine a core file to find out what it was doing:
  $ ulimit -c unlimited

Then we start amflush as usual.

  $ amflush ....

Your description of the problem indicates it is now sitting there,
doing nothing useful, but taking cpu.
So now we force a core dump:

and in another window, as amanda or root:

  $ kill -3 pid-of-amflush

and then, cd to where the core file is found (current directory,
or /tmp/amanda or ~amanda/TheConfig, I'm not sure), and
get a stacktrace:

Once you have found the core dump (a file with the name "core" in one of
those three directories), you can verify that it is indeed a core file
of the right program:

 $ file core
 core: ELF 32-bit LSB core file of 'amflush' (signal 3), Intel 80386...


And then we take the debugger, and have a look at which function
the program was in when the hammer hit it.


  $ gdb /usr/sbin/amflush core
  gdb>  bt


> I'm not very clear on what you are asking me to do. Can you please
> elaborate?


Here is a typescript:
  $ ulimit -c unlimited
  $ ls -l core
  ls: core: No such file or directory
  $ sleep 60 &
  [1] 29644
# We have 60 seconds time to kill that process, the '&' puts it
# in the background and bash tells us the pid too
# For amflush you'll have to open a different window, and find
# the pid by "ps -ef".
  $ ps -fp 29644
  UID        PID  PPID  C STIME TTY          TIME CMD
  paul     29644 29201  0 16:46 pts/13   00:00:00 sleep 60
  $ kill -3 29644
# hit enter here one more time to synchronise the msg:
  [1]+  Quit                    (core dumped) sleep 60
  $ ls -l core
  -rw-------    1 paul    nuts        90112 Feb  5 16:46 core
  $ file core
  core: ELF 32-bit LSB core file of 'sleep' (signal 3), Intel 80386...
  $ gdb /usr/bin/sleep core
  GNU gdb 5.2
  Copyright 2002 Free Software Foundation, Inc.
  ...
  Core was generated by `sleep 60'.
  Program terminated with signal 3, Quit.
  ...
  #0  0x400c75d1 in __libc_nanosleep () at __libc_nanosleep:-1
  -1    __libc_nanosleep: No such file or directory.
          in __libc_nanosleep
  (gdb)

# It was in the function nanosleep; now get a backtrace:

  (gdb) bt
  #0  0x400c75d1 in __libc_nanosleep () at __libc_nanosleep:-1
  #1  0x400c7568 in __sleep (seconds=60) at  [...]/linux/sleep.c:85
  #2  0x08048972 in error () at error.c:227
  #3  0x4003d17d in __libc_start_main (main=0x8048868 <error+412>, argc=2,
      ubp_av=0xbffff694, init=0x8048584, fini=0x8048bfc <error+1328>,
      rtld_fini=0x4000a534 <_dl_fini>, stack_end=0xbffff68c)
      at ../sysdeps/generic/libc-start.c:129

   (gdb) quit

And if we have the source, we may get a clue about what was going on
(satisfaction *not* guaranteed, but for open source software, you do
have access to the source!).

--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************

