Rohit wrote:
No core dump was generated.
Maybe because the default "ulimit" for core files is 0 (do not
generate core files). Try "ulimit -c" in bash to check.
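A quick way to check this, as a small sketch (it assumes a shell whose "ulimit" builtin accepts -c, as bash does; the variable name "limit" is just for illustration):

```shell
# Read the current soft limit for core file size.
# "0" means the kernel will not write core files for this session.
limit=$(ulimit -c)
if [ "$limit" = "0" ]; then
    echo "core dumps are disabled in this shell"
else
    echo "core dumps are enabled (limit: $limit)"
fi
```

If it reports 0, that alone would explain why no core file showed up.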
When a program does something illegal, Unix dumps its memory
into a file, for the programmers to do some postmortem analysis
on the problem. The core file is generated in the current
working directory. For amanda that could be: the directory
where you started the amflush, or /tmp/amanda, or ~amanda/TheConfig
directory (where the amflush.1 file is). I'm not sure which
one of the three it is.
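Since I'm not sure which directory it is, a small loop can check all the candidates. This is only a sketch: "find_core" is a made-up helper name, "TheConfig" stands for your own config name, and the "core.*" pattern is an assumption for systems that append the pid to the file name.

```shell
# Print every core file found in the directories given as arguments.
find_core() {
    for dir in "$@"; do
        # Skip candidates that do not exist on this machine.
        if [ ! -d "$dir" ]; then
            continue
        fi
        # "core" is the classic name; "core.*" covers names like core.12345.
        for f in "$dir"/core "$dir"/core.*; do
            if [ -f "$f" ]; then
                echo "$f"
            fi
        done
    done
    return 0
}

# The three directories mentioned above.
find_core "$PWD" /tmp/amanda ~amanda/TheConfig
```

Run it from the directory where you started amflush, so that $PWD covers the first candidate.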
Try: strace -p pid-of-the-process
"strace" (run as root or as amanda) shows you, live, the system calls a
program is making. That is a strong indication of what it is doing (sleeping?
reading disk? waiting for input on the tty?).
(for amflush and driver)
Which one is taking CPU? I guess amflush.
amflush
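To attach strace to amflush without looking up the pid by hand, something like this might do. It is a sketch only: "trace_by_name" is a made-up helper, and it assumes "pgrep" and "strace" are installed.

```shell
# Attach strace to the first process whose name matches exactly.
trace_by_name() {
    # pgrep -x matches the whole process name; take the first pid only.
    pid=$(pgrep -x "$1" | head -n 1)
    if [ -z "$pid" ]; then
        echo "no process named $1" >&2
        return 1
    fi
    # Run this as root or as the user owning the process.
    strace -p "$pid"
}
```

Usage would be "trace_by_name amflush"; stop it with Ctrl-C, which detaches strace and leaves amflush running.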
So I would like to have a core file of amflush. We can force
a crash and core dump of a program by sending it signal number 3 (SIGQUIT).
First we have to enable the maximum size of a core dump (in this
session only, and all the children started by this session).
We do this by using the bash builtin command below.
Try to examine a core file to find out what it was doing:
$ ulimit -c unlimited
Then we start amflush as usual.
$ amflush ....
Your description of the problem indicates it is now sitting there,
doing nothing useful, but taking cpu.
So now we force a core dump:
and in another window, as amanda or root:
$ kill -3 pid-of-amflush
and then, cd to where the core file is found (current directory,
or /tmp/amanda or ~amanda/TheConfig, I'm not sure), and
get a stacktrace:
If we have found the core dump (a file named "core" in one of
those three directories), we can verify that it is indeed a core file
of the right program:
$ file core
core: ELF 32-bit LSB core file of 'amflush' (signal 3), Intel 80386...
And then we take the debugger, and have a look in which function
the program was, when the hammer hit it.
$ gdb /usr/sbin/amflush core
gdb> bt
I'm not very clear on what you are asking me to do. Can you please
elaborate?
Here is a typescript:
$ ulimit -c unlimited
$ ls -l core
ls: core: No such file or directory
$ sleep 60 &
[1] 29644
# We have 60 seconds to kill that process; the '&' puts it
# in the background and bash tells us the pid too.
# For amflush you'll have to open a different window, and find
# the pid by "ps -ef".
$ ps -fp 29644
UID PID PPID C STIME TTY TIME CMD
paul 29644 29201 0 16:46 pts/13 00:00:00 sleep 60
$ kill -3 29644
# hit enter here one more time to synchronise the msg:
[1]+ Quit (core dumped) sleep 60
$ ls -l core
-rw------- 1 paul nuts 90112 Feb 5 16:46 core
$ file core
core: ELF 32-bit LSB core file of 'sleep' (signal 3), Intel 80386...
$ gdb /usr/bin/sleep core
GNU gdb 5.2
Copyright 2002 Free Software Foundation, Inc.
...
Core was generated by `sleep 60'.
Program terminated with signal 3, Quit.
...
#0 0x400c75d1 in __libc_nanosleep () at __libc_nanosleep:-1
-1 __libc_nanosleep: No such file or directory.
in __libc_nanosleep
(gdb)
# It was in the function nanosleep; now get a backtrace:
(gdb) bt
#0 0x400c75d1 in __libc_nanosleep () at __libc_nanosleep:-1
#1 0x400c7568 in __sleep (seconds=60) at [...]/linux/sleep.c:85
#2 0x08048972 in error () at error.c:227
#3 0x4003d17d in __libc_start_main (main=0x8048868 <error+412>,
argc=2, ubp_av=0xbffff694,
init=0x8048584, fini=0x8048bfc <error+1328>, rtld_fini=0x4000a534
<_dl_fini>,
stack_end=0xbffff68c) at ../sysdeps/generic/libc-start.c:129
(gdb) quit
And if we have the source, we may get a clue about what was going on
(satisfaction *not* guaranteed, but for open source software, you do
have access to the source!).
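If you only want the backtrace to mail back to the list, gdb can also be run without the interactive prompt. A sketch, assuming a gdb recent enough to know the -batch and -ex options ("backtrace" is a made-up helper name, and the paths in the usage line are the ones from above):

```shell
# Print a backtrace from a core file without entering the gdb prompt.
backtrace() {
    prog=$1
    core=$2
    if [ ! -f "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    # -batch makes gdb exit after running its commands;
    # -ex bt runs the "bt" command against the loaded core.
    gdb -batch -ex bt "$prog" "$core"
}
```

For amflush that would be "backtrace /usr/sbin/amflush core > amflush-bt.txt", run from the directory holding the core file.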
--
Paul Bijnens, Xplanation Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens AT xplanation DOT com
***********************************************************************
* I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, *
* kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ... "Are you sure?" ... YES ... Phew ... I'm out *
***********************************************************************