>----- Original Message -----
>From: "Eric Siegerman" <erics AT telepres DOT com>
> On Thu, Nov 04, 2004 at 09:57:52AM +0100, Paul Bijnens wrote:
> > Flynn wrote:
> >
> > >Amdump sometimes goes crazy apparently eating up all the machine resources
> > >and I can't get any access to anything when this happens, because I think
> > >it's lost managing memory swap pages or something.
>
> Yes, your symptoms do sound like severe page thrashing. I've
> never seen that with Amanda either, but there's always a first
> time :-/
>
> > the loadavg sometimes goes to 12-15 (the 15-minute one!), but I
> > still can connect to it, run amstatus etc.
>
> That's CPU contention. Stuff slows down, but a lot more
> gracefully than when it's RAM that's the problem, as it seems to
> be in this case.
>
> > Try to gather some data while you use it (e.g. with crontab) about
> > load and memory use, long list of processes, etc. and hopefully
> > you see something just before the machine locks up.
>
> One easy way is to just run "vmstat 30" all night -- use the
> "script" command to capture the output. That'll show you whether
> it is indeed a paging problem. Note that vmstat doesn't print
> the time, so it can sometimes be useful to run a script like
> this:
>     while true; do
>         date
>         sleep 300
>     done
> in the background in the *same* window as vmstat. The output
> will be a bit jumbled, but at least it'll be timestamped.
>
> Then (or at the same time, in another window) you can run a loop
> like the above but with a "ps -le" in it, again capturing the
> output with "script".
>
> (I prefer "script" to simple output redirection for stuff like
> this, because I can both watch the commands as they run and
> capture their output at the same time.)
>
> Of course, if your system has the "sar" stuff installed, you can
> use that, but there's a bit more learning and setup involved.
> What I described above is the quick and dirty approach.
>
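For anyone following along, here is a minimal sketch of a variation on the
timestamping trick described above. Instead of running a separate "date"
loop in the same window, it prefixes every line with the time, so the
output is not jumbled. The function name stamp_lines and the log file name
are my own placeholders, and "tee" stands in for "script" here; it also
lets you watch and capture at the same time.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current date and time.
# stamp_lines is a hypothetical helper name, not part of Amanda or vmstat.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Possible usage (the log file names are assumptions):
#   vmstat 30 | stamp_lines | tee vmstat.log
#   while true; do ps -le; sleep 300; done | stamp_lines | tee ps.log
```

If a command buffers its output when writing to a pipe, lines may arrive in
bursts and the timestamps will then be less precise.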
Hello again, and thank you for your help.
Here is what I got from last night's run. It didn't fail, but there is
a clue in the output:
09:16:04 up 23:18, 1 user, load average: 4.34, 4.45, 4.29
59 processes: 57 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 0.9% user 29.8% system 0.0% nice 0.0% iowait 69.1% idle
Mem:  513792k av, 507424k used, 6368k free, 0k shrd, 1120k buff
      487696k actv, 1076k in_d, 8k in_c
Swap: 530136k av, 530136k used, 0k free 240k cached

 PID USER   PRI NI SIZE  RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
2454 amanda  15  0 985M 475M    52 D     0.5 94.6 33:18   0 dumper
2441 amanda  15  0  120    4     0 S     0.0  0.0  0:00   0 amdump
2451 amanda  15  0  200    4     0 S     0.0  0.0  0:01   0 driver
2452 amanda  15  0  148    0     0 SW    0.0  0.0 28:36   0 taper
2455 amanda  15  0  244    4     0 S     0.0  0.0  0:00   0 dumper
2457 amanda  15  0  244    4     0 S     0.0  0.0  0:01   0 dumper
2458 amanda  25  0  144    4     0 S     0.0  0.0  0:00   0 dumper
3238 amanda  23  0  124    4     0 S     0.0  0.0  1:27   0 sendbackup
3240 amanda  15  0  340    4     0 S     0.0  0.0  0:00   0 gzip
3242 amanda  25  0  128    4     0 S     0.0  0.0  0:00   0 sh
3243 amanda  15  0   64    4     0 S     0.0  0.0  0:27   0 tar
3244 amanda  15  0 1572    4     0 S     0.0  0.0  2:43   0 smbclient
3245 amanda  15  0   80    4     0 S     0.0  0.0  0:02   0 sed
It seems that "dumper" (PID 2454) eats up all the memory I have, including
the swap area, up to about 1 GB.
Once there, either it fails with a "not enough memory" error, or something
goes wrong and the system locks up, probably due to another bug.
So the question is: why does "dumper" use so much memory?
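As a rough sanity check on those numbers, here is a small sketch of the
arithmetic, using only the figures from the top output above. It assumes
that top's "M" suffix means 1024 kB; the variable and function names are
just placeholders of mine.

```shell
#!/bin/sh
# Figures copied from the top output, in kB unless noted.
mem_total_kb=513792      # "Mem: 513792k av"
swap_total_kb=530136     # "Swap: 530136k av"
dumper_size_mb=985       # SIZE column for PID 2454 (assumed 1M = 1024 kB)

# Integer percentage of the first argument relative to the second.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

total_kb=$(( mem_total_kb + swap_total_kb ))
dumper_kb=$(( dumper_size_mb * 1024 ))

echo "RAM + swap: ${total_kb} kB ($(( total_kb / 1024 )) MB)"
echo "dumper SIZE is $(percent_of "$dumper_kb" "$total_kb")% of RAM + swap"
```

So RAM plus swap comes to roughly 1019 MB, and the one dumper process
accounts for about 96% of it, which fits with swap showing 0k free.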
Rgds,
Jean Flinois <tech AT vtech DOT fr>
V-Technologies, Savennières