Amanda-Users

Re: Amanda's dumper going amok

2004-11-05 08:33:19
Subject: Re: Amanda's dumper going amok
From: "Flynn" <tech AT vtech DOT fr>
To: <amanda-users AT amanda DOT org>
Date: Fri, 5 Nov 2004 14:29:36 +0100
>----- Original Message ----- 
>From: "Eric Siegerman" <erics AT telepres DOT com>

> On Thu, Nov 04, 2004 at 09:57:52AM +0100, Paul Bijnens wrote:
> > Flynn wrote:
> >
> > >Amdump sometimes goes crazy apparently eating up all the machine
> > >resources and I can't get any access to anything when this happens,
> > >because I think it's lost managing memory swap pages or something.
>
> Yes, your symptoms do sound like severe page thrashing.  I've
> never seen that with Amanda either, but there's always a first
> time :-/
>
> > the loadavg sometimes goes to 12-15 (the 15-minute one!), but I
> > still can connect to it, run amstatus etc.
>
> That's CPU contention.  Stuff slows down, but a lot more
> gracefully than when it's RAM that's the problem, as it seems to
> be in this case.
>
> > Try to gather some data while you use it (e.g. with crontab) about
> > load and memory use, long list of processes, etc. and hopefully
> > you see something just before the machine locks up.
>
> One easy way is to just run "vmstat 30" all night -- use the
> "script" command to capture the output.  That'll show you whether
> it is indeed a paging problem.  Note that vmstat doesn't print
> the time, so it can sometimes be useful to run a script like
> this:
>     while true; do
>         date
>         sleep 300
>     done
> in the background in the *same* window as vmstat.  The output
> will be a bit jumbled, but at least it'll be timestamped.
>
> Then (or at the same time, in another window) you can run a loop
> like the above but with a "ps -le" in it, again capturing the
> output with "script".
>
> (I prefer "script" to simple output redirection for stuff like
> this, because I can both watch the commands as they run and
> capture their output at the same time.)
>
> Of course, if your system has the "sar" stuff installed, you can
> use that, but there's a bit more learning and setup involved.
> What I described above is the quick and dirty approach.
>
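
The timestamping suggested above can also be done per line instead of with a separate background loop. What follows is only a rough sketch, not something from the original advice: the function name stamp_lines and the log path /tmp/vmstat.log are made up for illustration, and depending on the vmstat build its output may be buffered when it is sent to a pipe rather than a terminal.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current date and time.
# (stamp_lines is a hypothetical helper name, not an existing tool.)
stamp_lines() {
    while IFS= read -r line; do
        printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example, runs until interrupted (assumes vmstat is installed and
# that /tmp/vmstat.log is an acceptable place for the log):
#   vmstat 30 | stamp_lines >> /tmp/vmstat.log
```

Unlike "script", this does not show the output on screen while it is captured; running "tail -f" on the log file in another window gives roughly the same effect.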

Hello again, and thank you for your concern.
Here is what I got from last night's run. It didn't fail, but there is
a clue right there:

    09:16:04  up 23:18,  1 user,  load average: 4.34, 4.45, 4.29
    59 processes: 57 sleeping, 2 running, 0 zombie, 0 stopped
    CPU states:   0.9% user  29.8% system   0.0% nice   0.0% iowait  69.1% idle
    Mem:   513792k av,  507424k used,    6368k free,       0k shrd,    1120k buff
                        487696k actv,    1076k in_d,       8k in_c
    Swap:  530136k av,  530136k used,       0k free                     240k cached

      PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
     2454 amanda    15   0  985M 475M    52 D     0.5 94.6  33:18   0 dumper
     2441 amanda    15   0   120    4     0 S     0.0  0.0   0:00   0 amdump
     2451 amanda    15   0   200    4     0 S     0.0  0.0   0:01   0 driver
     2452 amanda    15   0   148    0     0 SW    0.0  0.0  28:36   0 taper
     2455 amanda    15   0   244    4     0 S     0.0  0.0   0:00   0 dumper
     2457 amanda    15   0   244    4     0 S     0.0  0.0   0:01   0 dumper
     2458 amanda    25   0   144    4     0 S     0.0  0.0   0:00   0 dumper
     3238 amanda    23   0   124    4     0 S     0.0  0.0   1:27   0 sendbackup
     3240 amanda    15   0   340    4     0 S     0.0  0.0   0:00   0 gzip
     3242 amanda    25   0   128    4     0 S     0.0  0.0   0:00   0 sh
     3243 amanda    15   0    64    4     0 S     0.0  0.0   0:27   0 tar
     3244 amanda    15   0  1572    4     0 S     0.0  0.0   2:43   0 smbclient
     3245 amanda    15   0    80    4     0 S     0.0  0.0   0:02   0 sed

It seems that "dumper" (PID 2454) eats up all the memory I have, including
the swap area, up to about 1 GB.
Once there, either it fails with a "not enough memory" error, or something
goes wrong and the system locks up, probably due to another bug...
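
To see how fast that process grows during the night, its size could be sampled at regular intervals. This is a minimal sketch under a few assumptions: the helper names filter_by_name and mem_snapshot and the log path are hypothetical, the "rss" column is not required by POSIX (though Linux ps accepts it), and "comm" is assumed to print the bare command name as it does on Linux.

```shell
#!/bin/sh
# Keep only the rows whose command name (4th column) equals $1 and
# print PID, virtual size and resident size, both in kB.
# Expects "pid vsz rss comm" rows on stdin.
filter_by_name() {
    awk -v name="$1" \
        '$4 == name { printf "pid=%s vsz_kb=%s rss_kb=%s\n", $1, $2, $3 }'
}

# Print the current time, then one line per process named $1.
mem_snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    ps -e -o pid= -o vsz= -o rss= -o comm= | filter_by_name "$1"
}

# Example, every 5 minutes until interrupted (log path is illustrative):
#   while true; do mem_snapshot dumper; sleep 300; done >> /tmp/dumper-mem.log
```

Since several dumper processes run at once, each snapshot lists all of them, so the one that grows can be picked out by its PID.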

So the question is: why does "dumper" eat so much memory?

Rgds,

Jean Flinois <tech AT vtech DOT fr>
V-Technologies, Savennières