Amanda-Users

Re: RE Unravel amstatus output

2006-07-17 07:41:30
Subject: Re: RE Unravel amstatus output
From: "Joe Donner (sent by Nabble.com)" <lists AT nabble DOT com>
To: amanda-users AT amanda DOT org
Date: Mon, 17 Jul 2006 04:32:07 -0700 (PDT)
When I run the top command (Red Hat Enterprise Linux 3) for user amanda, I get:

 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2136 amanda    15   0   948  948   836 S     0.0  0.0   0:00   1 amdump
 2145 amanda    15   0  1072 1072   844 S     0.0  0.1   0:02   1 driver
 2146 amanda    16   0  1536 1536  1388 S     0.0  0.1   0:52   0 taper
 2147 amanda    16   0  1560 1560  1396 D     0.0  0.1   0:34   0 taper
 2148 amanda    22   0  1120 1120   876 S     0.0  0.1  12:55   0 dumper
 2153 amanda    15   0  1120 1120   876 S     0.0  0.1   0:19   0 dumper
 2154 amanda    15   0  1044 1044   816 S     0.0  0.1   0:00   1 dumper
 2155 amanda    25   0   852  852   708 S     0.0  0.0   0:00   0 dumper

and ps -fu amanda outputs:

UID        PID  PPID  C STIME TTY          TIME CMD
amanda    2136  2135  0 Jul14 ?        00:00:00 /bin/sh /usr/sbin/amdump daily
amanda    2145  2136  0 Jul14 ?        00:00:02 /usr/lib/amanda/driver daily
amanda    2146  2145  0 Jul14 ?        00:00:52 taper daily
amanda    2147  2146  0 Jul14 ?        00:00:34 taper daily
amanda    2148  2145  0 Jul14 ?        00:12:55 dumper0 daily
amanda    2153  2145  0 Jul14 ?        00:00:19 dumper1 daily
amanda    2154  2145  0 Jul14 ?        00:00:00 dumper2 daily
amanda    2155  2145  0 Jul14 ?        00:00:00 dumper3 daily

Does this tell anyone anything?
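
One thing that can be read straight off the top listing: PID 2147 (the second taper process) shows STAT "D", which on Linux means uninterruptible sleep, usually a process waiting on I/O. That may point at the tape device, but the listing alone cannot confirm it. Below is a minimal Python sketch that picks such processes out of a top listing pasted in as text; the function name find_blocked and the variable TOP_OUTPUT are made up for this illustration and are not part of Amanda.

```python
# Sketch only: scan a pasted "top" listing for processes in state D
# (uninterruptible sleep). Names here are hypothetical, not Amanda tools.
TOP_OUTPUT = """\
 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2136 amanda    15   0   948  948   836 S     0.0  0.0   0:00   1 amdump
 2145 amanda    15   0  1072 1072   844 S     0.0  0.1   0:02   1 driver
 2146 amanda    16   0  1536 1536  1388 S     0.0  0.1   0:52   0 taper
 2147 amanda    16   0  1560 1560  1396 D     0.0  0.1   0:34   0 taper
 2148 amanda    22   0  1120 1120   876 S     0.0  0.1  12:55   0 dumper
 2153 amanda    15   0  1120 1120   876 S     0.0  0.1   0:19   0 dumper
 2154 amanda    15   0  1044 1044   816 S     0.0  0.1   0:00   1 dumper
 2155 amanda    25   0   852  852   708 S     0.0  0.0   0:00   0 dumper
"""


def find_blocked(top_text):
    """Return (pid, command) pairs for rows whose STAT column starts with D."""
    lines = [ln for ln in top_text.splitlines() if ln.strip()]
    header = lines[0].split()
    # Locate columns by name so the sketch does not depend on fixed positions.
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = header.index("COMMAND")
    blocked = []
    for ln in lines[1:]:
        fields = ln.split()
        if fields[stat_idx].startswith("D"):
            blocked.append((int(fields[pid_idx]), fields[cmd_idx]))
    return blocked


print(find_blocked(TOP_OUTPUT))
```

A process that stays in state D across several samples is a reasonable first candidate to inspect more closely.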


Paul Bijnens wrote:
> 
> On 2006-07-17 11:36, Joe Donner (sent by Nabble.com) wrote:
>> Good point - and that is why I need help unravelling what it all means.
>> My question now would be:  0.41% of what?  What would 100% of that something
>> represent?  Constant streaming of data to tape from holding disk?
> 
> of the total elapsed time since the program started.
> 
> But there is a caveat.  The amstatus command works by parsing the log
> file, and the log file is written to only when there is a change of
> state in the backup process.  So the 0.41% probably means that the
> last status message written by taper in the log file was a long time ago.
> It could well be that taper is taping one very large file, but has not
> yet written that into the log file which amstatus parses.
> 
> So, to find out whether anything is really still running, do
>    ps -fu amanda
> on the tape server, and verify whether there is still a taper process (and
> other processes like driver).
> If there are, then find out what they are doing ("strace -p" helps here).
> 
> You may kill them all, and then clean up the broken pieces by running 
> "amcleanup".
> 
> 
> 
>> 
>> I've just left it alone to see if I get different results when subsequently
>> running amstatus, but it seems stuck at wherever it is at the moment.  The
>> tape drive itself is doing nothing...
>> 
>> It really seems as if all went reasonably well and then froze up for some
>> reason.
>> 
>> Please help if at all possible.
>> 
>> 
>> Cyrille Bollu wrote:
>>> Looking with my newbie's eyes, it seems that Amanda is running well, just
>>> very slowly.
>>>
>>> And Amanda's log seems to indicate that the problem is on the tape drive
>>> side.
>>>
>>> The only strange thing I see is the following line, which says that
>>> your drive is busy only 0.41% of the time:
>>>
>>>>    taper busy   :  0:12:38  (  0.41%)
>>> What does it do the rest of the time???
>>>
>>> owner-amanda-users AT amanda DOT org wrote on 17/07/2006 10:54:55:
>>>
>>>> I set up Amanda on Friday to do an almost real backup job.  I thought this
>>>> would be the final test before putting it into operation.
>>>>
>>>> When I arrived at work this morning, I was somewhat surprised to see that
>>>> the Amanda run doesn't seem to have finished.  amstatus daily gives me some
>>>> information, but I'm not sure how to interpret it.
>>>>
>>>> There are still 3 files on the holding disk, adding up to about 48GB.  The
>>>> tape drive doesn't seem to be doing anything - just sitting there quietly at
>>>> the moment with no sign of activity.
>>>>
>>>> I won't include the entire output of amstatus daily, but here are extracts,
>>>> if someone can please tell me if they see something wrong.
>>>>
>>>> I have many entries like these - seems to be one for each DLE:
>>>> cerberus:/home                               0  1003801k finished (22:18:15)
>>>> Then these entries, which I think are the 2 that failed, as shown later in
>>>> the summary:
>>>> cerberus:/.autofsck                          0 planner: [disk /.autofsck offline on cerberus?]
>>>> cerberus:/.fonts.cache-1                     0 planner: [disk /.fonts.cache-1 offline on cerberus?]
>>>>
>>>> Then these 3 that are the ones still on the holding disk:
>>>> minerva:/home                                0  8774296k writing to tape (23:09:07)
>>>> minerva:/usr/local/clients                   0 32253287k dump done (1:08:27), wait for writing to tape
>>>> minerva:/usr/local/development               0  9687648k dump done (23:48:17), wait for writing to tape
>>>>
>>>> And then this summary, which I'm not sure how to interpret:
>>>> SUMMARY          part      real  estimated
>>>>                            size       size
>>>> partition       : 109
>>>> estimated       : 107             69631760k
>>>> flush           :   0         0k
>>>> failed          :   2                    0k           (  0.00%)
>>>> wait for dumping:   0                    0k           (  0.00%)
>>>> dumping to tape :   0                    0k           (  0.00%)
>>>> dumping         :   0         0k         0k (  0.00%) (  0.00%)
>>>> dumped          : 107  58148656k  69631760k ( 83.51%) ( 83.51%)
>>>> wait for writing:   2  41940935k  48107940k ( 87.18%) ( 60.23%)
>>>> wait to flush   :   0         0k         0k (100.00%) (  0.00%)
>>>> writing to tape :   1   8774296k  12515695k ( 70.11%) ( 12.60%)
>>>> failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
>>>> taped           : 104   7433425k   9008125k ( 82.52%) ( 10.68%)
>>>> 4 dumpers idle  : not-idle
>>>> taper writing, tapeq: 2
>>>> network free kps:      2000
>>>> holding space   :  50295358k ( 49.79%)
>>>>  dumper0 busy   :  2:53:47  (  5.67%)
>>>>  dumper1 busy   :  0:13:48  (  0.45%)
>>>>  dumper2 busy   :  0:00:00  (  0.00%)
>>>>    taper busy   :  0:12:38  (  0.41%)
>>>>  0 dumpers busy : 2+0:07:56  ( 94.22%)            not-idle: 2+0:00:04  ( 99.73%)
>>>>                                                 start-wait:  0:07:51  (  0.27%)
>>>>  1 dumper busy  :  2:46:29  (  5.43%)            not-idle:  1:20:10  ( 48.15%)
>>>>                                        client-constrained:  1:18:08  ( 46.93%)
>>>>                                              no-bandwidth:  0:04:16  (  2.57%)
>>>>                                                start-wait:  0:03:54  (  2.35%)
>>>>  2 dumpers busy :  0:10:34  (  0.35%)  client-constrained:  0:06:22  ( 60.27%)
>>>>                                                start-wait:  0:04:05  ( 38.76%)
>>>>                                              no-bandwidth:  0:00:06  (  0.96%)
>>>>  3 dumpers busy :  0:00:00  (  0.00%)
>>>>
>>>> I would highly appreciate your insight into what is going on, especially for
>>>> the 3 DLEs that are "waiting for writing to tape".
>>>> -- 
>>>> View this message in context: http://www.nabble.com/Unravel-amstatus-output-tf1953587.html#a5357597
>>>> Sent from the Amanda - Users forum at Nabble.com.
>>>>
>>>
>> 
> 
> 
> 
> -- 
> Paul Bijnens, xplanation Technology Services        Tel  +32 16 397.511
> Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
> http://www.xplanation.com/          email:  Paul.Bijnens AT xplanation DOT com
> ***********************************************************************
> * I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
> * F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
> * stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
> * PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
> * init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
> * ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
> ***********************************************************************
> 
> 
> 
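
To check the "0.41% of what?" answer against the numbers quoted above, here is a small Python sketch of the arithmetic. It assumes the amstatus duration format is [days+]h:mm:ss (so "2+0:07:56" would mean 2 days, 0 hours, 7 minutes, 56 seconds), and the helper names to_seconds and implied_total are invented for this example. It takes the "0 dumpers busy" line to back out the total elapsed time, then recomputes taper's share of it.

```python
# Sketch only: back out the total elapsed time implied by one amstatus
# "busy" line, then recompute the taper percentage from it.
# Assumption: durations are formatted as [days+]h:mm:ss.

def to_seconds(text):
    """Convert '0:12:38' or '2+0:07:56' to a number of seconds."""
    days = 0
    if "+" in text:
        day_part, text = text.split("+")
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def implied_total(busy, percent):
    """Total elapsed seconds implied by a busy time and its percentage."""
    return to_seconds(busy) / (percent / 100.0)


# "0 dumpers busy : 2+0:07:56 ( 94.22%)" from the quoted summary
total = implied_total("2+0:07:56", 94.22)

# "taper busy : 0:12:38" expressed as a share of that total
taper_share = 100.0 * to_seconds("0:12:38") / total

print("implied elapsed hours:", round(total / 3600, 1))
print("taper busy percent:", round(taper_share, 2))
```

Under those assumptions the implied run length comes out at roughly 51 hours, and 0:12:38 of taper activity is about 0.41% of that, which matches the figure amstatus printed. The dumper0 line (2:53:47 at 5.67%) implies nearly the same total, so the percentages do appear to be measured against total elapsed time.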

-- 
View this message in context: 
http://www.nabble.com/Unravel-amstatus-output-tf1953587.html#a5359300
Sent from the Amanda - Users forum at Nabble.com.


