Re: VERY high load on server

2004-01-07 17:44:07
Subject: Re: VERY high load on server
From: Frank Smith <fsmith AT hoovers DOT com>
To: Mike Heller <mike AT dsny DOT com>, amanda-users AT amanda DOT org
Date: Wed, 07 Jan 2004 16:41:54 -0600
--On Wednesday, January 07, 2004 14:09:53 -0800 Mike Heller <mike AT dsny DOT com> wrote:

> Paul, reply inline
> 
> Paul Bijnens wrote:
> 
>> Mike Heller wrote:
>> 
>>> I have amanda running on several servers and last night I tried to 
>>> back up one more to the tape server.  When I arrived this morning, 
>>> the backups were still running and the new server had an extremely 
>>> high load 
>> 
>> 
>> Am I correct that this is the first time that a backup is tried on
>> that host?
> 
> That is correct, it was the first time that this client has been backed up.
> 
>> 
>>> on it.  It's a RedHat Linux 9.0 server and the load was over 520 
>>> (quad Xeon system).  There were about 1500 processes with "amanda" as the
>> 
>> 
>> 
>> Have a close look at your xinetd configuration for amanda.
>> Maybe you have "wait = no", instead of "yes" in the file?
>> 
>> service amanda
>> {
>>         socket_type             = dgram
>>         protocol                = udp
>>         wait                    = yes
>>         user                    = amanda
>>         group                   = disk
>>         server                  = /usr/local/libexec/amandad
>>         disable                 = no
>> }
>> 
> It has wait=yes already.
> 
>> 
>> Just a guess.
>> Have a look in /tmp/amanda/*debug files too.
>> 
> After only one session I have 1029 files in the /tmp/amanda directory.  
> Looking at the most recent, I see:
> 
> amandad: time 40.815: sending ACK pkt:
> <<<<<
>  >>>>>
> amandad: try_socksize: send buffer size is 65536
> amandad: try_socksize: receive buffer size is 65536
> amandad: time 57.335: stream_server: waiting for connection: 0.0.0.0.33752
> amandad: try_socksize: send buffer size is 65536
> amandad: try_socksize: receive buffer size is 65536
> amandad: time 57.335: stream_server: waiting for connection: 0.0.0.0.33753
> amandad: try_socksize: send buffer size is 65536
> amandad: try_socksize: receive buffer size is 65536
> amandad: time 57.335: stream_server: waiting for connection: 0.0.0.0.33754
> amandad: time 57.335: sending REP pkt:
> <<<<<
> CONNECT DATA 33752 MESG 33753 INDEX 33754
> OPTIONS features=fffffeff9ffe0f;
>  >>>>>
> amandad: time 57.336: received ACK pkt:
> <<<<<
>  >>>>>
> amandad: time 87.327: stream_accept: timeout after 30 seconds
> amandad: time 87.327: stream 0 accept failed: bad SECURITY line: ''
> amandad: time 117.327: stream_accept: timeout after 30 seconds
> amandad: time 117.327: stream 1 accept failed: bad SECURITY line: ''
> amandad: time 147.327: stream_accept: timeout after 30 seconds
> amandad: time 147.327: stream 2 accept failed: bad SECURITY line: ''
> amandad: time 148.337: pid 11395 finish time Wed Jan  7 08:04:36 2004
> 
> 
> However at this point the server may be totally hung and things may not be 
> working well.  The first one (right after the backup started) seems to be 
> working better:
> 
> amandad: time 1.026: sending ACK pkt:
> <<<<<
>  >>>>>
> amandad: time 6.573: sending REP pkt:
> <<<<<
> OPTIONS features=fffffeff9ffe0f;
> /big/www/docs 0 SIZE 4984710
> /big/mysqldata 0 SIZE 31290
> /var/log 0 SIZE 6390
> /boot 0 SIZE 32335
>  >>>>>
> amandad: time 6.574: received ACK pkt:
> <<<<<
>  >>>>>
> amandad: time 30.581: pid 4004 finish time Wed Jan  7 01:01:56 2004
> 
> 
> Mike
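
With over a thousand debug files it may be easier to summarize them with a
script than to read them one by one.  The following is only a rough sketch:
the two patterns are copied from the amandad lines quoted above, and the
function name summarize_debug is made up for illustration, not part of Amanda.

```python
import re

# Patterns copied from the amandad debug lines quoted in this thread.
TIMEOUT_RE = re.compile(r"stream_accept: timeout after (\d+) seconds")
FAILED_RE = re.compile(r"stream (\d+) accept failed")

# Excerpt of the failing debug file quoted above.
SAMPLE = """\
amandad: time 87.327: stream_accept: timeout after 30 seconds
amandad: time 87.327: stream 0 accept failed: bad SECURITY line: ''
amandad: time 117.327: stream_accept: timeout after 30 seconds
amandad: time 117.327: stream 1 accept failed: bad SECURITY line: ''
amandad: time 147.327: stream_accept: timeout after 30 seconds
amandad: time 147.327: stream 2 accept failed: bad SECURITY line: ''
"""


def summarize_debug(text):
    """Count accept timeouts and collect the stream numbers that failed."""
    summary = {"timeouts": 0, "failed_streams": []}
    for line in text.splitlines():
        if TIMEOUT_RE.search(line):
            summary["timeouts"] += 1
        match = FAILED_RE.search(line)
        if match:
            summary["failed_streams"].append(int(match.group(1)))
    return summary


if __name__ == "__main__":
    print(summarize_debug(SAMPLE))
```

Running it over every file in /tmp/amanda would show whether all of the
amandad instances timed out waiting for the server to connect, or only the
later ones.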

You have far too many processes (and /tmp/amanda/* debug files).  Maybe
Amanda is forking off too many amandad processes for some reason, or
perhaps you have a wildcard in your disklist or include list that is
expanding well beyond what you expect.
   Also, are the client and server running the same version of
Amanda?
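
To see which program is producing all of those files, something like the
sketch below might help.  It assumes the debug files are named
<program>.<timestamp>.debug (for example amandad.20040107010126.debug); that
naming is an assumption and should be checked against what is actually in
/tmp/amanda.  The function names are illustrative only.

```python
import os
from collections import Counter


def count_by_program(filenames):
    """Group *.debug file names by the text before the first dot.

    Assumes names of the form <program>.<timestamp>.debug; other files
    are ignored.
    """
    counts = Counter()
    for name in filenames:
        if name.endswith(".debug"):
            counts[name.split(".", 1)[0]] += 1
    return counts


def count_debug_dir(path="/tmp/amanda"):
    """Count the debug files in the given directory by program name."""
    return count_by_program(os.listdir(path))


if __name__ == "__main__":
    for program, total in count_debug_dir().most_common():
        print(program, total)
```

If nearly all of the files are amandad ones, the server is probably sending
far more requests than expected; a large number of sendsize or sendbackup
files would point more toward the disklist expanding too far.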

Frank



-- 
Frank Smith                                      fsmith AT hoovers DOT com
Systems Administrator                           Voice: 512-374-4673
Hoover's Online                                   Fax: 512-374-4501

