Amanda-Users

Re: VERY high load on server

2004-01-07 17:09:31
Subject: Re: VERY high load on server
From: Mike Heller <mike AT dsny DOT com>
To: amanda-users AT amanda DOT org
Date: Wed, 07 Jan 2004 14:09:53 -0800
Paul, reply inline

Paul Bijnens wrote:

Mike Heller wrote:

I have amanda running on several servers, and last night I tried to back up one more to the tape server. When I arrived this morning, the backups were still running and the new server had an extremely high load


Am I correct that this is the first time that a backup is tried on
that host?

That is correct, it was the first time that this client has been backed up.


on it. It's a RedHat Linux 9.0 server and the load was over 520 (quad Xeon system). There were about 1500 processes with "amanda" as the



Have a close look at your xinetd configuration for amanda.
Maybe you have "wait = no", instead of "yes" in the file?

service amanda
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = amanda
        group                   = disk
        server                  = /usr/local/libexec/amandad
        disable                 = no
}

It has wait=yes already.
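
For checking the same setting on several clients, here is a minimal Python sketch that reads an xinetd service block like the one quoted above and reports its `wait` value. The function name `parse_xinetd_service` is made up for illustration, and the parser only handles plain `key = value` lines (not the `+=` / `-=` forms), so treat it as a rough check rather than a full xinetd parser.

```python
def parse_xinetd_service(text):
    """Return {service_name: {key: value}} for each service block in text.

    Hypothetical helper: handles only simple 'key = value' attributes.
    """
    services = {}
    name = None       # service name seen on the most recent 'service' line
    current = None    # attribute dict of the block we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("service "):
            name = line.split(None, 1)[1].strip()
        elif line == "{" and name is not None:
            current = services.setdefault(name, {})
        elif line == "}":
            current = None
            name = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return services


# The service block quoted in this message.
SAMPLE = """
service amanda
{
        socket_type             = dgram
        protocol                = udp
        wait                    = yes
        user                    = amanda
        group                   = disk
        server                  = /usr/local/libexec/amandad
        disable                 = no
}
"""

if __name__ == "__main__":
    amanda = parse_xinetd_service(SAMPLE).get("amanda", {})
    print("wait =", amanda.get("wait"))
```

If the printed value is anything other than `yes` for the UDP amanda service, that would match the misconfiguration Paul is asking about.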


Just a guess.
Have a look in /tmp/amanda/*debug files too.

After only one session I have 1029 files in the /tmp/amanda directory. Looking at the most recent, I see:

amandad: time 40.815: sending ACK pkt:
<<<<<
>>>>>
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 57.335: stream_server: waiting for connection: 0.0.0.0.33752
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 57.335: stream_server: waiting for connection: 0.0.0.0.33753
amandad: try_socksize: send buffer size is 65536
amandad: try_socksize: receive buffer size is 65536
amandad: time 57.335: stream_server: waiting for connection: 0.0.0.0.33754
amandad: time 57.335: sending REP pkt:
<<<<<
CONNECT DATA 33752 MESG 33753 INDEX 33754
OPTIONS features=fffffeff9ffe0f;
>>>>>
amandad: time 57.336: received ACK pkt:
<<<<<
>>>>>
amandad: time 87.327: stream_accept: timeout after 30 seconds
amandad: time 87.327: stream 0 accept failed: bad SECURITY line: ''
amandad: time 117.327: stream_accept: timeout after 30 seconds
amandad: time 117.327: stream 1 accept failed: bad SECURITY line: ''
amandad: time 147.327: stream_accept: timeout after 30 seconds
amandad: time 147.327: stream 2 accept failed: bad SECURITY line: ''
amandad: time 148.337: pid 11395 finish time Wed Jan  7 08:04:36 2004
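
With over a thousand debug files, reading them by hand is not practical. The following is a rough Python sketch that looks for the `stream_accept: timeout` lines shown above and counts them per file. The glob pattern `/tmp/amanda/*debug` is the one Paul suggested; the function names and the regular expression are my own assumptions, based only on the log lines in this message.

```python
import glob
import re

# Matches lines such as:
#   amandad: time 87.327: stream_accept: timeout after 30 seconds
TIMEOUT_RE = re.compile(
    r"^amandad: time ([\d.]+): stream_accept: timeout after (\d+) seconds"
)


def find_accept_timeouts(log_text):
    """Return a list of (elapsed_seconds, timeout_seconds), one per timeout line."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hits.append((float(match.group(1)), int(match.group(2))))
    return hits


def scan_debug_files(pattern="/tmp/amanda/*debug"):
    """Return {path: timeout_count} for every debug file with at least one timeout."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            hits = find_accept_timeouts(handle.read())
        if hits:
            results[path] = len(hits)
    return results


if __name__ == "__main__":
    for path, count in scan_debug_files().items():
        print(f"{path}: {count} stream_accept timeout(s)")
```

Run on the client, this should show whether the timeouts appear in every amandad debug file or only in the later ones, after the load had already climbed.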


However, at this point the server may be totally hung and things may not be working well. The first one (right after the backup started) seems to be working better:

amandad: time 1.026: sending ACK pkt:
<<<<<
>>>>>
amandad: time 6.573: sending REP pkt:
<<<<<
OPTIONS features=fffffeff9ffe0f;
/big/www/docs 0 SIZE 4984710
/big/mysqldata 0 SIZE 31290
/var/log 0 SIZE 6390
/boot 0 SIZE 32335
>>>>>
amandad: time 6.574: received ACK pkt:
<<<<<
>>>>>
amandad: time 30.581: pid 4004 finish time Wed Jan  7 01:01:56 2004
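
To compare the estimates across debug files, the `SIZE` lines in a REP packet like the one above can be pulled out with a few lines of Python. This is only a sketch: the function name is hypothetical, the line format is inferred from this single log excerpt, and the post does not state the unit of the sizes, so the numbers are reported as-is.

```python
import re

# Matches estimate lines such as:  /big/www/docs 0 SIZE 4984710
# Fields are assumed to be: disk, dump level, the literal SIZE, size.
SIZE_RE = re.compile(r"^(\S+) (\d+) SIZE (-?\d+)$")


def parse_size_estimates(log_text):
    """Return a list of (disk, level, size) tuples found in the log text."""
    estimates = []
    for line in log_text.splitlines():
        match = SIZE_RE.match(line.strip())
        if match:
            disk, level, size = match.groups()
            estimates.append((disk, int(level), int(size)))
    return estimates


# The REP packet body quoted in this message.
SAMPLE = """\
OPTIONS features=fffffeff9ffe0f;
/big/www/docs 0 SIZE 4984710
/big/mysqldata 0 SIZE 31290
/var/log 0 SIZE 6390
/boot 0 SIZE 32335
"""

if __name__ == "__main__":
    for disk, level, size in parse_size_estimates(SAMPLE):
        print(f"{disk} (level {level}): {size}")
```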


Mike
