Amanda-Users

Re: Very Strange slowness issue

2003-05-12 12:10:36
Subject: Re: Very Strange slowness issue
From: Jon LaBadie <jon AT jgcomp DOT com>
To: amanda-users AT amanda DOT org
Date: Mon, 12 May 2003 12:06:17 -0400
On Mon, May 12, 2003 at 09:22:55AM -0400, Tim Champ wrote:
> Hello to all!  I'm new to the list, but have been using amanda for 3
> years.  And this is the first stumper I've had.


That must be a first: no startup questions, nor any questions at all, for 3 years :)

> Also, if I run ufsdump on the machine locally, and dump from one disk to
> another (on the same chain), I get 3-4MB per second.  IF I do that and
> pipe it through gzip (as amanda does) I get 400KB per second.  But, if

Not related, but you might want to trade some compression for better
speed with the "compress client fast" option in your amanda.conf dumptype.
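
To get a rough feel for that trade-off, here is a minimal sketch using
Python's zlib module, which implements the same DEFLATE algorithm gzip uses.
It assumes level 1 stands in for gzip's fastest setting and level 9 for its
best one; the sample data and the function names are made up for
illustration, and real dump streams will behave differently.

```python
import time
import zlib


def sample_data(lines=20000):
    """Build some repetitive text as a stand-in for real dump output."""
    return b"".join(b"line %d: the quick brown fox\n" % i for i in range(lines))


def compressed_sizes(data, levels=(1, 9)):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    data = sample_data()
    for level in (1, 9):
        start = time.perf_counter()
        size = len(zlib.compress(data, level))
        elapsed = time.perf_counter() - start
        print(f"level {level}: {size} bytes in {elapsed:.4f}s")
```

Running it against a sample of your own data gives a better idea of how
much speed the faster setting buys and how much extra tape it costs.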


> Now - these are the things that may be causing the problem:
> 
> The machine worked fine until a planned downtime where I updated the PROM,
> and added a second CPU.  Also, I split up the disks among the two SCSI
> busses.  Now, I get no errors from any disks, and I have syslog logging on
> debug.  The CPU seems fine, as it hasn't crashed once, or had any errors.

Both CPUs are in use, correct?  (Check with psrinfo(1M).)

> I'm stumped.  One thing I've noticed is that while amanda is dumping, it
> seems to use six "dumps" which seems strange, but I've been told that it's
> normal.  One spawns another, which spawns 4 more.  The sendbackup.debug
> shows how long it takes, as you can see here:

When I first read this I suspected too much contention for a single disk,
with dumps of multiple DLEs on the same disk starting at once.  That could
be addressed with the "spindle" argument in the disklist.  But your ps
output below makes me wonder.

> 
> AND - here is a ps from the machine (with 6 "dump" processes running,
> which I don't have a "dump" command on my machine.  I have ufsdump, which
> shows as running when I do a top, but not when I do a ps):
> 
>     root 12236 12235  0 04:29:43 ?        0:03 dump 1usf 1048576 - /dev/rdsk/c0t2d0s6
>     root 12234 12232  0 04:29:33 ?        0:01 dump 1usf 1048576 - /dev/rdsk/c0t2d0s6
>     root 12235 12234  0 04:29:43 ?        0:02 dump 1usf 1048576 - /dev/rdsk/c0t2d0s6
>     root 12237 12235  0 04:29:43 ?        0:03 dump 1usf 1048576 - /dev/rdsk/c0t2d0s6
>     root 12238 12235  0 04:29:43 ?        0:03 dump 1usf 1048576 - /dev/rdsk/c0t2d0s6
>     root 13213   350  0 09:18:43 pts/0    0:00 grep dump
>     root 12239 12235  0 04:29:43 ?        0:12 dump 1usf 1048576 - /dev/rdsk/c0t2d0s6
> 

The name change (ufsdump -> dump) is not surprising.  You can run
any unix command from a compiled program and tell it that its name
is whatever you want.  The exec(2) system call takes what you want
to run and what its zeroth argument (the name) should be as two
separate arguments.  So amanda must be doing something like

    exec..("/usr/sbin/ufsdump", "dump", "1usf", .....)
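
The same effect is easy to reproduce.  Below is a minimal sketch in Python,
not amanda's actual code: subprocess.run() takes the file to execute
(executable=) separately from the argument list, whose first element becomes
the child's argv[0].  The helper name run_with_name is made up for
illustration, and /bin/sh is used only because it reports its argv[0] as $0.

```python
import subprocess


def run_with_name(program, name, args):
    """Run `program`, but hand it `name` as argv[0]; return its stdout."""
    result = subprocess.run(
        [name, *args],       # argv as the child sees it; argv[0] is `name`
        executable=program,  # the file that actually gets executed
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # With "-c" and no extra name argument, the shell's $0 is its argv[0].
    print(run_with_name("/bin/sh", "dump", ["-c", "echo $0"]))
```

The shell that runs here is /bin/sh, yet it believes it is called "dump",
which is the same mismatch between the file on disk and the name ps reports.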

The thing that gets me is all the ufsdumps are running on the same
partition (slice), c0t2d0s6.  That would certainly cause contention
if 6 simultaneous dumps of the same partition competed for the same
part of the same disk.

It looks like all six are running (accumulating cpu time).  I suspect
that if you checked sar(1) reports or iostat -d (or is it -D) output
for that disk during a manual ufsdump and an amanda dump you would
see a very large jump in the service time for io requests.

Your note of the parentage of the six processes is reminiscent of the
way shells run pipelines.  If parent shell "P" is given a command line
of "A | B | C | D | E", the parent shell only forks one child process.
That child will become "E", but before it does so, it forks off all the
remaining processes, "A" ... "D".  You have a similar scenario: the
original ufsdump is 12234, which creates 12235.  12235 in turn is the
parent of 12236, 12237, 12238, and 12239.
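
The parentage is easier to see as a tree.  Here is a small sketch that
rebuilds it from the PID and PPID columns of the quoted ps listing (the
"grep dump" line is left out); the function names are only illustrative.

```python
from collections import defaultdict

# (pid, ppid) pairs copied from the quoted ps listing.
PS_ROWS = [
    (12236, 12235),
    (12234, 12232),
    (12235, 12234),
    (12237, 12235),
    (12238, 12235),
    (12239, 12235),
]


def build_children(rows):
    """Map each parent PID to the list of its child PIDs."""
    children = defaultdict(list)
    for pid, ppid in rows:
        children[ppid].append(pid)
    return children


def find_roots(rows):
    """Return listed PIDs whose parent is not itself in the listing."""
    pids = {pid for pid, _ in rows}
    return sorted(pid for pid, ppid in rows if ppid not in pids)


def render(pid, children, depth=0):
    """Return the subtree under `pid` as a list of indented lines."""
    lines = ["  " * depth + str(pid)]
    for child in sorted(children.get(pid, [])):
        lines.extend(render(child, children, depth + 1))
    return lines


if __name__ == "__main__":
    kids = build_children(PS_ROWS)
    for root in find_roots(PS_ROWS):
        print("\n".join(render(root, kids)))
```

The only root among the dump processes is 12234, with 12235 beneath it and
the other four beneath 12235.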

From your ps output, the parent of the original ufsdump (12234) is 12232.
Any idea what that process was?  Any chance you are running a shell script
wrapper around anything?

Just some thoughts that probably won't get at your problem but might
stimulate some productive thoughts.

jl
-- 
Jon H. LaBadie                  jon AT jgcomp DOT com
 JG Computing
 4455 Province Line Road        (609) 252-0159
 Princeton, NJ  08540-4322      (609) 683-7220 (fax)