Amanda-Users

2007-07-31 13:47:59
Subject: Re: system load
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: Brian Cuttler <brian AT wadsworth DOT org>
Date: Tue, 31 Jul 2007 13:36:49 -0400


Brian Cuttler wrote:
> Chris,

> On Tue, Jul 31, 2007 at 10:50:56AM -0400, Chris Hoogendyk wrote:
> > > uptime
> > >  9:04am  up 18:58,  2 users,  load average: 55.46, 51.27, 51.43
> > whoah! that's a load.
> >
> > I seem to recall you are on Solaris.

> You have a good memory. Though I also have servers on IRIX and
> Linux boxes, this is a Solaris system. It's an E250 that we use
> as a Lotus Notes server.
>
> We are down to two concurrent dumps and the load average dropped
> to 33, still rather high, and it will be some hours before I'm able
> to tell you what the load average is when Amanda is not running.
>
> Oh, my other Lotus Notes server on the other E250 currently has a
> load average of 0.09; somehow that doesn't seem right either.

> Because the partitions are large enough to span tapes, we've been
> breaking them down into tars. We get a better fit in the spool
> area and on tape, and have increased concurrency even though we
> didn't increase maxdumps (this system has itself as its only client).
> The most recent change was an issue Jon helped with on Thursday,
> where I had to divide a large partition into multiple DLEs and use
> excludes, since (unlike my other large partition) it didn't have any
> divisions of files near the root of the file system.

> Actually, since there are no interactive users on this system, the
> only indication I had of an issue was the fact that sendmail stopped
> sending me stuff; smtpd doesn't accept connections, even from the
> local box, if the load average is too high.
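For reference, the kind of DLE split with excludes Brian describes can be sketched in an Amanda disklist roughly like this. The hostname, paths, exclude pattern, and the `user-tar` dumptype name are all illustrative, not his actual entries:

```
# disklist -- one big filesystem split into two DLEs so each fits
# in the holding area and on tape. GNU tar include/exclude patterns
# (relative to the DLE root) keep the two halves disjoint.
server.example.org /export/big-part1 /export/big {
    user-tar
    exclude "./notesdata"
}
server.example.org /export/big-part2 /export/big {
    user-tar
    include "./notesdata"
}
```

Each entry names the same device (`/export/big`) but backs up a non-overlapping slice, so Amanda schedules and tapes them as independent dumps.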


So, you have sendmail and Lotus Notes running on it. Does the other Lotus Notes server load balance with this one, or do they serve different groups? Does it spawn a mess of processes the way sendmail does?

The load reported by uptime is the average number of jobs in the run queue over the last one, five, and fifteen minutes. I think that means jobs that are just twiddling their thumbs waiting to get the cpu.
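As a quick sanity check, those three figures can be pulled straight out of uptime's output; a generic sketch, not specific to Solaris:

```shell
#!/bin/sh
# Print just the 1-, 5-, and 15-minute load averages from uptime.
# The field separator regex copes with both "load average:" (Linux,
# Solaris) and "load averages:" (BSD-style) output.
uptime | awk -F 'load averages?: *' '{ print $2 }'
```

Comparing the one-minute figure against the fifteen-minute one tells you whether the load is still climbing or already draining.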

Amanda ought not to be cpu intensive. But I suppose if you are doing software compression while running the local backup, it could chew a bit.
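If software compression does turn out to be the culprit, it is set per dumptype in amanda.conf; a sketch along these lines (the dumptype name is made up, but `compress` is the real option):

```
# amanda.conf -- software compression is a dumptype option.
define dumptype comp-user-tar {
    program "GNUTAR"
    compress client fast   # gzip --fast on the client; the CPU cost lands there
    index yes
}
# Switching to "compress none" (or relying on the tape drive's hardware
# compression) takes gzip out of the picture entirely.
```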

Are you on Solaris 9? The older Solaris versions are notably slower. I haven't jumped to 10 yet because there is so much change; going to 9 was easy and paid off significantly in speed.

How is the hardware configured on your E250? They have a pretty high capacity for throughput, but if you're running multiple dumpers on different partitions of the same internal drive, putting holding space on another partition, and going to an internal tape drive, I can see how you might tie things up so that other processes can't get a transfer in edgewise. On my E250, I've configured a couple of totally separate 10Krpm Seagate Cheetahs as holding disks. I've also added a PCI LV320 SCSI card to connect my tape drive, so that I/O is actually on a separate bus (it's in the 66MHz PCI slot, not any of the 33MHz slots -- all the 33s share one bus). I also have a 4-way 10/100 PCI Ethernet card, but I haven't configured that yet. It won't do me any good until we have gigabit running between the switch rooms; then the E250 could do trunking and blend 4x100 into one pipe.
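Dedicated holding disks like the Cheetahs mentioned above are declared in amanda.conf, one section per spindle; a sketch with illustrative paths and sizes:

```
# amanda.conf -- one holdingdisk section per dedicated spindle, so the
# dumpers and the taper aren't fighting over one set of disk heads.
holdingdisk hd1 {
    directory "/holding1"
    use 30 Gb
    chunksize 1 Gb
}
holdingdisk hd2 {
    directory "/holding2"
    use 30 Gb
    chunksize 1 Gb
}
```

With more than one holding disk, Amanda spreads concurrent dumps across them, which is much of the point of putting them on separate spindles.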

Memory is another issue. Dual CPUs (what speed are yours?), with 2G memory. Swap space? I put 2G on each of the drives that is at or over 10Krpm.

Have you tried running top? You can get it from sunfreeware. It can put a little extra demand on the server, but I sometimes run it for just a minute or so to give me a frame of reference for what the top processes are in cpu usage. On my department server, during the few incidents where smtp refused connections, I found all the top processes were mimedefang. That told me pretty definitively what was going on. From there I could look at /var/adm/messages, mail.log, and uw-imap.log to confirm the level of activity and the sources. I firewalled a couple of IP subnets on the Pacific Rim and all was happy again.
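Even without top installed, a rough equivalent of that check can be done with stock ps; this sketch works on both Solaris and Linux:

```shell
#!/bin/sh
# List the ten processes currently using the most CPU, highest first.
# ps -e selects every process; -o picks CPU share, pid, and command name.
# sed drops the header line so the numeric sort isn't confused by it.
ps -eo pcpu,pid,comm | sed 1d | sort -rn | head -10
```

If one daemon's children dominate the listing the way mimedefang did above, that points straight at the culprit.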

Also, after the backups have run and you have the reports, you could look them over carefully and see whether there were significant slowdowns on the DLEs running after some particular time in the morning.



---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>

---------------
Erdös 4


