Re: system load
2007-07-31 13:47:59
Brian Cuttler wrote:
Chris,
On Tue, Jul 31, 2007 at 10:50:56AM -0400, Chris Hoogendyk wrote:
uptime
9:04am up 18:58, 2 users, load average: 55.46, 51.27, 51.43
whoah! that's a load.
I seem to recall you are on solaris.
You have a good memory. Though I also have servers on IRIX and
Linux boxes, this is a Solaris system: an E250 that we use
as a Lotus Notes server.
We are down to two concurrent dumps and the load average dropped
to 33, still rather high, and it will be some hours before I'm able
to tell you what the load average is when amanda is not running.
Oh, and my other Lotus Notes server, on the other E250, currently has
a load average of 0.09; somehow that doesn't seem right either.
Because the partitions are large enough to span tapes, we've been
breaking them down into multiple tar DLEs. We get a better fit in the
spool area and on tape, and have increased concurrency even though we
didn't increase maxdumps (this system has itself as its only client).
The most recent change having been an issue Jon helped with on Thursday,
where I had to divide a large partition into multiple DLEs and use
excludes since (unlike my other large partition) it didn't have any
divisions of files near the root of the file system.
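For anyone following along, that kind of split can be expressed in
Amanda's disklist with per-DLE dumptypes carrying tar excludes. The
host name, paths, and glob patterns below are invented for
illustration (and `comp-user-tar` is just the stock sample dumptype);
check your Amanda version's documentation for the exact
`exclude file`/`exclude list` syntax:

```
# disklist: two DLEs over the same mountpoint (hypothetical names)
notes  /notesdata-a  /notesdata  notes-tar-a
notes  /notesdata-b  /notesdata  notes-tar-b

# amanda.conf: each dumptype excludes what the other DLE picks up
define dumptype notes-tar-a {
    comp-user-tar
    exclude file "./[n-z]*"    # this half backs up top-level names a-m
}
define dumptype notes-tar-b {
    comp-user-tar
    exclude file "./[a-m]*"    # and this half gets n-z
}
```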
Actually, since there are no interactive users on this system, the
only indication I had of an issue was that sendmail stopped
sending me stuff; smtpd doesn't accept connections, even from the
local box, if the load average is too high.
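For what it's worth, that cutoff is tunable in stock sendmail: above
the QueueLA load average it queues instead of delivering, and above
RefuseLA it refuses SMTP connections outright. The numbers below are
just illustrative defaults to sketch the knobs:

```
# sendmail.cf: queue-only above load 8, refuse connections above 12
O QueueLA=8
O RefuseLA=12

# or the m4 equivalents in the .mc file:
define(`confQUEUE_LA', `8')
define(`confREFUSE_LA', `12')
```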
So, you have sendmail and lotus notes running on it. Does the other
lotus notes server load balance with this one? Or do they serve
different groups? Does it spawn a mess of processes the way sendmail
does? The load that is reported by uptime is the average number of jobs
in the run queue over the last minute, five minutes and ten minutes. I
think that means jobs that are just twiddling their thumbs waiting to
get the cpu.
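A quick way to watch just those three numbers, assuming a roughly
standard uptime output (both Solaris and Linux print them after
"load average:"):

```shell
# Pull the 1-, 5-, and 15-minute load averages out of uptime's output.
uptime | awk -F'load average[s]*: ' '{ print $2 }'
```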
Amanda ought not to be cpu intensive. But I suppose if you are doing
software compression, running the local backup, it could chew a bit.
Are you on Solaris 9? The older Solaris versions are notably slower than
Solaris 9. I haven't jumped to 10 yet because there is so much change.
Going to 9 was easy and a significant payoff in speed.
How is the hardware configured on your E250? They have a pretty high
capacity for throughput, but if you're running multiple dumpers on
different partitions of the same internal drive and then putting holding
space on another partition and going to an internal tape drive, I can
see how you might tie things up so that other processes can't get a
transfer in edgewise. On my E250, I've configured a couple of totally
separate 10Krpm Seagate Cheetahs for holding disks. I've also added a
PCI LV320 SCSI card to connect my tape drive, so that I/O is actually on
a separate bus (it's in the 66MHz PCI slot, not any of the 33MHz
slots -- all the 33MHz slots share one bus). I also have a 4-way
10/100 PCI ethernet card, but I
haven't configured that yet. It won't do me any good until we have
gigabit running between the switch rooms. Then the E250 could do
trunking and blend 4x100 into one pipe.
Memory is another issue. Dual CPUs (what speed are yours?), with 2G
memory. Swap space? I put 2G on each of the drives that is at or over
10Krpm.
Have you tried running top? You can get it from sunfreeware. It can put
a little extra demand on the server, but I sometimes do it for just a
minute or so to give me a frame of reference for what the top processes
are in CPU usage. On my department server, during the few incidents
where smtp refused connections, I found all the top processes were
mimedefang. That told me pretty definitively what was going on. From
there I could look at /var/adm/messages, mail.log, and uw-imap.log to
confirm the level of activity and the sources. I firewalled a couple
of IP subnets on the Pacific Rim and all was happy again.
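If installing top is a hassle, a rough substitute on anything with a
SysV-style ps (Solaris included) is to sort ps output by CPU
percentage yourself:

```shell
# Ten biggest CPU consumers right now; no top required.
ps -eo pcpu,pid,comm | sort -rn | head -10
```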
Also, after the backups have run and you have the reports, you could
look them over carefully and see if there were significant slowdowns on
the DLEs running after some particular time in the morning.
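One way to eyeball that from the mailed report is an awk pass over the
DUMP SUMMARY lines, sorting DLEs by elapsed dump time. The two sample
lines and the column positions below (field 2 = disk, field 7 = mm:ss
dump time) are invented for illustration; the real layout varies by
Amanda version, so adjust the field numbers to match your report:

```shell
# Sort DLEs by dump time, slowest first (sample data is made up).
cat > /tmp/dump-summary.txt <<'EOF'
server /export/home1 1 1024 512 50.0 5:10 1.7
server /export/home2 1 2048 900 43.9 25:40 0.6
EOF
# Convert mm:ss to seconds, then sort descending.
awk '{ split($7, t, ":"); print t[1]*60 + t[2], $2 }' /tmp/dump-summary.txt | sort -rn
```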
---------------
Chris Hoogendyk
-
O__ ---- Systems Administrator
c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>
---------------
Erdös 4