Amanda-Users

2007-07-31 13:47:59
Subject: Re: system load
From: Chris Hoogendyk <hoogendyk AT bio.umass DOT edu>
To: Brian Cuttler <brian AT wadsworth DOT org>
Date: Tue, 31 Jul 2007 13:36:49 -0400


Brian Cuttler wrote:
> Chris,

> On Tue, Jul 31, 2007 at 10:50:56AM -0400, Chris Hoogendyk wrote:
> > > uptime
> > >  9:04am  up 18:58,  2 users,  load average: 55.46, 51.27, 51.43
> > whoah! that's a load.
> >
> > I seem to recall you are on Solaris.

> You have a good memory. Though I also have servers on IRIX and
> Linux boxes, this is a Solaris system. It's an E250 that we use
> as a Lotus Notes server.
>
> We are down to two concurrent dumps and the load average dropped
> to 33, still rather high, and it will be some hours before I'm able
> to tell you what the load average is when Amanda is not running.
>
> Oh, my other Lotus Notes server on the other E250 currently has a
> load average of 0.09; somehow that doesn't seem right either.

> Because the partitions are large enough to span tapes, we've been
> breaking them down into tars. We get a better fit in the spool
> area and on tape, and have increased concurrency even though we
> didn't increase maxdumps (this system has itself as its only client).
> The most recent change was an issue Jon helped with on Thursday,
> where I had to divide a large partition into multiple DLEs and use
> excludes, since (unlike my other large partition) it didn't have any
> divisions of files near the root of the file system.

> Actually, since there are no interactive users on this system, the
> only indication I had of an issue was the fact that sendmail stopped
> sending me stuff; smtpd doesn't accept connections, even from the
> local box, if the load average is too high.
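For reference, the kind of DLE split with excludes Brian describes can be sketched in an Amanda disklist roughly like this. The hostname, paths, exclude pattern, and the `user-tar` dumptype name are all illustrative, not his actual entries:

```
# disklist -- one big filesystem split into two DLEs so each fits
# in the holding area and on tape. GNU tar include/exclude patterns
# (relative to the DLE root) keep the two halves disjoint.
server.example.org /export/big-part1 /export/big {
    user-tar
    exclude "./notesdata"
}
server.example.org /export/big-part2 /export/big {
    user-tar
    include "./notesdata"
}
```

Each entry names the same device (`/export/big`) but backs up a non-overlapping slice, so Amanda schedules and tapes them as independent dumps.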


So, you have sendmail and Lotus Notes running on it. Does the other Lotus Notes server load balance with this one, or do they serve different groups? Does it spawn a mess of processes the way sendmail does?

The load reported by uptime is the average number of jobs in the run queue over the last one, five, and fifteen minutes. I think that means jobs that are just twiddling their thumbs waiting to get the cpu.
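As a quick sanity check, those three figures can be pulled straight out of uptime's output; a generic sketch, not specific to Solaris:

```shell
#!/bin/sh
# Print just the 1-, 5-, and 15-minute load averages from uptime.
# The field separator regex copes with both "load average:" (Linux,
# Solaris) and "load averages:" (BSD-style) output.
uptime | awk -F 'load averages?: *' '{ print $2 }'
```

Comparing the one-minute figure against the fifteen-minute one tells you whether the load is still climbing or already draining.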

Amanda ought not to be cpu intensive. But I suppose if you are doing software compression while running the local backup, it could chew a bit.
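If software compression does turn out to be the culprit, it is set per dumptype in amanda.conf; a sketch along these lines (the dumptype name is made up, but `compress` is the real option):

```
# amanda.conf -- software compression is a dumptype option.
define dumptype comp-user-tar {
    program "GNUTAR"
    compress client fast   # gzip --fast on the client; the CPU cost lands there
    index yes
}
# Switching to "compress none" (or relying on the tape drive's hardware
# compression) takes gzip out of the picture entirely.
```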

Are you on Solaris 9? The older Solaris versions are notably slower. I haven't jumped to 10 yet because there is so much change; going to 9 was easy and paid off significantly in speed.

How is the hardware configured on your E250? They have a pretty high capacity for throughput, but if you're running multiple dumpers on different partitions of the same internal drive, putting holding space on another partition, and going to an internal tape drive, I can see how you might tie things up so that other processes can't get a transfer in edgewise. On my E250, I've configured a couple of totally separate 10Krpm Seagate Cheetahs as holding disks. I've also added a PCI LV320 SCSI card to connect my tape drive, so that I/O is actually on a separate bus (it's in the 66MHz PCI slot, not any of the 33MHz slots -- all the 33s share one bus). I also have a 4-way 10/100 PCI Ethernet card, but I haven't configured that yet. It won't do me any good until we have gigabit running between the switch rooms; then the E250 could do trunking and blend 4x100 into one pipe.
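Dedicated holding disks like the Cheetahs mentioned above are declared in amanda.conf, one section per spindle; a sketch with illustrative paths and sizes:

```
# amanda.conf -- one holdingdisk section per dedicated spindle, so the
# dumpers and the taper aren't fighting over one set of disk heads.
holdingdisk hd1 {
    directory "/holding1"
    use 30 Gb
    chunksize 1 Gb
}
holdingdisk hd2 {
    directory "/holding2"
    use 30 Gb
    chunksize 1 Gb
}
```

With more than one holding disk, Amanda spreads concurrent dumps across them, which is much of the point of putting them on separate spindles.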

Memory is another issue. Dual CPUs (what speed are yours?), with 2G memory. Swap space? I put 2G on each of the drives that is at or over 10Krpm.

Have you tried running top? You can get it from sunfreeware. It can put a little extra demand on the server, but I sometimes run it for just a minute or so to give me a frame of reference for what the top processes are in cpu usage. On my department server, during the few incidents where smtp refused connections, I found all the top processes were mimedefang. That told me pretty definitively what was going on. From there I could look at /var/adm/messages, mail.log, and uw-imap.log to confirm the level of activity and the sources. I firewalled a couple of IP subnets on the Pacific Rim and all was happy again.
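Even without top installed, a rough equivalent of that check can be done with stock ps; this sketch works on both Solaris and Linux:

```shell
#!/bin/sh
# List the ten processes currently using the most CPU, highest first.
# ps -e selects every process; -o picks CPU share, pid, and command name.
# sed drops the header line so the numeric sort isn't confused by it.
ps -eo pcpu,pid,comm | sed 1d | sort -rn | head -10
```

If one daemon's children dominate the listing the way mimedefang did above, that points straight at the culprit.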

Also, after the backups have run and you have the reports, you could look them over carefully and see whether there were significant slowdowns on the DLEs running after some particular time in the morning.



---------------

Chris Hoogendyk

-
  O__  ---- Systems Administrator
 c/ /'_ --- Biology & Geology Departments
(*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst
<hoogendyk AT bio.umass DOT edu>

---------------
Erdös 4


