Re: [Networker] Slow index browsing - additional

2003-07-28 17:17:25
Subject: Re: [Networker] Slow index browsing - additional
From: "Reed, Ted G II [ITS]" <ted.reed AT MAIL.SPRINT DOT COM>
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Date: Mon, 28 Jul 2003 16:17:15 -0500
Steve,
I understand being stuck with hardware.  And the OVERALL legato environment is 
obviously "multi-threaded" in the sense that multiple SINGLE threaded processes 
are running simultaneously towards a single end-point goal.  However, no single 
binary making up the environment is a multi-thread coded process.  So nsrd will 
always run on one available cpu, nsrmmdbd will run on another available cpu, 
etc.  So the whole of legato will make use of multiple processors, but any 
single process won't.
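
To make the point above concrete, here is a toy model in Python. It is only a sketch: it assumes each daemon is one single-threaded process capped by one CPU's clock rate, the function names are made up for illustration, and the box specs are the ones quoted later in this thread (reading the v880's "1.2" as 1.2GHz). Raw clock rate is a crude yardstick across CPU generations, so treat the numbers as a rough comparison, not a benchmark.

```python
# Toy model (an assumption, not a measurement): treat each NetWorker daemon as
# one single-threaded process whose speed is capped by one CPU's clock rate.

def single_process_ceiling_mhz(cpu_count: int, mhz_per_cpu: int) -> int:
    """Clock rate available to ONE single-threaded process (e.g. nsrd)."""
    # Extra CPUs cannot speed up a single thread, so cpu_count is ignored.
    return mhz_per_cpu


def aggregate_capacity_mhz(cpu_count: int, mhz_per_cpu: int) -> int:
    """Clock rate available to all processes on the box combined."""
    return cpu_count * mhz_per_cpu


# Specs as quoted in this thread; the labels are just for printing.
boxes = {
    "e6500, 10 x 400MHz": (10, 400),
    "v880,   4 x 1200MHz": (4, 1200),
}

for name, (cpus, mhz) in boxes.items():
    print(name,
          "| one process:", single_process_ceiling_mhz(cpus, mhz), "MHz",
          "| whole box:", aggregate_capacity_mhz(cpus, mhz), "MHz")
```

Under this model the two boxes have similar total capacity, but any one process on the v880 gets three times the clock rate, which is why the faster box suits the master.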

My suggestion: can you make your v880 the master server?  I assume the v880 
has better per-cpu speeds than the 6500 (per spec).  The reasoning: in a 
master/storage node environment, you want all your nodes to have multiple cpus, 
good ram, and lots of i/o (Ethernet, fibre vs. scsi, etc), but they don't have 
to be speedy.  Your master, on the other hand, can survive quite nicely on a 
dual processor system, as long as the 2 cpus ARE speedy.  If you can't do 
that, try bugging Legato for a Solaris 8 optimized build of the legato 
binaries.  We got one under some specific patch for 6.1.1 and it helped; I'm 
sure they have a build for 6.1.3.  Lastly, if you are moving backups through 
the master (and the e6500 makes a great storage node, lots of I/O 
capabilities), you might move that load to a storage node and let the master 
concentrate on index and DB work.

--------------------------------------------------------

I have multiple legato environments, the biggest dog is:
*  master: sun e450 4x400MHz, 3GB ram.  solaris 8 within last 6 months.  
[networker 7.0]
*  storage node:  2x Sun e4500, each w/ 8x400Mhz, 4GB ram.  solaris 8 over 1 
year  [networker 7.0]
*  STK 9310 "powderhorn" silo with (24) STK 9840a drives (10MB/sec native, 
20GB/tape native)
        - 10 per node, 4 for master
*  8+ TB per night.  Tops out around 10-12TB in a 24hr period (due to single 
big saveset clients).  No exchange or netware, thank god.  5 Unixes and 2 
Windows versions, 500+ clients 
*  No SAN backups.  All backups to tape, no disk-to-disk
*  ALL backups go to nodes.  The master ONLY does indexes.  (at best, 2 or 3 
exceptions)

I am in the middle of a redesign going to STK 9940B (30MB/sec, 200GB/tape native) 
and 2Gb fibre HBAs, plus multiple Gb ethernet per node.  I have money for 
drives and cards, not servers.  My build out will be:
        * master, same e450, 2x Gb Ether, 2x 2Gb HBA (san + tape), 2-4x 9940B
        * node, same 2x e4500.  Each:  4x Gb Ether, 3x 2Gb HBA, 6x 9940B
        * Same design paradigm for client configs (index to specific master 
NIC, data to specific node NIC)
        * requesting clients w/ >250GB in a single save set get Gb Ether for 
backups
                - Trying to get rid of >24 hour backups of >500GB single 
filesystems

My end point goal is .5TB per storage node per hour.  I want to do all my 
backups in less than 12 hours.  Based on info I have from sun, .5TB per hour on 
an e4500 is possible, but it'll be tight.  Their primary advice was "One Gb 
Ethernet = One CPU resource", "One HBA = One CPU resource".  At 4x ethernet and 
3x HBA, I'm hoping to have enough horsepower left over on the e4500's to run 
the legato storage node services and have a comfort zone of available resources.
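
The budget above can be checked with some back-of-envelope arithmetic. The Python below is a rough sketch, not a sizing tool: it assumes native tape rates with no compression, raw Gb Ethernet line rate (real throughput will be lower), decimal units, and it takes the "one CPU per Gb Ethernet, one CPU per HBA" rule at face value. The variable names are made up for illustration.

```python
# Back-of-envelope budget for ONE redesigned storage node (e4500, 8 cpus).
# Decimal units (1 TB = 1,000,000 MB); native rates, no compression assumed.

MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def tb_per_hour(mb_per_sec: float) -> float:
    """Convert a sustained MB/sec rate into TB/hour."""
    return mb_per_sec * SECONDS_PER_HOUR / MB_PER_TB


# Tape side: 6 x 9940B at 30 MB/sec native each.
tape_mb_per_sec = 6 * 30
tape_tb_per_hour = tb_per_hour(tape_mb_per_sec)

# Network side: 4 x Gb Ethernet at raw line rate (1000 Mbit/sec = 125 MB/sec).
net_mb_per_sec = 4 * 1000 / 8
net_tb_per_hour = tb_per_hour(net_mb_per_sec)

# What the 0.5 TB/hour goal demands of the node.
target_tb_per_hour = 0.5
target_mb_per_sec = target_tb_per_hour * MB_PER_TB / SECONDS_PER_HOUR

# Sun's rule of thumb: one cpu per Gb Ethernet port and one per HBA.
cpus_total = 8
cpus_for_io = 4 + 3
cpus_left = cpus_total - cpus_for_io

# Two nodes running at the target rate for a 12 hour window.
window_tb = 2 * target_tb_per_hour * 12

print(f"tape ceiling:    {tape_tb_per_hour:.3f} TB/hour")
print(f"network ceiling: {net_tb_per_hour:.3f} TB/hour")
print(f"target needs:    {target_mb_per_sec:.1f} MB/sec sustained")
print(f"cpus left over:  {cpus_left}")
print(f"12 hour window:  {window_tb:.1f} TB across both nodes")
```

On these assumptions the six drives top out around 0.65 TB/hour, so the 0.5 TB/hour goal needs them streaming at roughly three quarters of native speed, and the I/O cards leave one cpu for the storage node services. That fits the "possible, but tight" verdict.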

--Ted

PS.  I'm happy with 7.0 overall.  The conversion was simple and utterly 
painless.  Some nice new features but the best thing to me is the general 
decrease in the amount of cpu resources needed by nsrd.  My biggest complaint:  
nsrim now uses 'nsrmmdbd' instead of 'nsrd' as its workhorse......and it runs 
quite a bit longer, subjectively, than the 6.x nsrim process.  However, I can 
still interact with everything BUT the jukebox about 20x faster than I used to 
be able to.  I.E. the GUI doesn't hang during nsrim any more.  It's even 
responsive in the middle of the night w/ backups running and loads high.



-----Original Message-----
From: Paige, Steve [mailto:Steve.Paige AT chep DOT com]
Sent: Monday, July 28, 2003 2:27 PM
To: Reed, Ted G II [ITS]
Cc: Legato NetWorker discussion
Subject: RE: [Networker] Slow index browsing - additional


Thanks Ted.  We didn't have a choice at this DR; we were stuck with the slower 
processors - 250 Mhz.  We have 2 unboxed v880's with (4) 1.2GHz Sparc III cpus 
sitting here that I have been drooling over, but I'm not sure if management is 
going to let me swap our e6500 for one!  Are you saying the entire NetWorker 
product is single threaded?  That does stink.  Any suggestions for getting more 
speed out of this thing, other than getting my hands on that 880?  If it is 
single threaded, I'd say it is not a scalable product.

You run on a Solaris system?  What else is in your setup?  Jukeboxes?  Using a 
SAN?  Backing up to disk?  Yes, Solaris 8 has much better memory management than 
2.6.  We run Solaris 8 on our e6500 for the Legato server.

*       Backup server: Sun Enterprise 6500 - (10) 400 Mhz cpus and (10) gigs of 
memory running Solaris 8
*       (4) storage nodes including backup server - (2) SunFire 6800's and (1) 
v880 in Tape SAN config
*       over 100 UNIX clients & over 100 NT/Netware servers
*       L11000 with (16) DLT7000 drives and L700 with (12) 9840 drives
*       over 10 TBs a night gets backed up, OS and databases (UNIX, NT, 
Netware, M$ Exchange, Oracle, SAP, SQL, etc)


Your thoughts about version 7?  I heard 7.1 was due out in Oct. and Legato 
support told us to hold off for 7.1.

Thanks for your 2 cents man

Steve

*******************************************
Steve Paige
UNIX System Administrator
407-355-6819 Desk
407-929-6483 Mobile
steve.paige AT chep DOT com




-----Original Message-----
From: Reed, Ted G II [ITS] [mailto:ted.reed AT MAIL.SPRINT DOT COM]
Sent: Monday, July 28, 2003 12:53 PM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] Slow index browsing - additional


$.02
----
The truth is, any single process used by legato is a single threaded process.  
Therefore, both nsrd and nsrmmdbd jobs use a single processor for their work 
(and hopefully not the same one).  Most of the time, this means that you will 
see the nsrd as the hottest process on the server.....but only on the single 
CPU it is utilizing.  I have noticed that on 7.0 however, the nsrmmdbd is the 
hot job, but only during the nsrim process.  We have noted dramatic changes in 
overall CPU usage by the nsrd process in the 7.0 upgrade.  It seems to be 
primarily the result of divvying up the DB into smaller, more manageable 
chunks.  Our 4x400Mhz e450 is now viable as a master again, instead of being 
continually pegged on a single cpu!!

In a nutshell, your DR box should consolidate more of that power into a smaller 
number of CPUs.  While you do need some processor cycles to run the nsrmmd jobs 
to interact with the tapes, your primary user and abuser of cpu cycles will be 
nsrd on any pre-7.0 implementation.  In fact, our system was so 
single-cpu loaded prior to 7.0 that we were investigating a 2x1.2GHz v280 as a 
new master.  Now that is not so much an issue.

If you are leasing the DR space, ask for more per-cpu horsepower.  As long as 
your cap is 250MHz/cpu, it will be painful to do much simultaneous legato 
work.....even 7.0 might not help as much as you'd like.  One last thought:  
Legato IS scalable, what it ISN'T is truly multi-threaded, so a single CPU can 
STILL be a bottleneck.  But the faster that cpu is, the more scalable legato as 
an ENVIRONMENT is.

Other thoughts on 7.0:
nsrim runs much longer and with nsrmmdbd as the 'hot' process
GUI remains responsive during nsrim and/or worst of backup cycle!!
During any but worst client backup load, nsrd < 1 cpu resources!
nsrmmdbd:  the "new" legato bottleneck?

--Ted


PS.  Another user of this list told me that Solaris 2.6 binaries of NetWorker 
are the primary build available.  If you are running Solaris 8, you might 
query Legato about a Sol8 built binary of the networker application.  
Apparently this user has seen a marked difference due to Solaris 8 build 
optimizations (memory, libraries, etc) that are NOT there in the 2.6 build 
binaries.  YMMV.



-----Original Message-----
From: Paige, Steve [mailto:Steve.Paige AT CHEP DOT COM]
Sent: Monday, July 28, 2003 7:14 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] Slow index browsing - additional


At DR we had (10) 250Mhz cpus in an e6000.  We have similar problems in Orlando 
with an e6500 with (10) 400Mhz cpus and 12 gigs of memory.  I figured the nsrd 
process was single-threaded; how about nsrmmdbd?  The sad thing is, NetWorker is 
the only non-system resource running on this system!  This is the first disaster 
recovery test where I have been in charge of the actual restores and not just 
the rebuilding of systems, and I was very disappointed with how NetWorker ran.  
If we cannot restore all our data at the same time (within reason) in the event 
of an actual disaster, this product is worthless.  It seems NetWorker is not 
scalable and we have gone as far as we can with this product.  

More thoughts please.

Steve 

*******************************************
Steve Paige
UNIX System Administrator
407-355-6819 Desk
407-929-6483 Mobile
steve.paige AT chep DOT com




-----Original Message-----
From: Davina Treiber [mailto:treiber AT HOTPOP DOT COM]
Sent: Monday, July 28, 2003 6:03 AM
To: NETWORKER AT LISTMAIL.TEMPLE DOT EDU
Subject: Re: [Networker] Slow index browsing - additional


On Mon, 28 Jul 2003 09:16:35 +0200, Riaan Louwrens
<riaanl AT SOURCECONSULTING.CO DOT ZA> wrote:

>I would like to jump onto the back of this question and ask whether there
>are any "utilities" for Unix / Solaris etc. We as well have a 480R box that
>seems to only use 1 of the 2 processors. Is there a way to "force" Networker
>to use a second (i.e. 3rd / 4th - 10th) processor ?
>
>I know from all the guides and discussions that Legato is NOT single
>threaded (allegedly), the performance tuning guide does not really say
>anything helpful at all (disk and tape write speeds tests are 100%). I
>guess performance tuning / debugging on the various OS's are only gained
>through experience ...

This has been discussed fairly recently on the list.

NetWorker as a whole product is not single-threaded, however the nsrd
process is. nsrd can often be the bottleneck, and can be seen to take up
100% of one CPU while others are idle. Over the years, Legato have improved
the efficiency of nsrd, the latest improvement being the splitting of the
res files in 6.2 and 7.0. There is still more to do IMHO.

Because of nsrd being the bottleneck, you will usually find that for a
NetWorker server, faster CPUs are much preferable to more CPUs. A large NW
server will be struggling with 250MHz CPUs on a Sun box, however many of
them you have.

--
Note: To sign off this list, send a "signoff networker" command via email
to listserv AT listmail.temple DOT edu or visit the list's Web site at
http://listmail.temple.edu/archives/networker.html where you can
also view and post messages to the list.
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=