Veritas-bu

Re: [Veritas-bu] Long time between "requesting resources" and "grantingresources"

2008-07-10 16:13:52
Subject: Re: [Veritas-bu] Long time between "requesting resources" and "grantingresources"
From: <Mark.Donaldson AT cexp DOT com>
To: <mikemclain AT northwesternmutual DOT com>, <Veritas-bu AT mailman.eng.auburn DOT edu>
Date: Thu, 10 Jul 2008 13:54:04 -0600
This is great, Mike.

You said you graphed the pain points?  What numbers in that log record
reflect painful values?

I'm at 6.5.1 on my code - I think you gave me a reason to accelerate
6.5.2.

My "msec" numbers are significantly more than your numbers below,
between 10 and 22 minutes for my most recent log file.  FWIW, My logging
seems to be less verbose than yours, too:

> grep msec /usr/openv/logs/51216-118-2200013915-080710-0000000002.log
0,51216,118,118,2302499,1215708383105,19630,12,0:,124:Resource
evaluation completed. Evaluated 308 requests, sent resource pending for
175 requests, evaluation time: 1167778
msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2303108,1215709606394,19630,11,0:,122:Resource
evaluation completed. Evaluated 312 requests, sent resource pending for
0 requests, evaluation time: 1223288 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2303747,1215710991692,19630,12,0:,122:Resource
evaluation completed. Evaluated 304 requests, sent resource pending for
0 requests, evaluation time: 1385297 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2304323,1215712249423,19630,11,0:,122:Resource
evaluation completed. Evaluated 311 requests, sent resource pending for
0 requests, evaluation time: 1257729 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2304847,1215713417318,19630,12,0:,122:Resource
evaluation completed. Evaluated 300 requests, sent resource pending for
0 requests, evaluation time: 1167894 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2305555,1215714563259,19630,11,0:,122:Resource
evaluation completed. Evaluated 300 requests, sent resource pending for
6 requests, evaluation time: 1145939 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2306130,1215715234879,19630,12,0:,123:Resource
evaluation completed. Evaluated 284 requests, sent resource pending for
157 requests, evaluation time: 671619
msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2306685,1215716103486,19630,11,0:,122:Resource
evaluation completed. Evaluated 258 requests, sent resource pending for
26 requests, evaluation time: 868605 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2307131,1215716699437,19630,12,0:,121:Resource
evaluation completed. Evaluated 225 requests, sent resource pending for
0 requests, evaluation time: 595950 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2307633,1215717694512,19630,11,0:,121:Resource
evaluation completed. Evaluated 207 requests, sent resource pending for
0 requests, evaluation time: 995073 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2308132,1215718427500,19630,12,0:,122:Resource
evaluation completed. Evaluated 197 requests, sent resource pending for
10 requests, evaluation time: 732986 msec,25:ResBroker_i::doEvaluation,1
0,51216,118,118,2308621,1215719097241,19630,11,0:,122:Resource
evaluation completed. Evaluated 175 requests, sent resource pending for
44 requests, evaluation time: 669739 msec,25:ResBroker_i::doEvaluation,1

 

-----Original Message-----
From: mikemclain AT northwesternmutual DOT com
[mailto:mikemclain AT northwesternmutual DOT com] 
Sent: Thursday, July 10, 2008 1:27 PM
To: Donaldson, Mark - Broomfield, CO; Veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Long time between "requesting resources" and
"grantingresources"

Mark,

We experienced the same issue you are referring to when we upgraded to
NBU 6.0 MP5 last fall.     After working with Symantec support and
engineering they have made very good improvements in the performance of
the resource broker (nbrb).

The problem would manifest itself as more and more jobs (1000+) were
added to the queue and the fact that we were using storage unit groups
and Inline-tape-copy (ITC).    We would see requesting/granting wait
times for disk or tape jobs at 20-40 minutes even though the resources
were available.  Engineering enhanced the nbrb code to cache the
"resource availability" data rather than having to query the EMM
database for each allocation request.

I'm not sure if this log output is in the 6.5.1 nbrb, but you can view
the nbrb evaluation cycle performance data by parsing the raw nbrb (118)
vxlogs for the word "msec":

        grep msec /usr/openv/logs/51216-118-* 

It will show you the runtime stats and how well the caching is doing:

51216-118-1067800690-080705-0000000000.log:0,51216,118,118,3133941,12152
35374476,22178,10,0:,345:Resource evaluation completed. Evaluated 243
requests, evaluation time: 12398 msec, <EVALPERF> numSucceeded=2,
numQueued=241, numPended=0, numFailed=0, numMultipleCopyJobs=20,
numMdsCacheMisses=3, numMDARReleaseDuringEval=4, restartedEvalCycle=0,
numDriveReleasesBeforeEval=0, numDrivesAvailStart=211,
numDrivesAvailEnd=211, requestQueueSize=241,25:ResBroker_i::doEvaluation

# While we were troubleshooting this with Symantec, I ended up writing a
Perl script to parse these stats for graphing in Excel so we could see
the pain points and help narrow down the issue.

I believe the updated nbrb code should be in 6.5.2 & 6.0 MP6 and there
are some tweaks that can be done via a new config file
(/usr/openv/var/global/nbrb.conf) to improve performance.

        RESPECT_REQUEST_PRIORITY=0
        SECONDS_FOR_EVAL_LOOP_RELEASE=30

HTH

-Mike

-----Original Message-----
From: veritas-bu-bounces AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-bounces AT mailman.eng.auburn DOT edu] On Behalf Of
Mark.Donaldson AT cexp DOT com
Sent: Wednesday, July 09, 2008 5:41 PM
To: Veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Long time between "requesting resources" and
"grantingresources"

Hi all,

I'm seeing some apparently slow backups.  When I dig into the job
details, I see that often there's a long delay between the "requesting
resources" messages and the "granting resources" messages.  

The master server is an 8 CPU, 16G Solaris 8 v880 box.  It's got
NBv6.5.1.

It's running with an average of 50% idle or so but the load average is
kinda high: 


last pid:  8851;  load averages:  4.01,  3.66,  3.60
11:50:46
170 processes: 164 sleeping, 1 zombie, 1 stopped, 4 on cpu
CPU states: 53.2% idle, 13.0% user, 28.8% kernel,  5.0% iowait,  0.0%
swap
Memory: 16G real, 8702M free, 2372M swap in use, 24G swap free



Here's the details of a job that ran slower than I'd like.  36 minutes
from "request" to "granted".  It's a high-priority backup, highest of
all of them, FWIW.  It, and many of my policies now, are backing up to
storage unit groups.   The AP-01 client below is an AIX media server
with NBv6.

07/09/2008 08:36:45 - requesting resource AP01_Pri
07/09/2008 08:36:45 - requesting resource ug-50.NBU_CLIENT.MAXJOBS.ap-01
07/09/2008 08:36:45 - requesting resource
ug-50.NBU_POLICY.MAXJOBS.AP-01_RM_Arch_iprd
07/09/2008 09:12:21 - granted resource  ug-50.NBU_CLIENT.MAXJOBS.ap-01
07/09/2008 09:12:21 - granted resource
ug-50.NBU_POLICY.MAXJOBS.AP-01_RM_Arch_iprd
07/09/2008 09:12:21 - granted resource  005195
07/09/2008 09:12:21 - granted resource  LTO2-06
07/09/2008 09:12:21 - granted resource  AP01-ADIC2
07/09/2008 09:12:25 - started process bpbrm (pid=33284230)
07/09/2008 09:12:26 - connecting
07/09/2008 09:12:37 - connected; connect time: 0:00:00
07/09/2008 09:12:49 - mounted 005195
07/09/2008 09:12:49 - positioning 005195 to file 79
07/09/2008 09:12:51 - positioned 005195; position time: 0:00:02
07/09/2008 09:12:51 - begin writing
07/09/2008 09:13:51 - end writing; write time: 0:01:00
the requested operation was successfully completed (0)

Anybody seeing this?  Any reasons you know of?

-Mark

_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

This e-mail and any attachments may contain confidential information of
Northwestern Mutual. If you are not the intended recipient of this
message, be aware that any disclosure, copying, distribution or use of
this e-mail and any attachments is prohibited. If you have received this
e-mail in error, please notify Northwestern Mutual immediately by
returning it to the sender and delete all copies from your system.
Please be advised that communications received via the Northwestern
Mutual Secure Message Center are secure. Communications that are not
received via the Northwestern Mutual Secure Message Center may not be
secure and could be observed by a third party. Thank you for your
cooperation.


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu