Veritas-bu

[Veritas-bu] How many Storage Units do you have defined?

2001-09-27 12:54:20
Subject: [Veritas-bu] How many Storage Units do you have defined?
From: scott.kendall AT abbott DOT com (scott.kendall AT abbott DOT com)
Date: Thu, 27 Sep 2001 11:54:20 -0500


I think it is important to understand how the stus are picked, and this is one
area where I have heard of many people making mistakes in their design.  You
didn't mention it, but I have to assume you're using SSO to share the drives.
The rest of this will be based on that assumption.  For someone not sharing
drives or sharing them with something like StorageTek's SN6000 and ACSLS, it
is still important to understand how stus are picked.

When configuring a stu, you tell it which robot to use and you also must tell
it the maximum number of drives to use.  This does not have to be the total
number of drives in the robot.  If a media server has access to all 16 drives
in a robot and its stu is told to use only 6 of those drives, it will use 6,
then be full, and the next stu in the list will be used.  It uses a
random 6 drives of the 16 and will use a different combination of up to 6
every time.  If the rest of the media servers do the same thing, all the
drives in the robot could be used and the load would be spread across multiple
media servers.  Additionally, what's nice is that if you don't have enough
backups to use all 16 drives at once and usually use only 12-14 of the drives,
you will not be using the same 12-14 every time.  The drive usage is also well
distributed.
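
As a rough illustration of the selection behaviour described above, here is a
minimal Python sketch.  It is a simplified model built only from this
description, not NetBackup code: the Robot and Stu classes, the assign()
function, and the stu names are all made up for the example, and it assumes
stus are tried in alphabetical order and that a free drive is chosen at random.

```python
import random


class Robot:
    """A tape robot, modelled as a set of free drive numbers."""

    def __init__(self, name, drive_count):
        self.name = name
        self.free = set(range(drive_count))


class Stu:
    """A storage unit: one media server's capped share of one robot."""

    def __init__(self, name, robot, max_drives):
        self.name = name
        self.robot = robot
        self.max_drives = max_drives
        self.in_use = []

    def has_capacity(self):
        # Full once it holds max_drives, or when the robot has no free drive.
        return len(self.in_use) < self.max_drives and bool(self.robot.free)

    def start_job(self, rng):
        # Assumption: any free drive may be picked, so the set varies per run.
        drive = rng.choice(sorted(self.robot.free))
        self.robot.free.remove(drive)
        self.in_use.append(drive)
        return drive


def assign(stus, rng):
    """Give one job to the first stu (alphabetically) that still has room."""
    for stu in sorted(stus, key=lambda s: s.name):
        if stu.has_capacity():
            return stu.name, stu.start_job(rng)
    return None


robot = Robot("robot1", 16)
stus = [Stu("stuA", robot, 6), Stu("stuB", robot, 6), Stu("stuC", robot, 6)]
rng = random.Random(2001)
placements = [assign(stus, rng) for _ in range(14)]
per_stu = {s.name: len(s.in_use) for s in stus}
print(per_stu)  # {'stuA': 6, 'stuB': 6, 'stuC': 2}
```

With 14 jobs, the first two stus fill to their cap of 6 and the third takes the
remaining 2.  Which 14 of the 16 drives end up busy depends on the random seed.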

As far as your design, as you mentioned, with the naming scheme you have
chosen mediaserverXrobotX, all of mediaserverA's stus will be next to each
other in the alphabetical list.  mediaserverA will have 4 drives in each robot
before the next media server is even used.  The media servers with the higher
letters won't get used unless there are a large number of backups going on at
the same time... and only if there are still drives left for their stus.
Which brings me to my next point.  You have a total of 96 physical drives in
your robots.  240 stus with 4 drives each gives you 960 logical drives.  This
is a HUGE "oversubscription" of drives.  Think about it: mediaserverA uses 4
drives from robot1, then mediaserverB uses another 4 and so on... by the time
you get to mediaserverI they're all used.  There is no way mediaserverI or J
will get drives... and if you're reaching 240 stus by defining 7 more stus for
each of mediaserverA-H to cover the remaining 28 drives in the robot, those
will never get used either!
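
The arithmetic above can be checked with a short Python sketch.  It only
models the simple case described here (one 4-drive stu per media server on the
32-drive robot1, handed out in strict alphabetical order); the variable names
are invented for the example.

```python
ROBOT1_DRIVES = 32
DRIVES_PER_STU = 4
media_servers = ["mediaserver" + letter for letter in "ABCDEFGHIJ"]

# Hand out robot1's drives in alphabetical order, 4 per media server.
free = ROBOT1_DRIVES
granted = {}
for server in sorted(media_servers):
    take = min(DRIVES_PER_STU, free)
    granted[server] = take
    free -= take

starved = [server for server, count in granted.items() if count == 0]
print(starved)  # ['mediaserverI', 'mediaserverJ']

# Oversubscription across the whole site: logical vs. physical drives.
total_physical = 32 + 32 + 16 + 16
total_logical = 240 * DRIVES_PER_STU
print(total_physical, total_logical, total_logical // total_physical)  # 96 960 10
```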

What I would do is create 1 stu with 4 drives for each media server for each
32 drive robot.  Then a 2 drive stu for each media server for each 16 drive
robot.  Name the stus so they are picked out of the list alphabetically in a
manner that spreads the load nicely across the media servers and robots.  That
would give you a total of 40 stus.
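
Here is one way that layout could be generated, as a Python sketch.  The
numeric "stuNN_" prefix is purely a hypothetical naming scheme chosen so that
alphabetical order alternates between media servers and robots; any naming
that sorts the same way would do.

```python
robots = {"robot1": 32, "robot2": 32, "robot3": 16, "robot4": 16}
media_servers = ["mediaserver" + letter for letter in "ABCDEFGHIJ"]
robot_names = list(robots)


def stu_size(drive_count):
    """4 drives per stu on a 32-drive robot, 2 on a 16-drive robot."""
    return 4 if drive_count == 32 else 2


stus = []  # (stu name, media server, robot, max drives)
seq = 1
for rnd in range(len(robot_names)):
    for i, server in enumerate(media_servers):
        # Shift the robot by one each round so every server meets every robot.
        robot = robot_names[(i + rnd) % len(robot_names)]
        name = f"stu{seq:02d}_{server}_{robot}"  # hypothetical naming scheme
        stus.append((name, server, robot, stu_size(robots[robot])))
        seq += 1

names = [stu[0] for stu in stus]
logical = sum(stu[3] for stu in stus)
print(len(stus), logical)  # 40 120
print(names[:3])
# ['stu01_mediaserverA_robot1', 'stu02_mediaserverB_robot2', 'stu03_mediaserverC_robot3']
```

That comes to 120 logical drives against 96 physical ones, so there is still
some oversubscription, but far less than 960.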

When I met some of the NetBackup engineers recently, I told them that there
should be a "best practices" white paper on this.


- Scott



From:    "Lundy, Mark" <mlundy02 AT sprintspectrum DOT com>
Sent by: veritas-bu-admin AT mailman DOT eng.auburn.edu
Date:    09/27/2001 07:57 AM
To:      "'Veritas Support List Server'" <veritas-bu AT mailman.eng.auburn DOT edu>
cc:
Subject: [Veritas-bu] How many Storage Units do you have defined?




We seem to have run into a storage unit limit.  We have 4 robots (32, 32, 16
and 16 drives in each) and 10 media servers.  A Veritas consultant
recommended that we define storage units in a round-robin manner with 4
drives each so that the load gets spread out over the media servers and
robots as much as possible.
So, 96 drives divided by 4 equals 24 possible 4-drive stus across all of our
robots.  24 4-drive stus times 10 media servers equals 240 storage units
total.  The stu definition looks something like this:

mediaserverArobot1    (4 drives)
mediaserverBrobot2    (4 drives)
mediaserverCrobot3    (4 drives)
mediaserverDrobot4    (4 drives)
mediaserverErobot1    (4 drives)
mediaserverFrobot2    (4 drives)
mediaserverGrobot3    (4 drives)
mediaserverHrobot4    (4 drives)
mediaserverIrobot1    (4 drives)
mediaserverJrobot2    (4 drives)
mediaserverArobot3    (4 drives)
mediaserverBrobot4    (4 drives)
mediaserverCrobot1    (4 drives)

and so on until all 240 are defined.  This Veritas consultant informed us
that stus are picked in alphabetical order.  Thus, when a backup stream is
started, as 4 drives become busy, another stu is chosen for any more jobs
and the load is spread out over not only the media servers, but over the
robots as well.
The problem with this config is, according to Veritas, every backup job has
to traverse the stu list and confirm its ability to communicate with each
stu.  As you can imagine, we have a large environment and over 3000 backup
jobs run nightly.  3000 * 240 = way too many for our E4500 master server to
deal with and all jobs expire with 196 errors.
Now, finally, to my question: what is the most stus any of you have defined?
I backed my config down to 40 stus, with each stu defined with the number of
drives in the library and it is working just fine.  However, that doesn't
accomplish the load balancing that I desire.  I.e., media server one will
handle all backup requests until robot one busies all of its 32 drives
before any jobs go to media server 2 and/or robot two.  TIA.

Mark


-
Mark W. Lundy

I.T. Recovery Management
Work:  816.965.1131
FAX:    816.965.1042
mlundy02 AT sprintspectrum DOT com


