Veritas-bu

Re: [Veritas-bu] General questions for everyone

2007-11-02 12:46:46
Subject: Re: [Veritas-bu] General questions for everyone
From: "Rosenkoetter, Gabriel" <Gabriel.Rosenkoetter AT radian DOT biz>
To: "Cruice, Daniel (US - Glen Mills)" <dcruice AT deloitte DOT com>, veritas-bu AT mailman.eng.auburn DOT edu
Date: Fri, 2 Nov 2007 12:23:48 -0400
> WOW...good info...I should be a little clearer, on my media servers
> (we have 7) they are Gig Fiber direct runs to the backup network
> core...the majority of my clients are 100mb due to old switches.

That's okay: you want (most of) your clients on 100 Mb with that
quantity of clients. If they were also gig, then even one could swamp
your media servers' interfaces. (You could look at 802.3ad link
aggregation, which I have the impression Win2k3 actually handles
decently these days, or 10 GbE cards + TCP offload engines for the media
servers, but with the speeds you quote it doesn't sound like you need
it.)

So you've got 1 Gb/s per media server, which it would be nice to believe
meant 80 MB/s input (but it usually doesn't). At most, you want each
media server driving two tape drives with those input rates. If you have
each media server driving more than two drives at a time, you are NOT
getting better speed to tape by doing so. Let's say you've got one media
server getting 80 MB/s in and it's writing to four drives. 80 / 4 = 20
MB/s per drive, so you're okay, but you aren't actually getting 80 MB/s.
You're probably getting more like 30 MB/s. 30 / 4 = 7.5 MB/s per drive,
which means you're still not feeding the drives data as fast as they
need it, which means you're still backhitching, which means you're
going even slower.
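
The per-drive arithmetic above can be sketched as follows. This is a
minimal illustration, assuming the media server's input is split evenly
across the drives it writes to concurrently, and taking the 10 MB/s
LTO-3 minimum streaming rate quoted elsewhere in this thread as the
threshold. The function names are hypothetical, not NetBackup tools.

```python
# Minimum rate an LTO-3 needs to avoid backhitching, as quoted in this
# thread (an assumption for illustration, not a vendor specification).
MIN_DRIVE_MB_S = 10.0


def per_drive_rate(input_mb_s: float, drives: int) -> float:
    """Share of the media server's input each drive receives,
    assuming the input is divided evenly across concurrent drives."""
    return input_mb_s / drives


def backhitches(input_mb_s: float, drives: int,
                min_rate: float = MIN_DRIVE_MB_S) -> bool:
    """True if each drive is fed more slowly than its minimum rate."""
    return per_drive_rate(input_mb_s, drives) < min_rate


if __name__ == "__main__":
    # "ideal" is the 80 MB/s you'd hope for on gig; "realistic" is ~30 MB/s.
    for label, rate in [("ideal", 80.0), ("realistic", 30.0)]:
        for drives in (2, 4):
            r = per_drive_rate(rate, drives)
            status = "backhitch" if backhitches(rate, drives) else "ok"
            print(f"{label:9} {rate:4.0f} MB/s / {drives} drives = "
                  f"{r:5.1f} MB/s per drive -> {status}")
```

Only the realistic-input, four-drive case falls under the threshold,
which is the case described above.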

I expect that you would be able to keep your backups well within that
window if you wrote fewer streams at a time: fewer streams means that
you don't subdivide the output (SAN fabric-side) bandwidth of each
media server as much, which means that you'll stop backhitching, which
means that your overall backup speed will increase. Lowering your MPX
and lowering the number of drives allocated per media server STU will
accomplish that. (Incidentally, do you have 20 drives listed in each
media server's STU? If so, you're confusing NetBackup media management,
because it'll believe that you have 20 x <number of media servers>
drives rather than only 20.)
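
The STU point can be illustrated with a small sketch. It assumes 7
media servers and 20 physical drives (the figures from this thread),
and a hypothetical per-server cap of two drives following the advice
above. It shows the inflated drive count NetBackup would see if every
STU listed all 20 drives, and one even way to split the drive counts so
they never sum past the physical total. The function names are
hypothetical; this does not call any NetBackup command.

```python
def apparent_drives(drive_counts_per_stu: list[int]) -> int:
    """Total drives NetBackup would believe exist if it trusts each STU."""
    return sum(drive_counts_per_stu)


def stu_drive_limits(physical_drives: int, media_servers: int,
                     max_per_server: int) -> list[int]:
    """Spread the physical drives evenly across media server STUs,
    then cap each STU at the number of drives one server can feed."""
    base, extra = divmod(physical_drives, media_servers)
    limits = [base + (1 if i < extra else 0) for i in range(media_servers)]
    return [min(limit, max_per_server) for limit in limits]


if __name__ == "__main__":
    print("every STU lists 20 drives:", apparent_drives([20] * 7))
    plan = stu_drive_limits(20, 7, max_per_server=2)
    print("capped plan:", plan, "total:", sum(plan))
```

With a cap of two, the plan uses 14 of the 20 drives, which trades raw
drive count for drives that actually stream.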

--
gabriel rosenkoetter
Radian Group Inc, Unix/Linux/VMware Sysadmin / Backup & Recovery
gabriel.rosenkoetter AT radian DOT biz, 215 231 1556 

 

________________________________

From: Cruice, Daniel (US - Glen Mills) [mailto:dcruice AT deloitte DOT com] 
Sent: Friday, November 02, 2007 11:55 AM
To: Rosenkoetter, Gabriel; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] General questions for everyone



WOW...good info...I should be a little clearer: on my media servers (we
have 7) they are Gig Fiber direct runs to the backup network core...the
majority of my clients are 100 Mb due to old switches. So the speed from
media server to tape is Gig...SAN attached. We have been running in the
neighborhood of 3000 - 7000 KB/sec, which is pretty good. Yes, we have
some clients that are running between 900 - 1000 KB/sec, and VMs which
share the physical NICs; these are running around 1000 KB/sec. So all
in all, things are working well...yet we are running into the day on
backups in some of our QA / Dev environments.
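
These per-job rates can be compared against the 10 MB/s LTO-3 floor
quoted later in this thread. The sketch below converts KB/sec to MB/s
and counts how many such streams would have to be multiplexed onto one
drive to reach that floor. It assumes 1 MB = 1024 KB and that
multiplexed streams add up linearly at the drive (i.e., the media
server's own input is not the bottleneck). Function names are
hypothetical.

```python
import math

# 10 MB/s LTO-3 floor quoted in this thread, expressed in KB/s (assumption).
MIN_DRIVE_KB_S = 10 * 1024


def kb_s_to_mb_s(kb_s: float) -> float:
    """Convert a per-job rate from KB/sec to MB/s (1 MB = 1024 KB)."""
    return kb_s / 1024


def streams_to_fill_drive(stream_kb_s: float,
                          min_kb_s: float = MIN_DRIVE_KB_S) -> int:
    """Streams at this rate needed so their sum reaches the drive floor."""
    return math.ceil(min_kb_s / stream_kb_s)


if __name__ == "__main__":
    for rate in (900, 1000, 3000, 7000):
        print(f"{rate:5d} KB/sec = {kb_s_to_mb_s(rate):5.2f} MB/s -> "
              f"{streams_to_fill_drive(rate)} streams per drive")
```

Under these assumptions, the slow 900 - 1000 KB/sec clients need 11 - 12
streams per drive to reach the floor, which helps explain why MPX ends
up so high in this environment.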

 

And yes, we did need to perform a recovery exercise...it was basically a
Disaster Recovery when our SAN decided to crash due to a power outage
and a faulty UPS, subsequently corrupting a few TB of SAN data. And yes,
the recovery effort was slow since we may have had 10 - 20 jobs on the
same tape: about 150 servers over three 12-hour days. Yeah, painful,
very painful. We are looking to move that 20 number down and still keep
the backups in our window. Tedious process, but it is being worked on.

 

Thanks

Dan

From: Rosenkoetter, Gabriel [mailto:Gabriel.Rosenkoetter AT radian DOT biz] 
Sent: Friday, November 02, 2007 10:53 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] General questions for everyone

 

Wow, you have a lot of problems there. I'm picking the three big ones.

 

First, you don't mention how many media servers you have, but you do
mention your network interface speed as 100 Mb/s. 100 Mb/s is roughly 8
MB/s (being generous). That means that in order to feed your 20 LTO-3s
with even the minimum 10 MB/s they need to keep from backhitching, you
would need to have 25 media servers... but you can't write to the same
drive with more than one media server, so it is literally impossible for
you to supply the minimum input speed to actually spin your drives
without shoe-shining. In point of fact, if you really only have 100 Mb
inputs into your media servers, you can NOT drive an LTO-3 with any one
of your media servers without causing it to backhitch. You can't get
data to it fast enough. Yes, this is a huge problem. Invest in gigabit
Ethernet, or start doing everything with BCVs/snapshots exported to
your media servers.
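
The media-server count above follows from simple arithmetic, sketched
here. It assumes the figures stated in the paragraph (20 drives, a 10
MB/s per-drive minimum, roughly 8 MB/s of real input per 100 Mb/s media
server) and, for comparison, 80 MB/s per server on gig. It deliberately
ignores the separate constraint that one drive can't be written by more
than one media server at once. Function names are hypothetical.

```python
import math


def media_servers_needed(drives: int, min_drive_mb_s: float,
                         per_server_input_mb_s: float) -> int:
    """Servers required so total input covers every drive's minimum rate."""
    total_needed = drives * min_drive_mb_s
    return math.ceil(total_needed / per_server_input_mb_s)


def one_server_can_stream_a_drive(per_server_input_mb_s: float,
                                  min_drive_mb_s: float) -> bool:
    """Can a single media server feed even one drive at its minimum rate?"""
    return per_server_input_mb_s >= min_drive_mb_s


if __name__ == "__main__":
    print("100 Mb servers needed:", media_servers_needed(20, 10.0, 8.0))
    print("gig servers needed:", media_servers_needed(20, 10.0, 80.0))
    print("100 Mb server can stream one drive:",
          one_server_can_stream_a_drive(8.0, 10.0))
```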

 

Second, have you performed any recovery tests since you bumped your MPX
up to that astronomical 20? You should. In general, recovery becomes
outrageously painful, if not impossible, when you stray above 4; that's
the standard advice, anyway. It's been a while since I checked, so if
you can manage to pull a restore successfully and meet your RTO with MPX
at 20, then more power to you, but test it.
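
Why high MPX hurts restores can be shown with a toy model. This sketch
assumes perfectly round-robin interleaving of equal-sized blocks from
equally fast clients; real multiplexed images are laid down as each
client's data arrives, so the actual layout is messier. It counts how
many blocks the drive passes over between the first and last block of
one client's image, which grows roughly in proportion to the MPX
setting. Function names are hypothetical.

```python
def interleave(streams: dict[str, int]) -> list[str]:
    """Toy tape layout: take one block from each unfinished stream in turn.
    Each entry in the returned list names the client owning that block."""
    remaining = dict(streams)
    tape: list[str] = []
    while any(remaining.values()):
        for name in streams:
            if remaining[name] > 0:
                tape.append(name)
                remaining[name] -= 1
    return tape


def blocks_read_to_restore(tape: list[str], client: str) -> int:
    """Blocks spanned from the client's first block to its last block."""
    positions = [i for i, owner in enumerate(tape) if owner == client]
    return positions[-1] - positions[0] + 1


if __name__ == "__main__":
    for mpx in (1, 4, 20):
        tape = interleave({f"c{i}": 100 for i in range(mpx)})
        print(f"MPX {mpx:2d}: read {blocks_read_to_restore(tape, 'c0')} "
              f"blocks to restore a 100-block image")
```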

 

Third, although the standard advice is to "just trust NetBackup" and let
it leave things in queue if it needs to (there are a variety of legit
reasons it might be doing that, like jobs per policy or number of
streams available across all drives), I've found that trusting bpsched
not to have a mental breakdown when trying to enqueue that many jobs at
the same time is not really a great plan. Spreading your start times out
a bit, so that bpsched can make its way through initiating all the
streams, is my preferred method. (In your case, you'd probably want to
kick jobs off in batches of 100 clients every twenty minutes or so,
starting at 17:00, modulo special-case clients. You don't really have to
care too much about balancing the volume of data between those policies,
provided they're all going into the same pool with the same daily
retention.) If letting NBU take care of it is working for you, great.
(No, staggering won't help the things you describe, though it also won't
hinder them; but it'll keep the memory usage on the scheduler sane, and
there have definitely been scaling bugs with bpsched in the past. I've
forgotten at precisely which 5.1 MP, but it was not a lot of fun when it
happened to three different 2000-client / 8-media-server environments I
cared about at the time.)
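
The suggested stagger can be planned with a short script. This sketch
only computes batch start times (batches of 100 clients every 20
minutes from 17:00, as suggested above); the client names are
hypothetical, and turning each batch into a policy with its own start
window is left to the NetBackup admin console or your usual tooling.

```python
from datetime import datetime, timedelta


def staggered_batches(clients: list[str], batch_size: int = 100,
                      start: str = "17:00",
                      interval_min: int = 20) -> list[tuple[str, list[str]]]:
    """Split clients into fixed-size batches with staggered HH:MM starts."""
    t0 = datetime.strptime(start, "%H:%M")
    batches = []
    for i in range(0, len(clients), batch_size):
        t = t0 + timedelta(minutes=interval_min * (i // batch_size))
        batches.append((t.strftime("%H:%M"), clients[i:i + batch_size]))
    return batches


if __name__ == "__main__":
    clients = [f"client{n:03d}" for n in range(900)]  # hypothetical names
    for when, batch in staggered_batches(clients):
        print(when, len(batch), batch[0], "...", batch[-1])
```

For 900 clients this yields nine batches, the last starting at 19:40,
well inside a 17:00 - 08:00 window.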


 

 

________________________________

From: Cruice, Daniel (US - Glen Mills) [mailto:dcruice AT deloitte DOT com] 
Sent: Thursday, November 01, 2007 5:14 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] General questions for everyone

Say you have over 900 clients to back up from 5:00pm - 8:00am...20 LTO-3
tape drives in a library. 99% of the environment is Windows, including
my media servers / Master node, and I am running multiplexing (20) in
some cases. Right now 90% of all my jobs kick off at 5:00 on the dot.
It seems that many of my jobs, when they kick off, will sit in a Queued
status for 15 - 20 minutes, while the active job count increments every
few seconds. I understand I'll have jobs queued once the multiplexing
hits the threshold for number of jobs per tape, or if all my tape drives
are being used. But I was just wondering: if I staggered my start times,
would that help load up the tapes / write to tape any quicker, or simply
get jobs to an active state sooner? Unfortunately, I am running on a
100 Mb network, but it is segregated from my production network.

Suggestions?

Thanks

Dan Cruice

 




_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
