[Veritas-bu] Re: start notify and multiple data streams

2001-06-21 15:51:39
From: John_Wang AT enron DOT net (John_Wang AT enron DOT net)
Date: Thu, 21 Jun 2001 14:51:39 -0500
Hello Curtis

If they were truly critical, they would be worth the added cost of a decent
filesystem.

A true hot backup state leaves the core of the database tables in a consistent
state and journals continuing activity, allowing ongoing database activity with
a modest performance hit that's probably more manageable than a mad scramble to
back everything up.   I believe you are thinking of leaving the database in a
quiescent state where the database is actually shut down.   Hey, I wasn't the
one to start using the term "hot backup" in this thread and have no idea if
there really is a "hot backup" mode in any known databases.   I suspect that the
term "hot backup" is a misnomer from the original poster.

A decent filesystem provides that ability without application developers having
to design it in.   Arguably a decent database would also have it built in and
would implement it as part of its rollback algorithm, simply by allowing the
consolidation of the journal or log to be suspended at consistent states.   I
know that Techra had the ability to roll back to the last known consistent state
and hence was actually journalling everything anyway.   Such functionality may
be found for free if you're willing to bone up on being a DBA from a system
administrator's perspective.   Tell your employer to have you trained as a DBA or
to buy the filesystem so that you can meet that responsibility ubiquitously.   My
advocacy of a systematic filesystem solution is obvious proof that I am
acknowledging that database servers cannot be quiescent indefinitely.   I simply
believe that run optimization by multi-streaming can never produce truly
acceptable and consistent results and therefore a systematic approach is
required.

Even a mad scramble to minimize your backup window results in a variable
quiescent downtime requirement and with Netbackup results in a variable time at
which this window starts.   Critical servers tend to require that the quiescent
periods be agreed upon beforehand, be of a regular length, and be kept to a
minimum.   If the servers are truly critical, saying that it'll be shut
down for an unspecified but minimal amount of time sometime during the backup
window would not be good enough.   Ultimately, you need a decent filesystem.
Purchasing a filesystem product is a capital expenditure; over time the cost of
capex amortizes to far less than the value of time spent by employees at water
coolers.   It's good business practice to consider capex cheap.

Multiple streams only allow performance gains through parallelism.   You may be
able to get some more out of a multiple-processor computer and a RAID, but if
it's a single-CPU server, then any performance gain would simply be due to poor
tuning to begin with.   Look, I used to be Senior Support for TMC.   I know
parallelism, and TMC owns the patent on RAID, and the SDA is still the only RAID
that supports an indefinite number of spindles with only one parity drive and is
still the only one where parity calculation costs nothing on the write.
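
As a rough illustration of the parallelism argument, here is a minimal sketch in
Python of a ceiling model: each stream reads at some rate, and the aggregate is
capped by whatever the shared hardware (spindles, CPU, network, tape) can
sustain.   The function name and every number below are hypothetical, chosen
only to show the shape of the curve; they are not measurements from any system
mentioned in this thread.

```python
def aggregate_throughput(streams, per_stream_mb_s, hardware_limit_mb_s):
    """Estimated total MB/s for a number of concurrent backup streams.

    Assumes (hypothetically) that streams scale linearly until the shared
    hardware limit is reached and that contention costs nothing beyond that.
    """
    if streams < 1:
        raise ValueError("need at least one stream")
    return min(streams * per_stream_mb_s, hardware_limit_mb_s)


# Hypothetical single-disk, single-CPU server: one stream already saturates it.
for n in (1, 2, 4):
    print("single disk,", n, "streams:", aggregate_throughput(n, 30, 30), "MB/s")

# Hypothetical RAID with many spindles: extra streams help until the cap.
for n in (1, 4, 8):
    print("RAID,", n, "streams:", aggregate_throughput(n, 30, 200), "MB/s")
```

In this simplified model extra streams only pay off when a single stream cannot
saturate the hardware, which is the crux of the disagreement in this thread.   A
real system would also lose some throughput to seek contention, which the model
ignores.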

Really, it is irrelevant if you agree with me.   From the perspective of career
advancement and business practices, you should tell management that you need DBA
training to support consistent backups with DBA features or a commercial
filesystem to provide that support from the system side.   From an egocentric
public domain hacker viewpoint, you'll whip up a bunch of scripts and spend
years supporting them and arguing why yours is marginally faster than someone
else's.   Sometimes it makes business sense to go with the latter if the staff
that you have are fairly junior and underpaid, but even then only for interim
periods, because capex is always cheaper than operating costs and IT staff rarely
remain junior and underpaid for long.

I've found that most creative automation solutions were only to solve a problem
that existed due to misconfiguration in the first place.   How many sites used
to have periodic NIS push scripts and warn users that password changes would
only be implemented periodically overnight, when all they needed to do was
correct the permissions and pathnames of the NIS source and database files...
Why, even one of the stated purposes of developing NIS+ was to address that issue,
though it was only an issue of poor education/documentation.

Perhaps one of the reasons why we are differing on viewpoints here is just user
demands.   My user is currently demanding the reduction of the quiescent window
to the scale of a few seconds so from my perspective, filesystem freezes of some
kind are the only way to go.
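
To make the point about a seconds-long quiescent window concrete, here is a
minimal sketch in Python of the freeze, snapshot, resume sequence.   All four
callables are hypothetical placeholders supplied by the caller; none of them
corresponds to a real NetBackup, Veritas Filesystem, or database interface.   The
sketch only shows the ordering: the database is held quiescent just long enough
to take the snapshot, and the slow copy to tape runs afterwards against the
snapshot.

```python
import time


def backup_with_snapshot(quiesce, snapshot, resume, copy_to_tape):
    """Quiesce, snapshot, resume, then back up from the snapshot.

    All four arguments are hypothetical callables provided by the caller.
    Returns the length of the quiescent window in seconds.
    """
    start = time.monotonic()
    quiesce()                  # stop writes or enter a backup mode
    try:
        snap = snapshot()      # point-in-time image of the filesystem
    finally:
        resume()               # always release the database, even on error
    window = time.monotonic() - start
    copy_to_tape(snap)         # the slow part runs against the snapshot
    return window


# Stand-in callables that just record the order in which they are called.
log = []
window = backup_with_snapshot(
    quiesce=lambda: log.append("quiesce"),
    snapshot=lambda: log.append("snapshot") or "snap-0",
    resume=lambda: log.append("resume"),
    copy_to_tape=lambda snap: log.append("copy " + snap),
)
print(log)
print("quiescent window: %.6f s" % window)
```

The quiescent window here covers only the snapshot call, however long the copy
to tape takes, which is why the number of streams stops mattering for downtime.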

Regards,
John I Wang
Sr. Systems Engineer
Steverson Information Professionals

P.S.
I get very few calls now (i.e., I've received one off-duty page in three months)
but I used to be paged on a daily basis back when I was young and concentrated
on irrelevant tasks.

---
Enron Networks
Enron Building room 3427e
ph (713) 345-4888
cell (832) 493-1263
fax (713) 646-8462
pg pagejwang AT skytel DOT com or 1-877-390-4155





curtis@backupcentral.com
06/21/01 12:21 PM

To:      John Wang/Contractor/Enron Communications@Enron Communications
cc:      morms AT es DOT com, veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Re: start notify and multiple data streams



Some thoughts below..

At 11:53 AM 6/21/2001 -0500, John_Wang AT enron DOT net wrote:


>Hello Curtis
>
>That may be irrelevant.   If the other streams are still in the queue when all
>the streams that are running finish, switching the database out of hot mode
>until the other streams come out of the queue may be acceptable, depending on
>how the streams were configured.   Ultimately, to minimize time in the "hot"
>mode and overhead, some sort of filesystem freeze should be used, as is
>available from Veritas Filesystem or Sun's old Backup Copilot product.

It's definitely not irrelevant when you get calls from the DBAs complaining
that you shut down and restarted their database five times last night.  (Or
put into and took out of hot backup mode.)  And yes, you can use vxfs
snapshot or EMC BCVs to take downtime to a minimum, but that costs a LOT
more per box.


>Besides, multi-streaming an individual server is a pointless exercise if your
>site is large enough to have enough simultaneous sessions from different
>servers to begin with.   It's the old parallel processing performance problem:
>too fine a granularity will not actually amortize the overhead required.   If
>you have more database servers to back up than the number of streams that you
>are willing to allow to your storage units concurrently, there's no point in
>multi-streaming.

I definitely disagree here.  You assume that all of the database servers
can be kept in backup mode indefinitely.  Doing things the way you suggest
might result in the same amount of overall backup time, but it will
definitely result in increased time per server.  I try to design things in
such a way that each server is backed up as quickly as possible.

>Also, if the database is distributed across a RAID, the RAID hardware will
>hopefully already have squeezed out whatever performance gain could be had by
>parallelism, making multiple streams a moot point.   If the database is all on
>one volume, multi-streaming will slow it down, and with modern disks going
>towards monolithic drives with very few platters, any kind of parallelism will
>likely slow it down, as more data is now on the same physical device.

Again, I definitely disagree.  I've used many very nice pieces of RAID
hardware, and your assumption that one stream should be the same speed as
many streams is definitely not the case.  In my latest configuration, we
were pumping 200 MB/s out of one RAID array using multistreaming.  Try
doing that with a single stream.

>The only thing I use multi-streaming for is when I have an unreliable link.
>That way, the backup is broken up into smaller backups that would not
>invalidate each other when a problem does occur, i.e., if the link temporarily
>goes out, only a small piece needs to get requeued rather than the whole thing.
>In this case, I would also constrain the class to a limited number of
>concurrent jobs in order to limit the number of segments disrupted by a network
>blip.   Other than for that use, I consider multi-streaming to be just plain
>evil.   The only problem I have there is that I have to increase the period of
>time that jobs are allowed to remain in the queue, and that can only be done
>globally rather than on a per-class basis; wish Veritas would change that...
>
>Now, arguably, there may be cases where a specific server is so critical that
>they want to be able to dedicate all resources to that server during the
>backup period.   At that point, my view is that they need a more robust
>filesystem like Veritas Filesystem and perhaps a local tape changer and drives
>if the need for a short backup window is so critical.   Playing with
>multi-streaming will just accommodate that server at the expense of overall
>capacity.

But what if you have a data center full of critical servers, each of which
wants their backup to be completed as quickly as possible?  The best way to
do it is multistreaming.


---
W. Curtis Preston
Principal Consultant for Storage Designs, your storage experts
Webmaster: http://www.backupcentral.com Phone: 760 631 7991