ADSM-L

Re: [ADSM-L] protect pool plus replicate node equals poor replication efficiencies

2016-09-16 09:35:18
Subject: Re: [ADSM-L] protect pool plus replicate node equals poor replication efficiencies
From: "Ryder, Michael S" <michael_s.ryder AT ROCHE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 16 Sep 2016 09:30:44 -0400
Have you read this?  IBM Spectrum Protect Node Replication
<http://www.empalis.com/fileadmin/templates_empalis/PDFs/Events_2015/TSM_Symp_2015_Nachlese/02_TSM_Symp_2015_Nachlese_Node_Replication.pdf>
from the 2015 TSM Symposium

Take a look at the bottom slide on page 17; there are pointers about
session count, lock contention and other best practices.  The entire
slide deck is worth reading.

Best regards,

Mike, x7942
RMD IT Client Services
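
As a rough cross-check of the ANR0327I figures quoted further down in this thread (846 GB transferred in 4 hours 28 minutes, against 12,487 GB reported as replicated), here is a minimal Python sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB), which the server message does not confirm, and the function names are made up purely for illustration.

```python
def average_throughput_mb_s(transferred_gb: float, hours: int, minutes: int) -> float:
    """Average transfer rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    elapsed_seconds = hours * 3600 + minutes * 60
    return transferred_gb * 1000 / elapsed_seconds


def transferred_fraction(transferred_gb: float, replicated_gb: float) -> float:
    """Share of the reported 'replicated' amount that was counted as transferred."""
    return transferred_gb / replicated_gb


# Figures taken from the ANR0327I message quoted in this thread.
rate = average_throughput_mb_s(846, 4, 28)
share = transferred_fraction(846, 12_487)

print(f"average rate: {rate:.1f} MB/s ({rate * 8:.0f} Mbit/s)")
print(f"transferred / replicated: {share:.1%}")
```

With the quoted numbers this comes to roughly 53 MB/s on average, and the amount transferred is a little under 7% of the amount reported as replicated.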

On Thu, Sep 15, 2016 at 4:07 PM, Nixon, Charles D. (David) <
cdnixon AT carilionclinic DOT org> wrote:

> And since replnode takes precedence over a backup (it has a habit of
> killing our TDP sessions), the protect pool can be used to get data offsite
> more frequently without impacting clients.  Then the replnode should be an
> extremely fast process that is thus less likely to impact client sessions.
>
> If I find out anything, I'll be sure to share but it will likely be a
> couple weeks due to travel, coordination, etc.
>
> ---------------------------------------------------
> David Nixon
> Storage Engineer II
> Technology Services Group
> Carilion Clinic
> 451 Kimball Ave.
> Roanoke, VA 24015
> Phone: 540-224-3903
> cdnixon AT carilionclinic DOT org
>
> Our mission: Improve the health of the communities we serve.
>
>
>
> ________________________________________
> From: ADSM: Dist Stor Manager [ADSM-L AT VM.MARIST DOT EDU] on behalf of
> Stefan Folkerts [stefan.folkerts AT GMAIL DOT COM]
> Sent: Thursday, September 15, 2016 3:17 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] protect pool plus replicate node equals poor
> replication efficiencies
>
> There is a relatively new command called "protect stgpool" that does the
> "replication" of the data part that replicate node used to do.
> You only do a replicate node after that to replicate the metadata.
> So you basically split the replication of data and metadata into two
> processes, and the overall time it takes is shorter because protect stgpool
> is faster at transferring data than the replicate node command was (and is;
> you can still use that for both data and metadata).
>
> On Thursday, 15 September 2016, Ryder, Michael S <
> michael_s.ryder AT roche DOT com>
> wrote:
>
> > The protected storage pool can be on a different server?
> >
> > On Thursday, September 15, 2016, Stefan Folkerts <
> > stefan.folkerts AT gmail DOT com>
> > wrote:
> >
> > > Not if you run protect stgpool before you run replnode.
> > > Then the protect stgpool will send the data and the replnode will transmit
> > > only metadata of the nodes in that storage pool.
> > > If you replicate nodes that have data in other storage pools, then yes, it
> > > will replicate that data.
> > > Replicating metadata also puts data on the line of course but it's not
> > > backup data, it's backup metadata.
> > >
> > > On Thu, Sep 15, 2016 at 6:50 PM, Ryder, Michael S <
> > > michael_s.ryder AT roche DOT com> wrote:
> > >
> > > > If you replnode from one server to another... it *has* to send the data
> > > > that changed, no?
> > > >
> > > > Best regards,
> > > >
> > > > Mike <http://rbbuswiki.bbg.roche.com/wiki/ryderm_page:start>, x7942
> > > > RMD IT Client Services
> > > > <http://na.intranet.roche.com/sites/RMD/content/Departments/
> > > > IT/Pages/default.aspx>
> > > >
> > > > On Thu, Sep 15, 2016 at 12:41 PM, Stefan Folkerts <
> > > > stefan.folkerts AT gmail DOT com> wrote:
> > > >
> > > > > >Do I have this right so far?
> > > > >
> > > > > No, I think he is under the impression that the data is sent twice. It looks
> > > > > a little that way because of how Spectrum Protect reports on the replication
> > > > > process, but it's representing the data, not actually sending it; it is only
> > > > > sending metadata of that data.
> > > > >
> > > > >
> > > > > On Thu, Sep 15, 2016 at 6:24 PM, Ryder, Michael S <
> > > > > michael_s.ryder AT roche DOT com> wrote:
> > > > >
> > > > > > Something doesn't make sense.
> > > > > >
> > > > > > You run a backup - node's data is stored in a pool on server A.
> > > > > >
> > > > > > Then, you protect pool, and a copy of the de-duped data is sent to the
> > > > > > protect pool storage, also on server A.
> > > > > >
> > > > > > Then, you replnode, and a node is replicated to server B.  You are
> > > > > > surprised to find that data is being sent from server A to server B.
> > > > > >
> > > > > > Do I have this right so far?
> > > > > >
> > > > > > Until you issue the replnode command, how is server B supposed to know
> > > > > > about the data in server A's storage pools?
> > > > > >
> > > > > > Don't you still need to copy the data at least once from server A to
> > > > > > server B?  Isn't this normal?
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Mike, x7942
> > > > > > RMD IT Client Services
> > > > > >
> > > > > > On Thu, Sep 15, 2016 at 12:02 PM, Stefan Folkerts <
> > > > > > stefan.folkerts AT gmail DOT com> wrote:
> > > > > >
> > > > > > > Do you have a fast Spectrum Protect database / active log?
> > > > > > > We run 2.4 TB of metadata per hour with replication (note: this is not
> > > > > > > actual data, it is metadata representing 2.4 TB of data).
> > > > > > > But that system has SSDs and runs in excess of 140,000 IOPS in Spectrum
> > > > > > > Protect database benchmarks.
> > > > > > > I would think this is very much bound by database (and active log)
> > > > > > > performance (on both sides).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Sep 15, 2016 at 5:50 PM, Nixon, Charles D. (David) <
> > > > > > > cdnixon AT carilionclinic DOT org> wrote:
> > > > > > >
> > > > > > > > Best I can tell, it is transferring the data over the wire, and support
> > > > > > > > stated as much.  We are currently using replnode for that single node,
> > > > > > > > so it's getting the default of 10 sessions and appears to be using all of
> > > > > > > > them for the four hours or so that it's sending data.
> > > > > > > >
> > > > > > > > I don't have a good way to see the server's bandwidth, but the network IO
> > > > > > > > chart implies that it's not sending a great amount of data, though that
> > > > > > > > may be due to the 846 GB being spread over 4.5 hours.
> > > > > > > >
> > > > > > > > 09/15/16   10:58:08      ANR0327I Replication of node NODENAME completed.
> > > > > > > >                           Files current: 70,341. Files replicated: 752 of 752.
> > > > > > > >                           Files updated: 602 of 602. Files deleted: 692 of 692.
> > > > > > > >                           Amount replicated: 12,487 GB of 12,487 GB.
> > > > > > > >                           Amount transferred: 846 GB.
> > > > > > > >                           Elapsed time: 0 Days, 4 Hours, 28 Minutes.
> > > > > > > >                           (SESSION: 414242, PROCESS: 539)
> > > > > > > > ---------------------------------------------------
> > > > > > > > David Nixon
> > > > > > > > Storage Engineer II
> > > > > > > > Technology Services Group
> > > > > > > > Carilion Clinic
> > > > > > > > 451 Kimball Ave.
> > > > > > > > Roanoke, VA 24015
> > > > > > > > Phone: 540-224-3903
> > > > > > > > cdnixon AT carilionclinic DOT org
> > > > > > > >
> > > > > > > > Our mission: Improve the health of the communities we serve.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ________________________________________
> > > > > > > > From: ADSM: Dist Stor Manager [ADSM-L AT VM.MARIST DOT EDU] on behalf of
> > > > > > > > Stefan Folkerts [stefan.folkerts AT GMAIL DOT COM]
> > > > > > > > Sent: Thursday, September 15, 2016 10:45 AM
> > > > > > > > To: ADSM-L AT VM.MARIST DOT EDU
> > > > > > > > Subject: Re: [ADSM-L] protect pool plus replicate node equals poor
> > > > > > > > replication efficiencies
> > > > > > > >
> > > > > > > > >Support confirmed that the amount of data replicated in a replnode
> > > > > > > > >command is the same, regardless of the protect pool command status.
> > > > > > > >
> > > > > > > > I think that this is only in the statistics, not in the actual transfer
> > > > > > > > on the wire.
> > > > > > > > The replnode should not transmit actual data if the data was sent by the
> > > > > > > > protect stgpool command.
> > > > > > > > Are you running enough (but not too many) parallel processes for the
> > > > > > > > replicate node command so it can perform optimally?
> > > > > > > >
> > > > > > > > I'm using this setup for multiple customers and it has worked fine for us
> > > > > > > > so far.
> > > > > > > >
> > > > > > > > http://imgur.com/a/mT6ux
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Sep 15, 2016 at 4:29 PM, Nixon, Charles D. (David) <
> > > > > > > > cdnixon AT carilionclinic DOT org> wrote:
> > > > > > > >
> > > > > > > > > We opened a ticket related to long replication times in a container pool
> > > > > > > > > after replication takes place, and got an answer that 'we can recreate
> > > > > > > > > your problem but it is likely working as designed' even though it's
> > > > > > > > > contrary to documentation.  Any ideas would be appreciated.
> > > > > > > > >
> > > > > > > > > -Two TSM servers at 7.1.5
> > > > > > > > > -Single client going to a single container.  Client backs up 12TB a night
> > > > > > > > > and after dedupe/compression, we see a 1TB change rate (approximately).
> > > > > > > > > -Once the backup is complete, we run a protect pool.  It's expected that
> > > > > > > > > this process will ship 1TB to the DR site.
> > > > > > > > > -Protect completes successfully.
> > > > > > > > > -A replnode is issued against the node and TSM spends the next 4 hours
> > > > > > > > > replicating data to the DR site
> > > > > > > > >
> > > > > > > > > Support confirmed that the amount of data replicated in a replnode
> > > > > > > > > command is the same, regardless of the protect pool command status.
> > > > > > > > > However, the documentation leads me to believe that if you have already
> > > > > > > > > protected the pool, the replnode should be a metadata-only transfer.
> > > > > > > > >
> > > > > > > > > So while we are able to transfer and complete the processes, it seems to
> > > > > > > > > 'cost' us quite a bit in both IO and WAN usage to do so using containers,
> > > > > > > > > defeating the point of using containers to reduce replication costs.  Any
> > > > > > > > > ideas as to what is going on?
> > > > > > > > >
> > > > > > > > > ---------------------------------------------------
> > > > > > > > > David Nixon
> > > > > > > > > Storage Engineer II
> > > > > > > > > Technology Services Group
> > > > > > > > > Carilion Clinic
> > > > > > > > > 451 Kimball Ave.
> > > > > > > > > Roanoke, VA 24015
> > > > > > > > > Phone: 540-224-3903
> > > > > > > > > cdnixon AT carilionclinic DOT org
> > > > > > > > >
> > > > > > > > > Our mission: Improve the health of the communities we
> serve.
> > > > > > > > >
> > > > > > > > > ________________________________
> > > > > > > > >
> > > > > > > > > Notice: The information and attachment(s) contained in this communication
> > > > > > > > > are intended for the addressee only, and may be confidential and/or legally
> > > > > > > > > privileged. If you have received this communication in error, please
> > > > > > > > > contact the sender immediately, and delete this communication from any
> > > > > > > > > computer or network system. Any interception, review, printing, copying,
> > > > > > > > > re-transmission, dissemination, or other use of, or taking of any action
> > > > > > > > > upon this information by persons or entities other than the intended
> > > > > > > > > recipient is strictly prohibited by law and may subject them to criminal or
> > > > > > > > > civil liability. Carilion Clinic shall not be liable for the improper
> > > > > > > > > and/or incomplete transmission of the information contained in this
> > > > > > > > > communication or for any delay in its receipt.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Best regards,
> >
> > Mike <http://rbbuswiki.bbg.roche.com/wiki/ryderm_page:start>, x7942
> > RMD IT Client Services
> > <http://na.intranet.roche.com/sites/RMD/content/Departments/
> > IT/Pages/default.aspx>
> >
>