ADSM-L

Re: [ADSM-L] protect pool plus replicate node equals poor replication efficiencies

2016-09-15 16:05:57
From: Stefan Folkerts <stefan.folkerts AT GMAIL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 15 Sep 2016 22:02:44 +0200
Running nmon in batch mode and feeding the output through the nmon analyzer
should paint the picture, I think; that tool gives pretty decent network
usage data.
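
While waiting on nmon numbers, a rough cross-check is possible from the
ANR0327I summary quoted further down in this thread. The Python sketch below
is only an illustration: the function names are made up for the example, the
message text is the log line from the thread with its line wraps rejoined,
and it assumes the reported GB are decimal (1 GB = 1000 MB), which may not
match how the server counts.

```python
import re

# ANR0327I summary as quoted in this thread, with the mail line wraps rejoined.
MESSAGE = (
    "ANR0327I Replication of node NODENAME completed. "
    "Files current: 70,341. Files replicated: 752 of 752. "
    "Files updated: 602 of 602. Files deleted: 692 of 692. "
    "Amount replicated: 12,487 GB of 12,487 GB. "
    "Amount transferred: 846 GB. "
    "Elapsed time: 0 Days, 4 Hours, 28 Minutes."
)


def _number(pattern: str, text: str) -> int:
    """Return the first captured number, dropping thousands separators."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1).replace(",", ""))


def summarize(message: str) -> dict:
    """Derive average wire rates from one replication summary message."""
    replicated_gb = _number(r"Amount replicated: ([\d,]+) GB", message)
    transferred_gb = _number(r"Amount transferred: ([\d,]+) GB", message)

    elapsed = re.search(
        r"Elapsed time: (\d+) Days?, (\d+) Hours?, (\d+) Minutes?", message
    )
    if elapsed is None:
        raise ValueError("elapsed time not found")
    days, hours, minutes = (int(g) for g in elapsed.groups())
    seconds = ((days * 24 + hours) * 60 + minutes) * 60

    # Assumption: decimal units, 1 GB = 1000 MB.
    mb_per_s = transferred_gb * 1000 / seconds
    return {
        "seconds": seconds,
        "mb_per_s": mb_per_s,
        "gbit_per_s": mb_per_s * 8 / 1000,
        # Share of the logical data that actually crossed the wire.
        "transferred_fraction": transferred_gb / replicated_gb,
    }


if __name__ == "__main__":
    for key, value in summarize(MESSAGE).items():
        print(f"{key}: {value:.4f}")
```

With the figures from the thread this works out to roughly 53 MB/s, or about
0.4 Gbit/s averaged over the run, a small fraction of a 10Gb circuit. An
average hides bursts, so per-interval data from nmon is still worth
collecting.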

Anyway, good luck and let us know what the outcome is please. :-)

On Thursday, 15 September 2016, Nixon, Charles D. (David) <
cdnixon AT carilionclinic DOT org> wrote:

> Our TSM servers exceed the LARGE size in the blueprint for AIX and both DB
> and active logs are on SAN attached flash storage.  Support confirmed that
> the DB and logs were not a performance limitation (a good thing since we
> went through the process of validating disk performance before we went
> live).
>
> Mike,
> We are protecting/replicating from server A pool A to server B pool B
> across a 10Gb WAN circuit.  Since pools can fix their data errors from the
> target pool, having a second copy locally is less of a concern for us.
>
> Stefan,
> I was originally under the impression that it only sends the data once.
> However, the time it takes us to replicate what should be metadata, the
> reporting TSM shows, and support's observations all lead me to believe
> that TSM may be trying to replicate the data twice.
>
> I may have to get with our network team to figure out how much data is
> actually being sent during the replnode, since we can't get good numbers
> from AIX (it's showing network IO but not throughput), and we will have
> to disable the other replication/protection jobs so that we can isolate the
> traffic.  My reason for bringing it up was to see if anyone else had run
> into this type of problem since I doubt that I'll be able to speak to a
> knowledgeable resource at Edge next week.
>
> ---------------------------------------------------
> David Nixon
> Storage Engineer II
> Technology Services Group
> Carilion Clinic
> 451 Kimball Ave.
> Roanoke, VA 24015
> Phone: 540-224-3903
> cdnixon AT carilionclinic DOT org
>
> Our mission: Improve the health of the communities we serve.
>
>
>
> ________________________________________
> From: ADSM: Dist Stor Manager [ADSM-L AT VM.MARIST DOT EDU] on
> behalf of Stefan Folkerts [stefan.folkerts AT GMAIL DOT COM]
> Sent: Thursday, September 15, 2016 12:02 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] protect pool plus replicate node equals poor
> replication efficiencies
>
> Do you have a fast Spectrum Protect database / active log?
> We run 2.4TB of metadata per hour with replication (note, this is not
> actual data, this is metadata representing 2.4TB of data).
> But that system has SSDs and runs in excess of 140,000 IOPS in Spectrum
> Protect database benchmarks.
> I would think this is very much database (and active log) performance bound
> (on both sides).
>
>
>
>
> On Thu, Sep 15, 2016 at 5:50 PM, Nixon, Charles D. (David) <
> cdnixon AT carilionclinic DOT org> wrote:
>
> > Best I can tell, it is transferring the data over the wire and support
> > stated as much.  We are currently using the replnode for that single node
> > so it's getting the default of 10 sessions and appears to be using all of
> > them, for the four hours or so that it's sending data.
> >
> > I don't have a good way to see the server's bandwidth, but the network IO
> > chart implies that it's not sending a great amount of data, though that
> > may be due to the 846GB being spread over 4.5 hours.
> >
> > 09/15/16   10:58:08      ANR0327I Replication of node NODENAME completed.
> >                           Files current: 70,341. Files replicated: 752
> >                           of 752. Files updated: 602 of 602. Files
> >                           deleted: 692 of 692. Amount replicated: 12,487
> >                           GB of 12,487 GB. Amount transferred: 846 GB.
> >                           Elapsed time: 0 Days, 4 Hours, 28 Minutes.
> >                           (SESSION: 414242, PROCESS: 539)
> > ---------------------------------------------------
> > David Nixon
> > Storage Engineer II
> > Technology Services Group
> > Carilion Clinic
> > 451 Kimball Ave.
> > Roanoke, VA 24015
> > Phone: 540-224-3903
> > cdnixon AT carilionclinic DOT org
> >
> > Our mission: Improve the health of the communities we serve.
> >
> >
> >
> > ________________________________________
> > From: ADSM: Dist Stor Manager [ADSM-L AT VM.MARIST DOT EDU] on
> > behalf of Stefan Folkerts [stefan.folkerts AT GMAIL DOT COM]
> > Sent: Thursday, September 15, 2016 10:45 AM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: Re: [ADSM-L] protect pool plus replicate node equals poor
> > replication efficiencies
> >
> > > Support confirmed that the amount of data replicated in a replnode
> > > command is the same, regardless of the protect pool command status.
> >
> > I think that this is only in the statistics, not in the actual transfer
> > on the wire.
> > The replnode should not transmit actual data if the data was sent by the
> > protect storagepool command.
> > Are you running enough (but not too many) parallel processes for the
> > replicate node command so it can perform optimally?
> >
> > I'm using this setup for multiple customers and it worked fine for us so
> > far.
> >
> > http://imgur.com/a/mT6ux
> >
> >
> >
> >
> > On Thu, Sep 15, 2016 at 4:29 PM, Nixon, Charles D. (David) <
> > cdnixon AT carilionclinic DOT org> wrote:
> >
> > > We opened a ticket related to long replication times in a container
> > > pool after replication takes place, and got an answer that 'we can
> > > recreate your problem but it is likely working as designed' even
> > > though it's contrary to documentation.  Any ideas would be appreciated.
> > >
> > > -Two TSM servers at 7.1.5
> > > -Single client going to a single container.  Client backs up 12TB a
> > > night and after dedupe/compression, we see a 1TB change rate
> > > (approximately).
> > > -Once the backup is complete, we run a protect pool.  It's expected
> > > that this process will ship 1TB to the DR site.
> > > -Protect completes successfully.
> > > -A replnode is issued against the node and TSM spends the next 4 hours
> > > replicating data to the DR site.
> > >
> > > Support confirmed that the amount of data replicated in a replnode
> > > command is the same, regardless of the protect pool command status.
> > > However, the documentation leads me to believe that if you have
> > > already protected the pool, the replnode should be a metadata-only
> > > transfer.
> > >
> > > So while we are able to transfer and complete the processes, it seems
> > > to 'cost' us quite a bit in both IO and WAN usage to do so using
> > > containers, defeating the point of using containers to reduce
> > > replication costs.  Any ideas as to what is going on?
> > >
> > > ---------------------------------------------------
> > > David Nixon
> > > Storage Engineer II
> > > Technology Services Group
> > > Carilion Clinic
> > > 451 Kimball Ave.
> > > Roanoke, VA 24015
> > > Phone: 540-224-3903
> > > cdnixon AT carilionclinic DOT org
> > >
> > > Our mission: Improve the health of the communities we serve.
> > >
> > > ________________________________
> > >
> > > Notice: The information and attachment(s) contained in this
> > > communication are intended for the addressee only, and may be
> > > confidential and/or legally privileged. If you have received this
> > > communication in error, please contact the sender immediately, and
> > > delete this communication from any computer or network system. Any
> > > interception, review, printing, copying, re-transmission,
> > > dissemination, or other use of, or taking of any action upon this
> > > information by persons or entities other than the intended recipient
> > > is strictly prohibited by law and may subject them to criminal or
> > > civil liability. Carilion Clinic shall not be liable for the improper
> > > and/or incomplete transmission of the information contained in this
> > > communication or for any delay in its receipt.
> > >