ADSM-L

Re: [ADSM-L] protect pool plus replicate node equals poor replication efficiencies

2016-09-15 15:47:36
Subject: Re: [ADSM-L] protect pool plus replicate node equals poor replication efficiencies
From: "Nixon, Charles D. (David)" <cdnixon AT CARILIONCLINIC DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 15 Sep 2016 19:45:06 +0000
Our TSM servers exceed the LARGE size in the blueprint for AIX and both DB and 
active logs are on SAN attached flash storage.  Support confirmed that the DB 
and logs were not a performance limitation (a good thing since we went through 
the process of validating disk performance before we went live).

Mike,
We are protecting/replicating from server A pool A to server B pool B across a 
10Gb WAN circuit.  Since pools can fix their data errors from the target pool, 
having a second copy locally is less of a concern for us.

Stefan,
My original impression was that it only sends the data once.  However, the 
time it takes us to replicate what should be metadata, the reporting TSM 
shows, and support's observations all lead me to believe that TSM may be 
trying to replicate the data twice.

I may have to get with our network team to try and figure out how much 
throughput is being sent during the replnode since we can't get good numbers 
from AIX (it's showing network IO but not throughput) and will have to disable 
the other replication/protection jobs so that we can isolate the traffic.  My 
reason for bringing it up was to see if anyone else had run into this type of 
problem since I doubt that I'll be able to speak to a knowledgeable resource at 
Edge next week.
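
For a rough sanity check before involving the network team, the figures in 
the ANR0327I message quoted further down (846 GB transferred in 4 hours 28 
minutes) can be turned into an average wire rate.  The sketch below is 
illustrative only: it assumes decimal units (1 GB = 1000 MB), treats the 10Gb 
circuit as a full 10 Gbit/s of usable capacity, and the function name is made 
up for this example.  It gives an average over the whole run and says nothing 
about bursts or about traffic from other jobs.

```python
def average_throughput(gb_transferred, hours, minutes, link_gbit_per_s):
    """Return (MB/s, Mbit/s, fraction of link) for a completed transfer.

    Assumes decimal units: 1 GB = 1000 MB and 1 Gbit = 1000 Mbit.
    """
    seconds = hours * 3600 + minutes * 60
    mb_per_s = gb_transferred * 1000 / seconds
    mbit_per_s = mb_per_s * 8
    link_fraction = mbit_per_s / (link_gbit_per_s * 1000)
    return mb_per_s, mbit_per_s, link_fraction


# Values taken from the ANR0327I message: 846 GB in 4 h 28 min, 10Gb link.
mb_s, mbit_s, frac = average_throughput(846, 4, 28, 10)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s, {frac:.1%} of the link")
```

Under those assumptions the replnode averaged roughly 53 MB/s, or a little 
over 4% of the circuit, which would be consistent with a network IO chart 
that does not look busy.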

---------------------------------------------------
David Nixon
Storage Engineer II
Technology Services Group
Carilion Clinic
451 Kimball Ave.
Roanoke, VA 24015
Phone: 540-224-3903
cdnixon AT carilionclinic DOT org

Our mission: Improve the health of the communities we serve.



________________________________________
From: ADSM: Dist Stor Manager [ADSM-L AT VM.MARIST DOT EDU] on behalf of Stefan 
Folkerts [stefan.folkerts AT GMAIL DOT COM]
Sent: Thursday, September 15, 2016 12:02 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] protect pool plus replicate node equals poor replication 
efficiencies

Do you have a fast Spectrum Protect database / active log?
We run 2.4TB of metadata per hour with replication (note, this is not
actual data, this is metadata representing 2.4TB of data).
But that system has SSDs and runs in excess of 140,000 IOPS in Spectrum
Protect database benchmarks.
I would think this is very much database (and active log) performance bound
(on both sides).
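
To put the 2.4TB per hour figure next to the run quoted below, the "Amount 
replicated" value from the ANR0327I message (12,487 GB in 4 hours 28 minutes) 
can be converted to the same unit.  This is only a sketch: it assumes the two 
numbers measure the same thing (logical data represented, not bytes on the 
wire), which may not hold, and the function name is hypothetical.

```python
def logical_rate_tb_per_hour(gb_replicated, hours, minutes):
    """Logical data represented per hour, in decimal TB (1 TB = 1000 GB)."""
    elapsed_hours = hours + minutes / 60
    return gb_replicated / 1000 / elapsed_hours


# Values taken from the ANR0327I message quoted later in the thread.
rate = logical_rate_tb_per_hour(12487, 4, 28)
print(f"{rate:.2f} TB/hour")
```

If the comparison is valid, that run works out to about 2.8 TB per hour, the 
same order of magnitude as the figure above.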




On Thu, Sep 15, 2016 at 5:50 PM, Nixon, Charles D. (David) <
cdnixon AT carilionclinic DOT org> wrote:

> Best I can tell, it is transferring the data over the wire and support
> stated as much.  We are currently using the replnode for that single node
> so it's getting the default of 10 sessions and appears to be using all of
> them, for the four hours or so that it's sending data.
>
> I don't have a good way to see the server's bandwidth, but the network IO
> chart implies that it's not sending a great amount of data, though that may
> be due to the 846GB being spread over 4.5 hours.
>
> 09/15/16   10:58:08      ANR0327I Replication of node NODENAME completed.
>                           Files current: 70,341. Files replicated: 752 of
>                           752. Files updated: 602 of 602. Files deleted:
>                           692 of 692. Amount replicated: 12,487 GB of
>                           12,487 GB. Amount transferred: 846 GB. Elapsed
>                           time: 0 Days, 4 Hours, 28 Minutes.
>                           (SESSION: 414242, PROCESS: 539)
>
>
>
> ________________________________________
> From: ADSM: Dist Stor Manager [ADSM-L AT VM.MARIST DOT EDU] on behalf of
> Stefan Folkerts [stefan.folkerts AT GMAIL DOT COM]
> Sent: Thursday, September 15, 2016 10:45 AM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] protect pool plus replicate node equals poor
> replication efficiencies
>
> > Support confirmed that the amount of data replicated in a replnode command
> > is the same, regardless of the protect pool command status.
>
> I think that this is only in the statistics, not in the actual transfer on
> the wire.
> The replnode should not transmit actual data if the data was sent by the
> protect storagepool command.
> Are you running enough (but not too many) parallel processes for the
> replicate node command so it can perform optimally?
>
> I'm using this setup for multiple customers and it worked fine for us so
> far.
>
> http://imgur.com/a/mT6ux
>
>
>
>
> On Thu, Sep 15, 2016 at 4:29 PM, Nixon, Charles D. (David) <
> cdnixon AT carilionclinic DOT org> wrote:
>
> > We opened a ticket related to long replication times in a container pool
> > after replication takes place, and got an answer that 'we can recreate your
> > problem but it is likely working as designed' even though it's contrary to
> > documentation.  Any ideas would be appreciated.
> >
> > -Two TSM servers at 7.1.5
> > -Single client going to a single container.  Client backs up 12TB a night
> > and after dedupe/compression, we see a 1TB change rate (approximately).
> > -Once the backup is complete, we run a protect pool.  It's expected that
> > this process will ship 1TB to the DR site.
> > -Protect completes successfully.
> > -A replnode is issued against the node and TSM spends the next 4 hours
> > replicating data to the DR site.
> >
> > Support confirmed that the amount of data replicated in a replnode command
> > is the same, regardless of the protect pool command status.  However, the
> > documentation leads me to believe that if you have already protected the
> > pool, the replnode should be a metadata-only transfer.
> >
> > So while we are able to transfer and complete the processes, it seems to
> > 'cost' us quite a bit in both IO and WAN usage to do so using containers,
> > defeating the point of using containers to reduce replication costs.  Any
> > ideas as to what is going on?
> >
> >
> > ________________________________
> >
> > Notice: The information and attachment(s) contained in this communication
> > are intended for the addressee only, and may be confidential and/or legally
> > privileged. If you have received this communication in error, please
> > contact the sender immediately, and delete this communication from any
> > computer or network system. Any interception, review, printing, copying,
> > re-transmission, dissemination, or other use of, or taking of any action
> > upon this information by persons or entities other than the intended
> > recipient is strictly prohibited by law and may subject them to criminal or
> > civil liability. Carilion Clinic shall not be liable for the improper
> > and/or incomplete transmission of the information contained in this
> > communication or for any delay in its receipt.
> >
>
>
