ADSM-L

Re: [ADSM-L] LTO5 Tuning

2014-10-31 03:21:34
Subject: Re: [ADSM-L] LTO5 Tuning
From: Frank Fegert <fra.nospam.nk AT GMX DOT DE>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 31 Oct 2014 08:20:29 +0100
Hello Steve,

what is the actual distance on the path to the tape drives? Can you ask
your SAN team how many buffer credits are configured on the switch ports?
It seems from your statement "76 km distance" that it might be a long
distance connection. Make sure you have enough buffer credits on the
long distance links to keep the whole length of the wire "populated".
Also, if it's a long distance connection, is this purely FC or is there
some kind of FC-to-whatever equipment in between which might contribute to
the high latency? On the fcs* devices what is your setting for the
"max_xfer_size" attribute?
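
To make the buffer credit question concrete, here is a rough Python sketch
of the usual back-of-the-envelope estimate. It is not a vendor formula. The
assumptions are mine: about 5 microseconds per km one-way delay in fibre,
roughly 100 MB/s of usable throughput per nominal Gb of FC link speed, and
full-size frames of 2148 bytes on the wire. The function names are made up
for illustration. Note that the result depends on whether the quoted
380 microseconds is a one-way or a round-trip figure.

```python
import math

# Assumption: light in fibre needs roughly 5 microseconds per km (one way).
US_PER_KM = 5.0


def distance_km(latency_us, round_trip=False):
    """Convert a measured latency in microseconds to a fibre distance in km."""
    one_way_us = latency_us / 2 if round_trip else latency_us
    return one_way_us / US_PER_KM


def bb_credits(dist_km, link_gbps, frame_bytes=2148):
    """Estimate the credits needed to keep a link of dist_km full.

    Assumption: 1/2/4/8 Gb FC carries about 100 MB/s per nominal Gb,
    which is the same as 100 bytes per microsecond per Gb.
    """
    bytes_per_us = link_gbps * 100.0
    frame_time_us = frame_bytes / bytes_per_us   # time to serialise one frame
    round_trip_us = 2 * dist_km * US_PER_KM      # frame out, R_RDY back
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    for rtt in (False, True):
        km = distance_km(380, round_trip=rtt)
        label = "round trip" if rtt else "one way"
        print(f"380 us as {label}: {km:.0f} km, "
              f"4Gb needs ~{bb_credits(km, 4)} credits, "
              f"8Gb needs ~{bb_credits(km, 8)} credits")
```

Smaller frames need proportionally more credits, so treat these figures as
a lower bound and compare them with what the switch ports actually report.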

Best regards,

    Frank Fegert


On Fri, Oct 31, 2014 at 12:24:53PM +1100, Steven Harris wrote:
> Thanks for your interest Mike.  The san guy tells me it's 380 microseconds
> which equates to 76 km distance.
>
> Regards
>
> Steve
> On 30 Oct 2014 23:50, "Ryder, Michael S" <michael_s.ryder AT roche DOT com> wrote:
>
> > Steve:
> >
> > What is the latency on your intersite connection?
> >
> > Best regards,
> >
> > Mike, x7942
> > RMD IT Client Services
> >
> > On Thu, Oct 30, 2014 at 8:35 AM, Steven Harris <steve AT stevenharris DOT info> wrote:
> >
> > > Hi All
> > >
> > > I have a TSM 6.3.4 server on newish P7 hardware and AIX V7.1. HBAs are
> > > all 8Gb.  The SANs behind it are 8Gb or 4Gb depending on which path they
> > > take as we are in the middle of a SAN upgrade and there is still an old
> > > switch in the mix.
> > >
> > > Disk is XIV behind SVC.  Tape is TS3500 and LTO5.
> > >
> > > According to the LTO Wikipedia entry I should be able to get 140MB/sec
> > > raw out of the drive.  I have an internal company document that suggests
> > > sustained 210MB/sec (compressed) is attainable in the real world.
> > >
> > > So far my server backs up 500GB per night of DB2 and Oracle databases on
> > > to file pools, without deduplication.  Housekeeping then does a
> > > single-streamed simultaneous migrate and copy to onsite and offsite
> > > tapes.  Inter site bandwidth is 4Gb and I have most of that to myself.
> > >
> > > That process takes over 5 hours so I'm seeing less than 100MB/sec.
> > >
> > > Accordingly I started a tuning exercise.  I copied 50GB of my filepool
> > > twice to give me a test dataset and started testing, of course when
> > > there was no other activity on the TSM box.
> > >
> > > The data comes off disk at 500MB/sec to /dev/null, so that is not a
> > > bottleneck.
> > >
> > > Copying using dd to tape runs at a peak of 120MB/sec with periods of
> > > much lower than that, as measured using nmon's fc stats on the HBAs. I
> > > presume some of that slowdown is where the tape reaches its end and has
> > > to reverse direction.
> > >
> > > Elapsed time for 100GB is 18 min, with little variation so average speed
> > > is 95MB/sec
> > >
> > > dd ibs and obs values were varied and ibs=256K obs=1024K seems to give
> > > the best result.
> > >
> > > Elapsed time is very consistent.
> > >
> > > Copying to a local drive on the same switch blade as the tape HBA or
> > > copying across blades made no difference.
> > >
> > > Copying to a drive at the remote site increased elapsed time by 2
> > > minutes, as one would expect with more switches in the path and a longer
> > > turnaround time.
> > >
> > > Tape to tape copy was not noticeably different to disk to tape.
> > >
> > > Reading from tape to /dev/null was no different.
> > >
> > > In all cases CPU time was about half of the elapsed time.
> > >
> > > lsattr on the drives shows that compression is on (this is also the
> > > default)
> > >
> > > The tape FC adapters are set to use the large transfer size.
> > >
> > > The test was also run using 64KB pages and svmon was used to verify the
> > > setting was effective. Again no difference.
> > >
> > > I'm running out of ideas here.  num_cmd_elements on the HBAs is 500 (the
> > > default).  I'm thinking of increasing that to 2000, but it will require
> > > an outage and hence change control.
> > >
> > > Does anyone have any ideas, references I could look at or practical
> > > advice as to how to get this to perform?
> > >
> > > Thanks
> > >
> > > Steve
> > >
> > > Steven Harris
> > > TSM Admin
> > > Canberra Australia
> > >
> >
