Re: [ADSM-L] When can too many disk volumes be detrimental

Zoltan,

I think your speed problem is not so much in your primary storage pool, but
in your database and log filesystems.
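
As a sanity check on the figure from your first message (11 hours to drain
a full 1TB volume), here is a rough sketch of the arithmetic.  It assumes
decimal units (1TB = 10^12 bytes), and the function name is only for
illustration.

```python
# Back-of-the-envelope throughput for draining one disk volume to tape.
# Assumption: decimal units, so 1 TB = 10**12 bytes.

def throughput(bytes_moved, hours):
    """Return (megabits per second, megabytes per second)."""
    seconds = hours * 3600
    mbit_s = bytes_moved * 8 / seconds / 1e6
    mbyte_s = bytes_moved / seconds / 1e6
    return mbit_s, mbyte_s

# 1 TB moved in 11 hours, as reported for a single full disk volume.
mbit_s, mbyte_s = throughput(1e12, 11)
print(f"{mbit_s:.0f} Mbit/s, {mbyte_s:.1f} MB/s")
```

That works out to roughly 200Mbits/sec, or about 25MB/sec.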

Are you using TDP for Virtual Environments for block-level image backups?
Do you have deduplication enabled?  Both features will grind away at your
log filesystems.  In my case, enabling those features tripled the size of
the DB, as well as how much log space is used and how quickly it fills.

At the same time TSM is trying to stream data to your primary disk-based
storage pool, it is also trying to update its log files.  When they both
exist on the same physical drives, those drives will be taxed in trying to
perform both operations - even with a single array controller it would be
more efficient to have spindles dedicated to log/db filesystems.  Perhaps
consider replacing your OS/DB/App drives with a mirror of large SSDs, and
put the log files there, too.  Even if they are all on the same array
controller, the speed improvement for the DBs and logs should lower the
latency of operations concerning your RAID5 array.

What filesystem?  ext3 or ext4?  If you're on ext3, I understand from IBM's
docs that as of RHEL6.x ext4 is suitable and provides some performance
improvement.  Did you follow the docs and disable RHEL's read-ahead
caching?  If so, you may want to consider enabling it.
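
If you aren't sure how read-ahead is currently set, a minimal sketch like
the one below should report it.  It assumes the standard Linux sysfs
location /sys/block/<device>/queue/read_ahead_kb.  The device name "sda" is
only a placeholder for whatever backs your storage pool, and the function
name is mine.

```python
from pathlib import Path

def read_ahead_kb(device, sysfs_root="/sys/block"):
    """Return the read-ahead size (in KB) the kernel reports for a block device."""
    setting = Path(sysfs_root, device, "queue", "read_ahead_kb")
    return int(setting.read_text().strip())

if __name__ == "__main__":
    dev = "sda"  # placeholder: substitute the device behind your storage pool
    if Path("/sys/block", dev, "queue", "read_ahead_kb").exists():
        kb = read_ahead_kb(dev)
        state = "disabled" if kb == 0 else "enabled"
        print(f"{dev}: read-ahead {state} ({kb} KB)")
    else:
        print(f"{dev}: no read_ahead_kb entry found")
```

A value of 0 means read-ahead is switched off for that device.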

Wait... why would you be able to go with RAID10?  You would go from a RAID5
array of (6) 6TB drives of *30TB*... to a RAID10 array of (6) 6TB drives of
*18TB* - I thought you said you couldn't slice and dice this anymore?
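
To show where those two numbers come from, here is a small sketch of the
usable-capacity arithmetic.  It assumes a plain RAID5 (one drive's worth of
parity) and a plain RAID10 (every drive mirrored), with no hot spares; the
function name is made up for the example.

```python
def raid_usable_tb(level, drives, drive_tb):
    """Usable capacity in TB for a simple RAID5 or RAID10 array (no hot spares)."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID5 needs at least 3 drives")
        # One drive's worth of capacity goes to parity.
        return (drives - 1) * drive_tb
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID10 needs an even number of drives, at least 4")
        # Half of the drives hold mirror copies.
        return (drives // 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")

# Six 6TB drives, as in the new T720 servers.
for level in ("raid5", "raid10"):
    print(f"{level}: {raid_usable_tb(level, 6, 6)} TB usable")
```

So moving the same six drives from RAID5 to RAID10 gives up 12TB of usable
space.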

Sadly - if you don't have any other flexibility, I think you will have to
live with your performance, because your priority was capacity, and
unfortunately you can't have both capacity and speed with these hard drives.

Best regards,

Mike, x7942
RMD IT Client Services

On Tue, Jan 26, 2016 at 7:59 PM, Zoltan Forray <zforray AT vcu DOT edu> wrote:

> Mike,
>
> You brought up some valid points and good questions.  I do have to clarify
> something I left out.
>
> This machine also has 2-1TB drives in the back of the system.  They are
> mirrored and used for the TSM DB and OS (which does very little since TSM
> is the ONLY application on this server).  The big-honkin-disks are used for
> everything else (/tsmlog, /tsmarchlog, general TSM storage).
>
> Yes, the 6TB drives are 7200rpm SATA (we got the most internal storage we
> could for the $$$$$ we had to spend; IIRC, Dell charged over $1K for the
> 6TB drives).
>
> We can't slice-and-dice the RAID array any finer since we would lose 6TB
> at a time.  We have discussed going for RAID10 when the box is rebuilt (the
> OS folks feel that since there was known damage to OS files, there might be
> unknown/hidden damage).
>
> On Tue, Jan 26, 2016 at 5:27 PM, Ryder, Michael S
> <michael_s.ryder AT roche DOT com> wrote:
>
> > Zoltan
> >
> > If I read your message correctly:
> >  - 1TB over 11 hours is ~200Mbits/sec
> >  - Dell 6TB drives appear to be 7200rpm SAS drives
> >  --- It is likely your 600GB drives were 15000rpm
> >  - your TSM server uses a single RAID-5 array for the OS, application,
> >    logs and archive logs?  Is the TSM database on the same array as well?
> >
> > If so, I have a feeling I have an easy answer for you: stop putting
> > everything on a single RAID-5 array.  RAID-5 is one of the slowest
> > arrangements there is, and you have crippled yourself by putting the logs,
> > OS (and possibly your database) on the same array.  For ultimate
> > performance, divide your load onto multiple array controllers and multiple
> > arrays.  Use mirrored drives for the database and log drives (SSD if
> > possible).  Pack in as much memory as possible into disk cache.  Minimize
> > latency by keeping as much of the disk "local" to the server.  If you must
> > use RAID-5 or something for "mass storage" for example to hold your
> > storage-pools, then use as many spindles as you can afford.  More spindles
> > means more disk-controllers working to process commands from
> > array-controllers.
> >
> > Using this kind of setup I am able to process over 3Gbits/sec on a 4-year
> > old HP bl460c g6 blade loaded with 12 cores and 96GB RAM and an HP Storage
> > blade.  Just one storage pool has 500 volumes spread over 28TB.  Switching
> > to SSD drives for log and database functions was almost a religious
> > experience.
> >
> > Maybe it is good to ask you this question - how fast do you need to process
> > that 1TB of data?  How long should a database restore take?
> >
> > Another question that I do not see people ask is this -- when a single 6TB
> > drive fails... how long will it take to rebuild it?  (answer, as you have
> > found... a LONG time!).  So the march towards larger and larger drives
> > comes with additional risk.
> >
> > Well I'm going to shut up now in case I've already gone too far.  I hope
> > this helps.
> >
> > Best regards,
> >
> > Mike Ryder
> > RMD IT Client Services
> >
> > On Tue, Jan 26, 2016 at 4:00 PM, Lee, Gary <glee AT bsu DOT edu> wrote:
> >
> > > Keep us posted.  I have had similar problems in the past year or so.
> > > Only, I can't get any new hardware.
> > >
> > > Still using hp 585 servers with 4 amd processors.
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > > Behalf Of Zoltan Forray
> > > Sent: Tuesday, January 26, 2016 3:56 PM
> > > To: ADSM-L AT VM.MARIST DOT EDU
> > > Subject: [ADSM-L] When can too many disk volumes be detrimental
> > >
> > > RedHat Linux 6.7 with TSM 6.3.5.100
> > >
> > > Back in the "good old days" of ADSM/TSM, I was always taught that the more
> > > TSM disk volumes you had, the better since TSM would spread the I/O's
> > > across the volumes in a somewhat balanced manner, to improve performance.
> > > Yes I realize this was with multiple physical spindles.
> > >
> > > Now with bigger hard drives, I am wondering if having tooooo many volumes
> > > is hurting I/O performance. Here is the situation.
> > >
> > > We recently replaced 2-TSM servers that had rolled off warranty (4-year old
> > > Dell T710 systems) that had 8-600GB internal disk. The new servers are T720
> > > systems with *6TB* drives (both have 96GB RAM).  So I went from roughly
> > > *5TB* of internal disk storage for inbound backups to *30TB*. I went from
> > > multiple 300GB disk volumes to 30-1TB volumes. Plus add 20TB of SAN space
> > > gives me 40-disk volumes.
> > >
> > > The reason for my concern is the time it takes to move the data from disk
> > > to tape.  I am seeing it take 11 hours to empty (move data) a 100% full 1TB
> > > disk volume.  To me, this is very, very slow.
> > >
> > > We had a hard disk failure that for some reason (all RAID5) took out part
> > > of the OS partition and damaged the /tsmlog and /tsmarchlog filesystems,
> > > forcing me to restore from an 8-hour-old DB backup (even Dell said this
> > > should not have happened so they replaced the drive and PERC controller).
> > > It has taken more than *2 weeks* of non-stop audit, move data of
> > > non-damaged files, restore of damaged files - processes against the
> > > internal disk volumes (I recorded some audits running 32 hours).
> > >
> > > As I redefine/rebuild the disk volumes, I am starting to create 2 and 3TB
> > > volumes to see if that helps improve performance.
> > >
> > > So, your thoughts/ideas/suggestions on what might be going on here.
> > >
> > > --
> > > *Zoltan Forray*
> > > TSM Software & Hardware Administrator
> > > Xymon Monitor Administrator
> > > Virginia Commonwealth University
> > > UCC/Office of Technology Services
> > > www.ucc.vcu.edu
> > > zforray AT vcu DOT edu - 804-828-4807
> > > Don't be a phishing victim - VCU and other reputable organizations will
> > > never use email to request that you reply with your password, social
> > > security number or confidential personal information. For more details
> > > visit http://infosecurity.vcu.edu/phishing.html
> > >
> >
>
>
>
> --
> *Zoltan Forray*
> TSM Software & Hardware Administrator
> Xymon Monitor Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> www.ucc.vcu.edu
> zforray AT vcu DOT edu - 804-828-4807
>