ADSM-L

Re: [ADSM-L] When can too many disk volumes be detrimental

2016-01-26 20:01:15
Subject: Re: [ADSM-L] When can too many disk volumes be detrimental
From: Zoltan Forray <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 26 Jan 2016 19:59:14 -0500
Mike,

You brought up some valid points and good questions.  I do have to clarify
something I left out.

This machine also has two 1TB drives in the back of the system.  They are
mirrored and used for the TSM DB and OS (which does very little, since TSM
is the ONLY application on this server).  The big-honkin-disks are used for
everything else (/tsmlog, /tsmarchlog, general TSM storage).

Yes, the 6TB drives are 7200rpm SATA (we got the most internal storage we
could for the $$$$$ we had to spend; IIRC, Dell charged over $1K for the
6TB drives).

We can't slice-and-dice the RAID array any finer since we would lose 6TB
at a time.  We have discussed going for RAID10 when the box is rebuilt (the
OS folks feel that since there was known damage to OS files, there might be
unknown/hidden damage).
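
To put numbers on the RAID5 versus RAID10 trade-off, here is a rough sketch.
It assumes six 6TB drives in the array; the drive count is not stated in this
thread and is only inferred from the roughly 30TB of internal storage
mentioned in the original post. The function names are made up for
illustration.

```python
def raid5_usable_tb(drives, size_tb):
    """Usable capacity of a RAID5 set: one drive's worth goes to parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drives - 1) * size_tb


def raid10_usable_tb(drives, size_tb):
    """Usable capacity of a RAID10 set: half the drives hold mirror copies."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID10 needs an even number of drives, at least 4")
    return (drives // 2) * size_tb


DRIVES = 6    # assumption: not stated in the thread
SIZE_TB = 6   # from the thread: 6TB drives

print(f"RAID5 : {raid5_usable_tb(DRIVES, SIZE_TB)} TB usable")
print(f"RAID10: {raid10_usable_tb(DRIVES, SIZE_TB)} TB usable")
```

Under that assumption RAID10 gives up 12TB of usable space compared with
RAID5, in steps of one 6TB drive, which is the "lose 6TB at a time" problem.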

On Tue, Jan 26, 2016 at 5:27 PM, Ryder, Michael S
<michael_s.ryder AT roche DOT com> wrote:

> Zoltan
>
> If I read your message correctly:
>  - 1TB over 11 hours is ~200Mbits/sec
>  - Dell 6TB drives appear to be 7200rpm SAS drives
>  --- It is likely your 600GB drives were 15000rpm
>  - your TSM server uses a single RAID-5 array for the OS, application, logs
> and archive logs?  Is the TSM database on the same array as well?
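
For anyone checking the arithmetic behind the ~200Mbits/sec figure, a minimal
sketch follows. It assumes decimal units (1TB = 10^12 bytes) and also shows
how long the same 1TB would take at the 3Gbits/sec rate quoted further down.
The helper names are invented for this example.

```python
def rate_from_transfer(bytes_moved, hours):
    """Return (Mbit/s, MB/s) for a transfer, using decimal units."""
    bytes_per_sec = bytes_moved / (hours * 3600)
    return bytes_per_sec * 8 / 1e6, bytes_per_sec / 1e6


def minutes_to_move(bytes_moved, mbit_per_sec):
    """Return the minutes needed to move bytes_moved at the given rate."""
    return bytes_moved * 8 / (mbit_per_sec * 1e6) / 60


TB = 1e12  # assumption: decimal terabyte

mbit, mbyte = rate_from_transfer(TB, 11)
print(f"1TB in 11 hours: {mbit:.0f} Mbit/s ({mbyte:.1f} MB/s)")
print(f"1TB at 3 Gbit/s: {minutes_to_move(TB, 3000):.0f} minutes")
```

So the disk-to-tape move is running at roughly 25 MB/s, and the same volume
would drain in well under an hour at 3Gbits/sec.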
>
> If so, I have a feeling I have an easy answer for you: stop putting
> everything on a single RAID-5 array.  RAID-5 is one of the slowest
> arrangements there is, and you have crippled yourself by putting the logs,
> OS (and possibly your database) on the same array.  For ultimate
> performance, divide your load onto multiple array controllers and multiple
> arrays.  Use mirrored drives for the database and log drives (SSD if
> possible).  Pack in as much memory as possible into disk cache.  Minimize
> latency by keeping as much of the disk "local" to the server.  If you must
> use RAID-5 or something for "mass storage" for example to hold your
> storage-pools, then use as many spindles as you can afford.  More spindles
> means more disk-controllers working to process commands from
> array-controllers.
>
> Using this kind of setup I am able to process over 3Gbits/sec on a 4-year
> old HP bl460c g6 blade loaded with 12 cores and 96GB RAM and an HP Storage
> blade.  Just one storage pool has 500 volumes spread over 28TB.  Switching
> to SSD drives for log and database functions was almost a religious
> experience.
>
> Maybe it is good to ask you this question - how fast do you need to process
> that 1TB of data?  How long should a database restore take?
>
> Another question that I do not see people ask is this -- when a single 6TB
> drive fails... how long will it take to rebuild it?  (answer, as you have
> found... a LONG time!).  So the march towards larger and larger drives
> comes with additional risk.
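
As a rough lower bound on rebuild time, a rebuild has to write the whole
replacement drive, so it cannot finish faster than capacity divided by
sustained write speed. The sketch below uses 100 and 150 MB/s purely as
assumed rates for illustration; they are not measurements from this server,
and a rebuild on an array that is busy serving TSM I/O will take longer.

```python
def rebuild_hours(drive_bytes, write_mb_per_sec):
    """Best-case hours to write a full replacement drive at a steady rate."""
    return drive_bytes / (write_mb_per_sec * 1e6) / 3600


ASSUMED_RATES_MB_S = (100, 150)  # assumption: illustrative rates only

for label, size in (("600GB", 600e9), ("6TB", 6e12)):
    for rate in ASSUMED_RATES_MB_S:
        hours = rebuild_hours(size, rate)
        print(f"{label} drive at {rate} MB/s: at least {hours:.1f} hours")
```

Whatever the real rate is, the 6TB drive takes ten times as long as the 600GB
drive at the same speed, and the array runs without redundancy for that whole
window.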
>
> Well I'm going to shut up now in case I've already gone too far.  I hope
> this helps.
>
> Best regards,
>
> Mike Ryder
> RMD IT Client Services
>
> On Tue, Jan 26, 2016 at 4:00 PM, Lee, Gary <glee AT bsu DOT edu> wrote:
>
> > Keep us posted.  I have had similar problems in the past year or so.
> > Only, I can't get any new hardware.
> >
> > Still using hp 585 servers with 4 amd processors.
> >
> >
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> > On Behalf Of Zoltan Forray
> > Sent: Tuesday, January 26, 2016 3:56 PM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: [ADSM-L] When can too many disk volumes be detrimental
> >
> > RedHat Linux 6.7 with TSM 6.3.5.100
> >
> > Back in the "good old days" of ADSM/TSM, I was always taught that the
> > more TSM disk volumes you had, the better since TSM would spread the
> > I/O's across the volumes in a somewhat balanced manner, to improve
> > performance. Yes I realize this was with multiple physical spindles.
> >
> > Now with bigger hard drives, I am wondering if having tooooo many volumes
> > is hurting I/O performance. Here is the situation.
> >
> > We recently replaced two TSM servers that had rolled off warranty
> > (4-year-old Dell T710 systems) that had eight 600GB internal disks. The
> > new servers are T720 systems with *6TB* drives (both have 96GB RAM).  So
> > I went from roughly *5TB* of internal disk storage for inbound backups to
> > *30TB*. I went from multiple 300GB disk volumes to thirty 1TB volumes.
> > Adding 20TB of SAN space gives me 40 disk volumes.
> >
> > The reason for my concern is the time it takes to move the data from disk
> > to tape.  I am seeing it take 11 hours to empty (move data) a 100% full
> > 1TB disk volume.  To me, this is very, very slow.
> >
> > We had a hard disk failure that for some reason (all RAID5) took out part
> > of the OS partition and damaged the /tsmlog and /tsmarchlog filesystems,
> > forcing me to restore from an 8-hour-old DB backup (even Dell said this
> > should not have happened, so they replaced the drive and PERC controller).
> > It has taken more than *2 weeks* of non-stop audits, move data of
> > non-damaged files, and restores of damaged files, all running against the
> > internal disk volumes (I recorded some audits running 32 hours).
> >
> > As I redefine/rebuild the disk volumes, I am starting to create 2 and 3TB
> > volumes to see if that helps improve performance.
> >
> > So, your thoughts/ideas/suggestions on what might be going on here.
> >
> > --
> > *Zoltan Forray*
> > TSM Software & Hardware Administrator
> > Xymon Monitor Administrator
> > Virginia Commonwealth University
> > UCC/Office of Technology Services
> > www.ucc.vcu.edu
> > zforray AT vcu DOT edu - 804-828-4807
> > Don't be a phishing victim - VCU and other reputable organizations will
> > never use email to request that you reply with your password, social
> > security number or confidential personal information. For more details
> > visit http://infosecurity.vcu.edu/phishing.html
> >
>
