Re: [ADSM-L] When can too many disk volumes be detrimental

2016-01-26 17:29:14
Subject: Re: [ADSM-L] When can too many disk volumes be detrimental
From: "Ryder, Michael S" <michael_s.ryder AT ROCHE DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 26 Jan 2016 17:27:04 -0500
Zoltan

If I read your message correctly:
 - 1TB over 11 hours is ~200Mbits/sec
 - Dell 6TB drives appear to be 7200rpm SAS drives
 --- It is likely your 600GB drives were 15000rpm
 - your TSM server uses a single RAID-5 array for the OS, application, logs
and archive logs?  Is the TSM database on the same array as well?
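
For anyone who wants to check the ~200Mbits/sec figure, here is a minimal sketch of the arithmetic. It assumes decimal units (1TB = 10^12 bytes); the function name is only for illustration.

```python
def throughput_mbit_per_s(terabytes: float, hours: float) -> float:
    """Average transfer rate in megabits per second.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Mbit = 10**6 bits.
    """
    bits = terabytes * 1e12 * 8
    seconds = hours * 3600
    return bits / seconds / 1e6


rate = throughput_mbit_per_s(1, 11)
print(f"{rate:.0f} Mbit/s")    # roughly 200 Mbit/s
print(f"{rate / 8:.1f} MB/s")  # the same rate expressed in megabytes per second
```

Expressed in bytes that is only about 25MB/sec, which is well below what a single modern drive can usually stream sequentially.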

If so, I have a feeling I have an easy answer for you: stop putting
everything on a single RAID-5 array.  RAID-5 is one of the slowest
arrangements there is for writes, and you have crippled yourself by putting
the logs, OS (and possibly your database) on the same array.  For the best
performance, divide your load across multiple array controllers and multiple
arrays.  Use mirrored drives for the database and log drives (SSD if
possible).  Pack as much memory as possible into disk cache.  Minimize
latency by keeping as much of the disk "local" to the server as you can.  If
you must use RAID-5 or something similar for "mass storage", for example to
hold your storage pools, then use as many spindles as you can afford.  More
spindles means more disk controllers working to process commands from the
array controllers.
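
To put rough numbers on the spindle and RAID-level advice, here is a back-of-the-envelope sketch.  It uses the commonly cited write penalties for small random writes (4 back-end I/Os per host write for RAID-5, 2 for a mirror).  The per-drive IOPS values and drive counts are hypothetical rule-of-thumb figures, not measurements from any server in this thread, and controller cache or full-stripe sequential writes will change the picture.

```python
# Commonly cited back-end I/Os needed per small random host write.
WRITE_PENALTY = {"raid1": 2, "raid5": 4}


def random_write_iops(level: str, spindles: int, iops_per_drive: float) -> float:
    """Estimate host-visible random-write IOPS for an array.

    This is a simple model: raw IOPS of all spindles divided by the
    write penalty of the RAID level. It ignores controller cache.
    """
    return spindles * iops_per_drive / WRITE_PENALTY[level]


# Hypothetical figures: ~80 IOPS for a 7200rpm drive, ~180 for a 15000rpm drive.
print(random_write_iops("raid5", 6, 80))    # 120.0
print(random_write_iops("raid5", 12, 80))   # 240.0 (doubling spindles doubles the estimate)
print(random_write_iops("raid1", 2, 180))   # 180.0 (a two-drive 15000rpm mirror)
```

Under these assumptions a small mirror of fast drives can out-write a six-drive RAID-5 set of slow ones, which is the reason for keeping the database and logs off the big array.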

Using this kind of setup I am able to process over 3Gbits/sec on a
4-year-old HP BL460c G6 blade loaded with 12 cores and 96GB RAM and an HP
Storage blade.  Just one storage pool has 500 volumes spread over 28TB.
Switching to SSD drives for log and database functions was almost a
religious experience.

Maybe it is good to ask you this question - how fast do you need to process
that 1TB of data?  How long should a database restore take?

Another question that I do not see people ask is this -- when a single 6TB
drive fails... how long will it take to rebuild it?  (answer, as you have
found... a LONG time!).  So the march towards larger and larger drives
comes with additional risk.
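
As a rough illustration of the rebuild question, the sketch below estimates the best-case time to rewrite an entire drive at a sustained rate.  The rebuild rates are assumptions chosen for illustration; real rebuilds depend on the controller, the RAID level, and how busy the array is while it rebuilds.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite a whole drive at a sustained rate.

    Assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes)
    and a constant rebuild rate, which is optimistic on a busy array.
    """
    seconds = capacity_tb * 1e12 / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


# Assumed sustained rebuild rates in MB/s (hypothetical values).
for rate in (100, 50, 25):
    print(f"6TB at {rate}MB/s: {rebuild_hours(6, rate):.1f} hours")

# For comparison, a 600GB drive at the same assumed 100MB/s.
print(f"600GB at 100MB/s: {rebuild_hours(0.6, 100):.1f} hours")
```

At the same rate the 6TB drive takes ten times as long as the 600GB drive, and the array runs without redundancy for that entire window.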

Well I'm going to shut up now in case I've already gone too far.  I hope
this helps.

Best regards,

Mike Ryder
RMD IT Client Services

On Tue, Jan 26, 2016 at 4:00 PM, Lee, Gary <glee AT bsu DOT edu> wrote:

> Keep us posted.  I have had similar problems in the past year or so.
> Only, I can't get any new hardware.
>
> Still using hp 585 servers with 4 amd processors.
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Zoltan Forray
> Sent: Tuesday, January 26, 2016 3:56 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] When can too many disk volumes be detrimental
>
> RedHat Linux 6.7 with TSM 6.3.5.100
>
> Back in the "good old days" of ADSM/TSM, I was always taught that the more
> TSM disk volumes you had, the better since TSM would spread the I/Os
> across the volumes in a somewhat balanced manner, to improve performance.
> Yes I realize this was with multiple physical spindles.
>
> Now with bigger hard drives, I am wondering if having tooooo many volumes
> is hurting I/O performance. Here is the situation.
>
> We recently replaced 2-TSM servers that had rolled off warranty (4-year old
> Dell T710 systems) that had 8-600GB internal disk. The new servers are T720
> systems with *6TB* drives (both have 96GB RAM).  So I went from roughly
> *5TB* of internal disk storage for inbound backups to *30TB*. I went from
> multiple 300GB disk volumes to 30-1TB volumes. Plus add 20TB of SAN space
> gives me 40-disk volumes.
>
> The reason for my concern is the time it takes to move the data from disk
> to tape.  I am seeing it take 11-hours to empty (move data) a 100% full 1TB
> disk volume.  To me, this is very, very slow.
>
> We had a hard disk failure that for some reason (all RAID5) took out part
> of the OS partition and damaged the /tsmlog and /tsmarchlog filesystems,
> forcing me to restore from an 8-hour-old DB backup (even Dell said this
> should not have happened so they replaced the drive and PERC controller).
> It has taken more than *2-weeks* of non-stop audit, move data of
> non-damaged files, restore of damaged files - processes against the
> internal disk volumes (I recorded some audits running 32 hours).
>
> As I redefine/rebuild the disk volumes, I am starting to create 2 and 3TB
> volumes to see if that helps improve performance.
>
> So, your thoughts/ideas/suggestions on what might be going on here.
>
> --
> *Zoltan Forray*
> TSM Software & Hardware Administrator
> Xymon Monitor Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> www.ucc.vcu.edu
> zforray AT vcu DOT edu - 804-828-4807
>