ADSM-L

Re: [ADSM-L] When can too many disk volumes be detrimental

2016-01-28 11:39:32
Subject: Re: [ADSM-L] When can too many disk volumes be detrimental
From: Zoltan Forray <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 28 Jan 2016 11:35:50 -0500
Andy,

Thanks for the information.  It sounds like you are sort-of agreeing that
having too many TSM disk volumes is a factor in my slowness (i.e. 30-1TB
volumes is bad since I have them all in one array, so reducing to fewer,
larger TSM volumes per array could help).

I agree SSD would help but we can't afford it.  Even trying for 2-1TB
mirrored volumes for the DB (500GB drives would cut it too close since the
DB on this system is currently 315GB) is cost prohibitive.

On Thu, Jan 28, 2016 at 10:58 AM, Huebner, Andy <andy.huebner AT novartis DOT com> wrote:

> This is long, but I hope it helps without being too specific.
>
> From your description I think you have violated my disk usage rules.
>
> Tape drives are very fast; disk drives (SSD too) are very slow by
> comparison for sequential access, which is why we gang disks together
> (RAID) to keep up.  If you have a very large RAID group and are writing to
> more than 1 tape drive you cause random seeks, which slow down the
> throughput of the RAID group.
> When TSM writes to a disk pool it writes each session sequentially, but
> when there is more than 1 session it appears random to the RAID group.
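
A toy illustration of the point above, not a model of TSM or of any real array: each session writes its own blocks in order, but once the sessions are interleaved the array sees jumps between regions. The function names, the strict round-robin ordering, and the region size are all assumptions made for the sketch; real controllers and caches will coalesce some of this.

```python
def interleave_sessions(num_sessions, blocks_per_session, region_size):
    """Return block addresses in the order the array would see them if
    the sessions took strict round-robin turns (a simplifying assumption).
    Session s writes sequentially starting at address s * region_size."""
    order = []
    for step in range(blocks_per_session):
        for s in range(num_sessions):
            order.append(s * region_size + step)
    return order


def count_seeks(addresses):
    """Count transitions where the next block is not the one directly
    after the previous block, i.e. the head would have to move."""
    return sum(1 for prev, cur in zip(addresses, addresses[1:])
               if cur != prev + 1)


one_session = interleave_sessions(1, 8, 1000)
four_sessions = interleave_sessions(4, 8, 1000)

print(count_seeks(one_session))    # 0: a single stream stays sequential
print(count_seeks(four_sessions))  # 31: every transition is a jump
```

Every individual session is still perfectly sequential in its own region; the seeks come only from the mixing.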
>
> When I build disk volumes I try to limit the data written in such a way
> that the number of tape drives to be fed from each pool (RAID group for me)
> is limited.  Because tape drives can take data faster than a RAID group can
> send it with many seeks you can end up with the tape drive stopping and
> starting.  This really slows down the tape drive.
>
> More smaller disks (therefore RAID groups) to many tape drives will always
> win.  The speed of the disk is less important when there are many arms
> (AS/400 speak) allowing for more than 1 access to occur at the same time.
>
> SSD solves the random access problem, but you still need to configure SSD
> for GB/sec not IOPS.
>
> The last disk pool I built was on a slow disk array (it was retired and
> 'gifted' to backups).  To make it less slow I built many mirrored RAID
> groups (1+0).  Each became a file system and each contained a volume from
> each disk pool. The actual disks were 500GB FC disks.  Ganged together
> they were able to keep up with 3592 drives (110MB/sec + compression).
> One file system per RAID group helps reduce seeks and allows for TSM and
> AIX to understand the disk layout and cache better.
>
> I would avoid the purchase of large disks that are to be used for the
> transient daily backups in favor of smaller disks.  I know there is a cost
> part to the problem so you have to make the best possible choice which may
> not be the best choice for the server.
>
> Andy Huebner
> SME - Storage and Backups GDC - Fort Worth
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Zoltan Forray
> Sent: Tuesday, January 26, 2016 2:56 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: [ADSM-L] When can too many disk volumes be detrimental
>
> RedHat Linux 6.7 with TSM 6.3.5.100
>
> Back in the "good old days" of ADSM/TSM, I was always taught that the more
> TSM disk volumes you had, the better since TSM would spread the I/O's
> across the volumes in a somewhat balanced manner, to improve performance.
> Yes I realize this was with multiple physical spindles.
>
> Now with bigger hard drives, I am wondering if having tooooo many volumes
> is hurting I/O performance. Here is the situation.
>
> We recently replaced 2-TSM servers that had rolled off warranty (4-year
> old Dell T710 systems) that had 8-600GB internal disk. The new servers are
> T720 systems with *6TB* drives (both have 96GB RAM).  So I went from roughly
> *5TB* of internal disk storage for inbound backups to *30TB*. I went from
> multiple 300GB disk volumes to 30-1TB volumes. Adding 20TB of SAN space
> gives me 40 disk volumes in total.
>
> The reason for my concern is the time it takes to move the data from disk
> to tape.  I am seeing it take 11-hours to empty (move data) a 100% full 1TB
> disk volume.  To me, this is very, very slow.
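
To put a number on "very, very slow", here is a rough back-of-the-envelope calculation. It assumes decimal units (1TB = 10^12 bytes) and borrows the 110MB/sec native 3592 figure quoted elsewhere in this thread purely as a yardstick; the tape drives actually in use on this server are not stated, so treat the comparison as illustrative only.

```python
TB = 10**12  # decimal terabyte, an assumption


def drain_rate_mb_per_s(volume_bytes, hours):
    """Average throughput in MB/s needed to empty a volume in `hours`."""
    return volume_bytes / (hours * 3600) / 1e6


def hours_to_drain(volume_bytes, rate_mb_per_s):
    """Hours needed to empty a volume at a sustained rate in MB/s."""
    return volume_bytes / (rate_mb_per_s * 1e6) / 3600


observed = drain_rate_mb_per_s(1 * TB, 11)
tape_native = 110.0  # MB/s, yardstick taken from the thread, not measured here

print(f"observed: {observed:.1f} MB/s")
print(f"fraction of tape rate: {observed / tape_native:.0%}")
print(f"hours at tape rate: {hours_to_drain(1 * TB, tape_native):.1f}")
```

Under those assumptions the move runs at roughly 25 MB/s, under a quarter of what a single drive of that class could take, and the same volume would drain in about two and a half hours if the disk could keep the drive streaming.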
>
> We had a hard disk failure that for some reason (all RAID5) took out part
> of the OS partition and damaged the /tsmlog and /tsmarchlog filesystems,
> forcing me to restore from an 8-hour-old DB backup (even Dell said this
> should not have happened so they replaced the drive and PERC controller).
> It has taken more than *2-weeks* of non-stop audit, move data of
> non-damaged files, and restore of damaged files, all running against the
> internal disk volumes (I recorded some audits running 32-hours).
>
> As I redefine/rebuild the disk volumes, I am starting to create 2 and 3TB
> volumes to see if that helps improve performance.
>
> So, your thoughts/ideas/suggestions on what might be going on here.
>
> --
> *Zoltan Forray*
> TSM Software & Hardware Administrator
> Xymon Monitor Administrator
> Virginia Commonwealth University
> UCC/Office of Technology Services
> www.ucc.vcu.edu
> zforray AT vcu DOT edu - 804-828-4807
> Don't be a phishing victim - VCU and other reputable organizations will
> never use email to request that you reply with your password, social
> security number or confidential personal information. For more details
> visit http://infosecurity.vcu.edu/phishing.html
>



--
*Zoltan Forray*