Re: [ADSM-L] When can too many disk volumes be detrimental

2016-01-28 11:02:26
Subject: Re: [ADSM-L] When can too many disk volumes be detrimental
From: "Huebner, Andy" <andy.huebner AT NOVARTIS DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 28 Jan 2016 15:58:16 +0000
This is long, but I hope it helps without being too specific.

From your description I think you have violated my disk usage rules.

Tape drives are very fast; disk drives (SSD too) are very slow at sequential
access by comparison, which is why we gang disks together (RAID) to keep up.
If you have a very large RAID group and are feeding more than one tape drive
from it, you cause random seeks, which slow down the throughput of the RAID
group.
When TSM writes to a disk pool, it writes each session sequentially, but when
there is more than one session, the I/O appears random to the RAID group.
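To make the "appears random" point concrete, here is a toy model. It is only an
illustration under my own assumptions, not how TSM actually allocates space:
each session writes a sequential run of blocks in its own region of the RAID
group (the region spacing is a made-up number), and the requests reach the
disks in strict round-robin order, which is the worst case.

```python
from itertools import chain, zip_longest

BLOCKS_PER_SESSION = 1000   # hypothetical length of each session's write
REGION_SPACING = 1_000_000  # hypothetical distance between sessions' regions

def interleave(streams):
    """Round-robin the block requests of several sequential streams (worst case)."""
    merged = chain.from_iterable(zip_longest(*streams))
    return [block for block in merged if block is not None]

def sequential_fraction(requests):
    """Share of requests that land on the block right after the previous one."""
    if len(requests) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(requests, requests[1:]) if cur == prev + 1)
    return hits / (len(requests) - 1)

for sessions in (1, 2, 4, 8):
    # Each session is perfectly sequential within its own region.
    streams = [range(s * REGION_SPACING, s * REGION_SPACING + BLOCKS_PER_SESSION)
               for s in range(sessions)]
    share = sequential_fraction(interleave(streams))
    print(f"{sessions} session(s): {share:.0%} of requests are sequential")
```

With one session every request follows the previous block; with two or more
interleaved this strictly, none do, so every request costs the RAID group a seek.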

When I build disk volumes, I try to limit the data written in such a way that
the number of tape drives fed from each pool (RAID group for me) is limited.
Because tape drives can take data faster than a RAID group doing many seeks can
send it, you can end up with the tape drive stopping and starting.
This really slows down the tape drive.
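A rough back-of-the-envelope model of the stop/start problem. Every number
except the 110MB/sec 3592 rate mentioned further down is hypothetical, and the
model is deliberately crude: it treats the RAID group as one arm that streams at
a fixed rate and pays a fixed positioning delay on some fraction of its reads.
It ignores compression, which only raises the rate the tape drive wants.

```python
TAPE_MB_PER_S = 110.0      # 3592 native rate quoted below, before compression
RAID_SEQ_MB_PER_S = 600.0  # hypothetical streaming rate of one RAID group
SEEK_MS = 8.0              # hypothetical average positioning delay
IO_MB = 1.0                # hypothetical size of each read during migration

def effective_mb_per_s(io_mb, seq_mb_per_s, seek_ms, seek_fraction):
    """Average throughput when seek_fraction of the reads pay one positioning delay."""
    seconds_per_io = io_mb / seq_mb_per_s + seek_fraction * seek_ms / 1000.0
    return io_mb / seconds_per_io

for seek_fraction in (0.0, 0.5, 1.0):
    rate = effective_mb_per_s(IO_MB, RAID_SEQ_MB_PER_S, SEEK_MS, seek_fraction)
    verdict = "drive streams" if rate >= TAPE_MB_PER_S else "drive stops and starts"
    print(f"seek on {seek_fraction:.0%} of reads: {rate:6.1f} MB/sec, {verdict}")
```

With these made-up figures the group keeps up easily until nearly every read
seeks, at which point it falls to roughly 103MB/sec and the drive can no longer
stream.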

More, smaller disks (and therefore more RAID groups) feeding many tape drives
will always win.  The speed of an individual disk is less important when there
are many arms (AS/400 speak), allowing more than one access to occur at the
same time.
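A quick sketch of why many arms win, assuming (hypothetically) about 150 random
IOPS per spindle regardless of its size and ignoring RAID overhead: for the same
30TB figure from the original question, the number of spindles, and with it the
aggregate random IOPS, drops as the disks get bigger.

```python
import math

CAPACITY_GB = 30_000   # the 30TB from the original question
IOPS_PER_DISK = 150    # hypothetical random IOPS of one spindle, any size

def spindles_for(capacity_gb, disk_gb):
    """Disks needed to reach the capacity (RAID overhead ignored)."""
    return math.ceil(capacity_gb / disk_gb)

for disk_gb in (600, 2_000, 6_000):
    n = spindles_for(CAPACITY_GB, disk_gb)
    print(f"{disk_gb}GB disks: {n} arms, about {n * IOPS_PER_DISK} random IOPS")
```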

SSD solves the random access problem, but you still need to configure SSD for
GB/sec, not IOPS.
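Sizing for GB/sec rather than IOPS can be sketched as a simple bandwidth budget:
multiply the number of tape drives by the rate each needs in order to stream
(including compression), then divide by the sequential rate of one SSD. All the
inputs below are hypothetical, and controller and HBA limits also apply but are
not modeled.

```python
import math

def ssds_needed(tape_drives, tape_mb_per_s, compression_ratio, ssd_seq_mb_per_s):
    """Size the SSD set by sequential bandwidth (MB/sec), not by IOPS."""
    required_mb_per_s = tape_drives * tape_mb_per_s * compression_ratio
    return math.ceil(required_mb_per_s / ssd_seq_mb_per_s), required_mb_per_s

# Hypothetical inputs: 4 drives at 110MB/sec native, 2:1 compression,
# and SSDs that each sustain 400MB/sec of sequential reads.
count, needed = ssds_needed(4, 110.0, 2.0, 400.0)
print(f"need {needed:.0f} MB/sec -> at least {count} SSDs")
```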

The last disk pool I built was on a slow disk array (it was retired and
'gifted' to backups).  To make it less slow, I built many mirrored RAID groups
(RAID 1+0).  Each became a file system, and each file system contained a volume
from each disk pool.  The actual disks were 500GB FC disks.  Ganged together,
they were able to keep up with 3592 drives (110MB/sec + compression).
One file system per RAID group helps reduce seeks and allows TSM and AIX to
understand the disk layout and cache better.
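The layout described above (one file system per RAID group, each holding one
volume from every disk pool) could be scripted roughly like this. The pool
names, mount points and volume size are made up, and the generated text assumes
the DEFINE VOLUME command with its FORMATSIZE option (in MB) for random-access
disk volumes; check the syntax against the Administrator's Reference for your
server level before running anything.

```python
# Hypothetical names: adjust pool names, mount points and sizes to your server.
POOLS = ["DISKPOOL_A", "DISKPOOL_B"]
FILESYSTEMS = [f"/tsmdisk/rg{n:02d}" for n in range(1, 5)]  # one per RAID group
VOLUME_MB = 200 * 1024  # hypothetical 200GB per volume

def define_volume_commands(pools, filesystems, volume_mb):
    """One volume per pool on every file system, so each pool spans every RAID group."""
    cmds = []
    for fs in filesystems:
        for pool in pools:
            vol = f"{fs}/{pool.lower()}.dsm"
            cmds.append(f"define volume {pool} {vol} formatsize={volume_mb}")
    return cmds

for cmd in define_volume_commands(POOLS, FILESYSTEMS, VOLUME_MB):
    print(cmd)
```

Spreading every pool across every file system means a migration from any one
pool draws on all the RAID groups rather than hammering one.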

I would avoid buying large disks for the transient daily backups, in favor of
smaller disks.  I know cost is part of the problem, so you have to make the
best possible choice, which may not be the best choice for the server.

Andy Huebner
SME - Storage and Backups GDC - Fort Worth

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Zoltan Forray
Sent: Tuesday, January 26, 2016 2:56 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] When can too many disk volumes be detrimental

RedHat Linux 6.7 with TSM 6.3.5.100

Back in the "good old days" of ADSM/TSM, I was always taught that the more TSM
disk volumes you had, the better, since TSM would spread the I/Os across the
volumes in a somewhat balanced manner to improve performance.
Yes, I realize this was with multiple physical spindles.

Now with bigger hard drives, I am wondering if having tooooo many volumes is 
hurting I/O performance. Here is the situation.

We recently replaced two TSM servers that had rolled off warranty (4-year-old
Dell T710 systems) that had eight 600GB internal disks. The new servers are
T720 systems with *6TB* drives (both have 96GB RAM).  So I went from roughly
*5TB* of internal disk storage for inbound backups to *30TB*. I went from
multiple 300GB disk volumes to thirty 1TB volumes. Adding 20TB of SAN space
gives me 40 disk volumes.

The reason for my concern is the time it takes to move the data from disk to
tape.  I am seeing it take 11 hours to empty (move data) a 100% full 1TB disk
volume.  To me, this is very, very slow.
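For scale, the arithmetic behind "very, very slow", assuming a decimal terabyte
(10^12 bytes): 1TB in 11 hours is about 25MB/sec, against the 110MB/sec 3592
rate Andy quotes above. Using binary units would only nudge the figure to about
28MB/sec.

```python
TB = 10**12          # decimal terabyte (assumption)
MOVE_HOURS = 11      # observed time to empty one full 1TB volume
TAPE_MB_PER_S = 110  # 3592 native rate from the reply above

def mb_per_s(bytes_moved, seconds):
    """Average throughput in decimal MB/sec."""
    return bytes_moved / seconds / 10**6

observed = mb_per_s(TB, MOVE_HOURS * 3600)
hours_at_tape_rate = TB / (TAPE_MB_PER_S * 10**6) / 3600
print(f"observed: {observed:.1f} MB/sec")                          # observed: 25.3 MB/sec
print(f"at {TAPE_MB_PER_S} MB/sec: {hours_at_tape_rate:.1f} hours")  # at 110 MB/sec: 2.5 hours
```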

We had a hard disk failure that, for some reason (all RAID5), took out part of
the OS partition and damaged the /tsmlog and /tsmarchlog filesystems, forcing
me to restore from an 8-hour-old DB backup (even Dell said this should not have
happened, so they replaced the drive and PERC controller).
It has taken more than *2 weeks* of non-stop processes against the internal
disk volumes: audits, move data of undamaged files, and restores of damaged
files (I recorded some audits running 32 hours).

As I redefine/rebuild the disk volumes, I am starting to create 2TB and 3TB
volumes to see if that improves performance.

So, what are your thoughts/ideas/suggestions on what might be going on here?

--
*Zoltan Forray*
TSM Software & Hardware Administrator
Xymon Monitor Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zforray AT vcu DOT edu - 804-828-4807