Subject: [ADSM-L] Buying disk: (was: TSM Dedup stgpool target)
From: "Prather, Wanda" <Wanda.Prather AT ICFI DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 9 Dec 2013 18:30:19 +0000
Hi Sergio,

It took me a while to get back to this, but it's important and I see site after 
site falling into a mudhole with disk purchases.

This is both an "it depends" and a "what not to do" answer.
And I hope we can get some other people to chime in with their methodologies.

What not to do:

The problem with calling disk vendors: if you tell them you need 400TB, they 
will assume you are shopping around and bid you the lowest priced thing they 
have at 400TB; or if they think you have money and you don't know anything, 
they will bid you the highest priced thing they think they can get away with.  
The first will not do the job, and since you asked about midrange I'm assuming 
you don't want to solve the problem by going to 400 TB of the fastest disk 
money can buy, which would work but be expensive.

What you seriously have to do is

1.  Define your size *and* your throughput requirements, and
2.  Put them in writing, and tell the vendor you will not accept delivery of 
the box until *after* it meets a throughput test.  And make sure you hold them 
to it and they agree they will take the box back if it fails to meet the 
throughput.

I've seen failure to do those 2 critical things create nightmares many times 
(for sites who call me in after the fact to solve their TSM performance 
problems).

Vendors *will* do this.  My commercial customers *do* demand it, and get it.  If 
you find a vendor who won't, or pretends it doesn't matter, don't talk to them, 
because you aren't talking to anybody who understands the technical issues.  
(And yes, there are many folks in disk sales in the Mid-Atlantic who don't know 
what they are doing.  I know some of their names.)  Demand a pre-sales 
conference with the engineers, not just the salesmen.

The mid-range and low-end disk market has many, many options now.  Performance 
depends on how much cache, the firmware, and how many spindles you have 
spinning, not just the type of RAID anymore.

I do not sell hardware or software.  But I have worked with vendors who will 
take your performance requirements and configure the box to meet the 
throughput you need, which is what you have to do.  (If you need a contact in 
MD, I can give you one.)

Now some "it depends," with an illustration:

In TSM 6.3.4, server-end dedup is incredibly I/O intensive.

I have one customer using a very powerful Windows box to do server-end dedup of 
3.5-4 TB of TSM/VE backups per day (tiny blocks, 3.8-4.0:1 dedup).  They 
have a DS35xx disk array (which is incredibly affordable).  As originally 
configured (by a crappy vendor) with 1 controller, that DS35xx array would do 
at most 40 MB/second.  We beefed up that array by adding a controller and disks, 
and upgrading to XIV-like DDP firmware (all-way striped, all disks spinning all 
the time).  Now that little inexpensive box will do 10,000 I/Os per second, 
400+ MB per second.  We got a 10-fold improvement in throughput, for relatively 
little $$.

So if you are talking server-end dedup, start with the amount of data you have 
incoming each day, mumblety terabytes.  Consider that you have to land it on 
disk.  Then you have to read it again for the BACKUP STGPOOL.  Then you have to 
read it again for the IDENTIFY DUPLICATES process.  Then you have to read it 
again for the reclaim dedup process, and write the resulting deduped blocks 
back out  (25% of the data for a 4:1 dedup).  And oh, by the way, if there is 
tape reclaim involved, read some of it again, and there will be lock conflicts 
that slow that process.  And if you replicate, do it again (post dedup would be 
25% at 4:1).  And if your server is a replication target, include that I/O as 
well.  And that doesn't include your TSM DB I/O.

So I have no documented rules of thumb, but it seems to me that for every 1 TB 
of data coming in, I'd assume at least 4 TB of I/O just for the data, not 
including replication or the TSM DB I/O.

So assume you have 2 TB coming in per day.
Multiply by 4: 2 TB * 4 * 1024 * 1024 = 8,388,608 megabytes.
Assume you want to get everything done in 16 hours: 16 * 60 * 60 = 57,600 seconds.
That means you need disk that will sustain > 145 megabytes per second (not 
including the I/O to the TSM DB).
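If it helps, the arithmetic above can be wrapped in a few lines of Python.  The 
4x I/O multiplier and the 16-hour window are the assumptions from my example, 
not gospel; plug in your own:

```python
# Sketch of the server-end dedup sizing arithmetic above.
# Assumptions (from the worked example): every TB of incoming data turns
# into ~4 TB of disk I/O (land + backup stgpool + identify + reclaim),
# and everything must finish in a 16-hour batch window.  TSM DB I/O and
# replication are NOT included.

def required_mb_per_sec(incoming_tb, io_multiplier=4, window_hours=16):
    """Sustained disk throughput (MB/s) needed to push all passes over
    the day's incoming data through the daily batch window."""
    total_mb = incoming_tb * io_multiplier * 1024 * 1024  # TB -> MB
    window_sec = window_hours * 60 * 60
    return total_mb / window_sec

# 2 TB/day incoming, 4 passes, 16-hour window:
print(round(required_mb_per_sec(2), 1))  # ~145.6 MB/s
```

Shrink the window or raise the multiplier (replication, tape reclaim) and the 
required rate climbs fast, which is exactly why the throughput test matters.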

That's very do-able with midrange disk, but as illustrated above, *it matters* 
how it's configured.

That's just an example for a server-end dedup case.
I would be interested in hearing from anybody else what methodology they use to 
figure this out, or any other ROT.

Other things you could do:

1) dedup on the client end.  I don't have any numbers on that.
2) TSM 7.1 is advertising a 10-fold improvement in dedup throughput.  At this 
point I have no idea what that means or what is required to achieve it.  
Anybody got numbers or information about how it works?

Another thing you have to consider:
When you ask vendors about throughput on a disk array, you have to ask "for how 
many concurrent processes".
1.      If you are trying to back up 1 big SAP DB, for example, with one 
session, what you care about is the throughput of a single process.
2.      If you are backing up many small clients at once, plus doing backup 
stgpool and dedup, what you (usually) care about is not the throughput for a 
single process, but the total throughput when many processes are running at 
once.

Most disk arrays get more throughput for case 2 than case 1.  Be sure you 
specify what case you are asking for, when you give the vendor your throughput 
requirements.  And again, don't assume that the disk salesman has any idea what 
you are talking about.  Talk to the engineers, and SPECIFY the case you will 
use for your throughput test.
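A rough sketch of how the two cases turn into different test specs (all numbers 
here are illustrative, not measured from any real array):

```python
# Hypothetical throughput-test specs for the two cases above.
# Numbers are made up for illustration only.

def single_stream_spec(db_tb, window_hours):
    """Case 1: one big DB backed up in one session.
    The spec is the MB/s that ONE stream must sustain."""
    return db_tb * 1024 * 1024 / (window_hours * 3600)

def aggregate_spec(daily_tb, window_hours, sessions):
    """Case 2: many concurrent sessions.
    The spec is total MB/s across all streams, plus the (much lower)
    share each individual stream must sustain."""
    total = daily_tb * 1024 * 1024 / (window_hours * 3600)
    return total, total / sessions

print(round(single_stream_spec(1, 8), 1))   # 1 TB DB, 8-hour window: ~36.4 MB/s on one stream
total, per_stream = aggregate_spec(2, 16, 50)
print(round(total, 1), round(per_stream, 1))
```

The point: the same ~36 MB/s aggregate number is a very different demand when 
one stream must deliver all of it versus fifty streams delivering a trickle 
each, and the array that passes one test can fail the other.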


My recommendations:

*       My personal favorite for mid-range disk is the V7000, with the XIV-like 
DDP firmware.  There is a low-end inexpensive version and a higher-end version.  
The cool thing is that you can improve performance as you grow by adding 
spindles.

*       Test.  If you don't have dedup now, set up a small pool and play with 
it, so you get a feel for the lifecycle.

Wanda








-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Sergio O. Fuentes
Sent: Wednesday, November 13, 2013 10:32 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] TSM Dedup stgpool target

In an earlier thread, I polled this group on whether people recommend going 
with an array-based dedup solution or doing a TSM dedup solution.  Well, the 
answers came back mixed, obviously with an 'It depends'-type clause.

So, moving on...  assuming that I'm using TSM dedup, what sort of target arrays 
are people putting behind their TSM servers?  Assume here, also, that you'll 
be having multiple TSM servers, another backup product (*cough* Veeam), and 
potentially having to do backup stgpools on the dedup stgpools.  I ask because 
I've been barking up the mid-tier storage array market as our potential disk-
based backup target, simply because of the combination of cost, performance, and 
scalability.  I'd prefer something that is dense, i.e. more capacity, less 
footprint, and can scale up to 400TB.  It seems like vendors get disappointed 
when you're asking for a 400TB array with just SATA disk simply for backup 
targets.  None of that fancy array intelligence like auto-tiering, large 
caches, replication, dedup, etc. is required.

Is there another storage market I should be looking at, i.e. really dumb RAID 
arrays, direct attached, NAS, etc.?

Any feedback is appreciated, even the 'it depends'-type.

Thanks!
Sergio
