"Prather, Wanda" wrote:
> 1. IT DEPENDS. (You KNEW I was going to say that, didn't you?!?)
Yes.
>
>
> 2. There is no "right" answer (so you will probably be getting LOTS of
> opinions back about this question ;>).
Understood.
> 3. If you only have a few clients to back up, it just doesn't matter.
That's me; however, I was thinking in terms of tuning my repository for the
quickest throughput.
> 4. The purpose of your disk pool is to act as a "buffer", so that you can
> have many more clients backing up concurrently than you have tape drives.
> As long as your disk pool is large enough so that your clients can back up
> without waiting for a tape, you have an "adequate" configuration.
Understood; however, that brings me back to my sizing question: more smaller
volumes, or fewer bigger ones? But you said it doesn't matter, which probably
answers my question.
> 5. The rule of thumb is that you want to have enough space in your disk
> pool to hold at least one day's backups. That way if there is a tape
> problem (more common than disk problems), you will have time to fix it when
> you arrive in the morning without any backups failing. Now you have a
> "better" configuration.
Understood.
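The one-day rule of thumb above is easy to turn into a back-of-the-envelope
calculation. A minimal sketch, with made-up numbers (the client count, nightly
change rate, and headroom factor are assumptions for illustration, not figures
from this thread):

```python
# Size the disk pool to hold at least one day's backups, per the
# rule of thumb above, plus some headroom for growth and the odd
# oversized incremental.

def pool_size_gb(clients, avg_daily_backup_gb, headroom=1.25):
    """Disk pool capacity (GB) for one night's backups plus headroom."""
    return clients * avg_daily_backup_gb * headroom

# e.g. 2 clients averaging 150 GB of changed data per night:
print(pool_size_gb(2, 150))  # 375.0
```

The headroom factor is the judgment call; the point is simply that the pool
should comfortably absorb a full night even when a tape problem stalls
migration until morning.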
> 6. After that, what you do with your disk pool is work on increasing
> throughput/performance, ie. the "best" combination of function and
> throughput. And it depends a lot on your client mix and a lot on your
> hardware.
Assume very few clients (1-2) with many many filesystems.
> IF you have "n" clients backing up concurrently, and you have at least "n"
> disk pool volumes, TSM will start "n" I/O's in parallel to the different
> disk pool volumes. So to take advantage of MANY volumes, you have to have
> MANY concurrent backups in progress. But, if you are writing to 1 spindle
> of disk with a zillion tiny diskpool volumes, you will get more thrashing of
> the disk than throughput.
This is important information I could not confirm. Thank you for this.
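The parallelism point above can be sketched conceptually: with n concurrent
sessions and at least n volumes, each session gets its own volume; with fewer
volumes than sessions, writers end up sharing volumes. This is only an
illustration of the queuing behavior, not TSM internals (the session count,
volume count, and round-robin assignment are assumptions for the sketch):

```python
# Conceptual sketch: 6 concurrent backup sessions writing into a pool
# of 3 volumes. Each volume takes one writer at a time, so sessions
# that land on the same volume serialize behind each other.
from concurrent.futures import ThreadPoolExecutor
import threading

volumes = [threading.Lock() for _ in range(3)]   # 3 disk pool volumes
written = [0] * len(volumes)                     # MB landed per volume

def backup_session(session_id, data_mb=100):
    vol = session_id % len(volumes)              # round-robin volume pick
    with volumes[vol]:                           # one writer per volume
        written[vol] += data_mb

with ThreadPoolExecutor(max_workers=6) as pool:  # 6 concurrent sessions
    for s in range(6):
        pool.submit(backup_session, s)

print(written)  # [200, 200, 200]
```

With 6 sessions and only 3 volumes, each volume serves two sessions back to
back; which is fine on separate spindles, but on one spindle the "zillion tiny
volumes" case just adds head movement, as noted above.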
> So, if writing to a raw (not RAID) spindle, you probably want 2-3 diskpool
> volumes per spindle.
No, it's just JBOD spindles now, maybe RAID in the future.
> On the other hand, lots of people use some type of RAID these days. The
> disk pool I/O is sequential; some people have reported good results with
> striping across multiple physical spindles. And if you are writing to a
> Shark with gobs of cache memory in front of it, that buffers the effect of
> the disk head movement and you may not be able to come up with many
> configurations where you can measure the difference between a "few" and
> "many" disk pool volumes.
Good information, thank you.
> 7. NONE of that is really going to affect your throughput to LTO.
>
> - no matter how many (or few) diskpool volumes you have, if they are full
> of lots of itty bitty files (actually aggregates of files), remember that
> TSM has to update the data base each time it moves one. It's hard to stream
> enough data to the tape during this process to get great throughput numbers.
Good information.
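The small-file penalty above amortizes out with a simple model: each object
moved to tape costs a database update, so tiny objects pay that cost over and
over. A rough sketch, where the drive speed and per-update time are assumed
placeholders, not measured TSM numbers:

```python
# Rough model of why many small objects throttle disk-to-tape
# migration: each object moved costs a fixed database update.

def effective_mb_per_sec(tape_mb_per_sec, avg_object_mb, db_update_sec):
    """Tape throughput once per-object DB-update time is amortized in."""
    transfer_sec = avg_object_mb / tape_mb_per_sec
    return avg_object_mb / (transfer_sec + db_update_sec)

# Assume a 30 MB/s drive and 20 ms per DB update:
print(round(effective_mb_per_sec(30, 1024, 0.02), 1))  # 30.0 (big objects)
print(round(effective_mb_per_sec(30, 1, 0.02), 1))     # 18.8 (small objects)
```

With big objects the update time vanishes in the transfer time; with 1 MB
objects the same drive loses roughly a third of its speed, which is the "hard
to stream enough data" effect described above.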
> - no matter how many (or few) diskpool volumes you have, if they contain a
> few BIG files, TSM has a better chance of pushing the data to LTO fast,
> because it doesn't have to make data base updates as it goes.
>
> - many people report better throughput on BIG files when writing direct
> from the client to tape, not the disk pool.
>
> This should give people LOTS of targets to respond to!!
Thanks for your detailed response. From my standpoint, I guess I should
allocate large volumes in a count that slightly exceeds the number of client
streams I anticipate. I am also running multiple streams from the same client
at the same time. But you answered my overall question about many-small versus
few-big, and I think I have enough information to cobble together a nice
architecture here. Thank you again.
Mitch
>
>
> Wanda Prather
> Johns Hopkins University Applied Physics Laboratory
> 443-778-8769
>