ADSM-L

Re: [ADSM-L] TSM performance very poor, Recovery log is being pinned

2007-07-31 02:17:27
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 31 Jul 2007 01:14:42 -0500
I think you are right about the Log - it need not be spread across
multiple volumes. It's only got one writer.

Your RAID type can affect the performance of the Disk Storage Pools and
the Database dramatically. In particular, RAID5 is very poorly suited to
the Disk Storage Pools, because their workload is about 50% writes. RAID5
is also not ideal for the Database, though it can be tolerated for the
Log. RAID10 is much better.
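
To put rough numbers on the RAID5 versus RAID10 point, here is a minimal
sketch using the commonly quoted small-write penalties (4 back-end I/Os per
write for RAID5, 2 for RAID10). The disk count and per-disk IOPS are
hypothetical, and the model assumes small random writes; large full-stripe
writes on RAID5 cost less than this suggests.

```python
# Commonly quoted small-write penalties: back-end disk I/Os per front-end write.
# RAID5: read data, read parity, write data, write parity. RAID10: two mirror writes.
WRITE_PENALTY = {"RAID5": 4, "RAID10": 2}


def effective_iops(raw_iops, write_fraction, raid_level):
    """Front-end IOPS an array can sustain for a given read/write mix."""
    penalty = WRITE_PENALTY[raid_level]
    read_fraction = 1.0 - write_fraction
    # Each read costs one back-end I/O; each write costs `penalty` of them.
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    raw = 8 * 150  # hypothetical array: 8 disks at about 150 IOPS each
    for level in ("RAID5", "RAID10"):
        print(level, round(effective_iops(raw, 0.5, level)))
```

With a 50% write mix, the same set of disks delivers noticeably fewer
front-end I/Os under RAID5 than under RAID10 in this model.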

You should be using fast disks, not SATA, for the primary Disk Storage
Pools. I've got 10,000rpm IBM SSA disks for these.

I use RAID10 for the Disk Storage Pools. I use JBOD disks with TSM
mirroring for the Log and Database. This is slightly slower than OS
mirroring or RAID-array mirroring, but it is somewhat safer. Each
physical volume for Storage Pools and Database is broken into many
Logical Volumes.

You should be saving your fastest disks for the Database. I've got
15,000rpm disks for the Database. When I moved the Database from
10,000rpm disks to 15,000rpm disks, everything in TSM got noticeably
faster. For instance, DB backups now take 1/3 less time. RAID boxes just
get in the way for the Database; it really runs best on JBOD disks with
TSM doing the mirroring.

Here's a controversial paper written by a guy at Oracle. He says you
should "Stripe And Mirror Everything" (S.A.M.E.). I've read and reread
this several times, and while I definitely do not agree with everything
said, it does raise some very interesting points that definitely apply
to TSM. For one thing he strongly advocates RAID10, as do I.
http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf

Most of my Log pinning problems have been caused by clients. If a client
suffers a networking problem (typically a half-duplex vs. full-duplex
conflict) and if that client tries to back up a large file such as a
movie, that can pin the log on our system until it fills completely.
Minimum throughput controls in TSM can help here, though it can still
happen. I wrote a daemon that watches the Log fullness and if it gets to
about 70% it cancels the session that has the Log pinned. I still have
problems, because the cancel command can take hours to work if the
client is backing up a large file slowly. If the Log gets to 95%, the
daemon issues a TSM shutdown command, which is vastly easier to recover
from than a 100% full log. At least with a full TSM shutdown, our novice
sysadmin's first impulse, which is to try to restart it, is generally a
good thing to do. It usually restarts with an empty Log in these cases, so they can
claim, "I fixed it!" without knowing the underlying complexities.
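
A minimal sketch of the kind of watcher described above, not the actual
daemon. The 70% and 95% thresholds come from the text; everything else is an
assumption. The four callables passed to watch() are hypothetical hooks that
a site would have to implement itself, presumably by wrapping its own TSM
administrative commands for reading Log utilization, finding the pinning
session, cancelling a session, and halting the server.

```python
import time

CANCEL_AT = 70.0    # percent full: cancel the session pinning the Log
SHUTDOWN_AT = 95.0  # percent full: shut the server down before the Log fills


def decide(log_pct_util):
    """Map Log utilization (percent) to one of: 'ok', 'cancel', 'shutdown'."""
    if log_pct_util >= SHUTDOWN_AT:
        return "shutdown"
    if log_pct_util >= CANCEL_AT:
        return "cancel"
    return "ok"


def watch(get_log_pct, get_pinning_session, cancel_session, shutdown,
          poll_seconds=60, sleep=time.sleep):
    """Poll Log utilization until a shutdown has been issued.

    All four callables are hypothetical hooks supplied by the caller;
    this sketch does not talk to TSM itself.
    """
    while True:
        action = decide(get_log_pct())
        if action == "shutdown":
            shutdown()
            return
        if action == "cancel":
            session = get_pinning_session()
            if session is not None:
                # A cancel can take a long time to complete, so this may be
                # re-issued on later polls while the Log stays above 70%.
                cancel_session(session)
        sleep(poll_seconds)
```

Keeping the threshold logic in decide() separate from the polling loop means
the thresholds can be checked without a live server.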

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu
===== "Standards are great. That's why there are so many of them." =====




On Mon, 30 Jul 2007, Andrew Carlson wrote:

>always heard the DB should, because it opens multiple threads with multiple
>volumes, but since the log is sequentially written to for the most part, I
>can't figure out why that should be in multiple volumes.  Thanks.
>
>On 7/30/07, Charles A Hart <charles_hart AT uhc DOT com> wrote:
>>
>> Your DB and Log should be RAW as well, and in small vols (i.e., a 12GB
>> log should be in 2-3GB vols; DB vols, depending on the size of the DB,
>> should be 5-10GB). Also try to make sure the raw logical vols are evenly
>> spread across as many LUNs as possible.
>>
>> Charles Hart
>>
>>
>>
>>
>>
>> "Stapleton, Mark" <mark.stapleton AT BERBEE DOT COM>
>> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>> 07/29/2007 07:03 AM
>> Please respond to
>> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>>
>>
>> To
>> ADSM-L AT VM.MARIST DOT EDU
>> cc
>>
>> Subject
>> Re: [ADSM-L] TSM performance very poor, Recovery log is being pinned
>>
>>
>>
>>
>>
>>
>> From: ADSM: Dist Stor Manager on behalf of Craig Ross
>> >TSM is installed on Solaris 10
>>
>> This is something that popped right out for me. Do you have your storage
>> pools located on raw logical volumes or mounted filesystems? If the
>> latter, that might be your problem. Solaris has traditionally had
>> incredibly poor throughput performance on mounted filesystems.
>>
>> You might give thought to rebuilding those storage pools on raw logical
>> volumes. Of course, that will require that you completely flush all data
>> from your disk storage pools to tape storage pools first, so as not to
>> lose client data.
>>
>> --
>> Mark Stapleton (mark.stapleton AT berbee DOT com)
>> Berbee Information Networks (a CDW company)
>>
>>
>>
>>
>
>
>
>--
>Andy Carlson
>---------------------------------------------------------------------------
>Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month,
>The feeling of seeing the red box with the item you want in it:Priceless.
>
