  1. #1
    Member
    Join Date
    Nov 2009
    Posts
    9

    Slow write performance in disk pool

    Hi,

    I have a recently installed TSM 5.5.2.1 server on Linux (RHEL5, 64-bit), with a 900GB disk pool with the following characteristics:

    • 18 files (50GB each)
    • ext3 filesystem
    • RAID5 array (10 disks, locally attached SCSI)


    I've run some I/O tests on this filesystem, and they show sequential reads averaging 100MB/sec and writes at about the same speed.

    However, TSM just doesn't go above 12MB/sec (and is frequently as low as 5MB/sec).

    I've even tried running an I/O test at the same time TSM is writing to the disk pool, just to see if the I/O was saturated, but no. I get the same 100MB/sec minus TSM's consumption.
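
    A minimal sketch of the kind of sequential write test described above, for anyone who wants to reproduce the comparison. The function name, the 16 MiB test size, and the temporary-file location are illustrative assumptions, not what was actually run in this thread. A test this small mostly measures the page cache, so use a file much larger than RAM for meaningful numbers.

```python
import os
import tempfile
import time


def measure_write_throughput(path, total_bytes, block_size=512 * 1024):
    """Write total_bytes to path in block_size chunks and return MB/sec.

    The file is fsync'ed before the clock stops, so data still sitting in
    the page cache is not counted as already written.
    """
    block = b"\0" * block_size
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.monotonic() - start
    return (written / 1e6) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical target: a file in the default temp directory.
    # Pass dir="/path/to/diskpool/fs" to test a specific filesystem.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        target = tmp.name
    try:
        rate = measure_write_throughput(target, 16 * 1024 * 1024)
        print(f"sequential write: {rate:.1f} MB/sec")
    finally:
        os.remove(target)
```

    Running it once on its own and once while TSM is writing to the disk pool gives the same comparison as above.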

    The strange thing is, I've tried changing some server options, such as increasing the bufferpool size, and recreating all disk pool volumes one by one, but nothing worked.

    What makes this really strange is that after some configuration changes there were times when I got 80-90MB/sec, but after a couple of hours it went back to the same slow throughput.

    To exclude client issues, I've tried backing up directly to tape. In this case the throughput is consistent, always around 60MB/sec.

    To sum things up:

    • Every I/O test shows higher rates (even when TSM is writing to disk at the same time);
    • Clients always get higher rates if backing up directly to tape.


    Any ideas would be appreciated, since I don't know what else to try.

  2. #2
    Senior Member
    Join Date
    Nov 2005
    Location
    LU Germany
    Posts
    1,066

    When we still thought running TSM on Linux was a good idea, we had all sorts of performance issues with filesystems until we switched to GPFS and finally got rid of Linux altogether. Try turning directio off in dsmserv.opt and if that doesn't help, well, good luck.

    PJ

  3. #3
    Moderator
    Join Date
    Nov 2005
    Location
    Victoria, Australia
    Posts
    537

    Quote: Originally Posted by CrLf
    I have a TSM 5.5.2.1 server on Linux (RHEL5 64bit) recently installed, with a 900GB diskpool with the following characteristics:

    • 18 files (50GB each)
    • ext3 filesystem
    • RAID5 array (10 disks, locally attached SCSI)

    [...]

    On the first point: in a RAID5 array of 10 disks (9+1), you should probably be running 9 files rather than 18, but that alone is not enough to slow things down to this rate.

    For your "direct to tape" backups, are you going LAN-free or over the LAN?

  4. #4
    Senior Member (javajockey)
    Join Date
    Dec 2007
    Location
    Yorktown
    Posts
    265

    What does your iostat output look like on Linux?
    Code:
    iostat -x -m 5
    Almost every time I see this problem, it's related to the disk subsystem.
    Post the output from this when you're seeing problems.
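
    If iostat is not installed, the per-device throughput it reports can be approximated from /proc/diskstats. The sketch below is an illustration under stated assumptions, not a replacement for iostat: the function names and the 2-second interval are made up here, and it relies on the kernel's documented layout, where the sixth and tenth fields of each line are sectors read and written in 512-byte units. Shorter partition lines, as printed by older 2.6 kernels, are skipped.

```python
import os
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units
MIB = 1024 * 1024


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # blank lines, or short partition lines on older kernels
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_per_sec(before, after, interval_sec):
    """Per-device (read MiB/sec, write MiB/sec) between two samples."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / MIB / interval_sec,
            (w1 - w0) * SECTOR_BYTES / MIB / interval_sec,
        )
    return rates


def read_diskstats(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    interval = 2  # seconds between the two samples (arbitrary choice)
    first = read_diskstats()
    time.sleep(interval)
    second = read_diskstats()
    rates = throughput_mb_per_sec(first, second, interval)
    for dev, (rd, wr) in sorted(rates.items()):
        print(f"{dev:<12} read {rd:8.2f} MiB/sec   write {wr:8.2f} MiB/sec")
```

    Unlike "iostat -x", this gives no queue size, await, or utilisation figures, so it only answers how much data is moving, not why it is slow.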



    How are your tape drives hooked up? Fibre? If so, are they on a separate HBA from your disk drives?

    Make sure that you have "read ahead" enabled on your RAID system. This should improve your disk-to-tape performance.

    Are your DB and log on separate drives from your storage pools?

    How many streams are you trying to write to tape? Your disk system may not be robust enough to handle multiple streams to LTO4 tape.

    Please post more detailed information about your setup.

  5. #5
    Member
    Join Date
    Nov 2009
    Posts
    9

    Ok, so answering all questions at once:

    The tape drives are LTO3 in an IBM TS3200 library, connected through SCSI. The disks are on a different controller from the tapes/library (not just a different channel on the same controller). The disk controller is a "cciss" with 320MB of cache, and the disks are actually two separate RAID5 arrays (4+1 each).

    Over those two RAID5 arrays there is LVM with three logical volumes: one for the DB (on the first array), one for the logs (on the second array), and one for the disk pool (spanning both arrays).

    Tape performance is as expected (my tests were LAN-to-tape, BTW). No problems there. Even disk pool migration to tape performs as expected (migrating to two tapes at once shows 100MB/sec reads from disk).

    DB buffer pool is 128072 pages, with a 99.5% hit ratio.

    I thought about DB/log issues, but the impact is too great for that to be the cause: TSM is writing to disk almost 10 times slower than the disks are capable of. For instance, this is the output of "iozone -s5g -r512k -t2 -i0 -w" (writing two 5GB files in parallel):

    Children see throughput for 2 initial writers = 121527.27 KB/sec
    Parent sees throughput for 2 initial writers = 102106.44 KB/sec
    Min throughput per process = 60555.71 KB/sec
    Max throughput per process = 60971.55 KB/sec
    Avg throughput per process = 60763.63 KB/sec
    Min xfer = 5214720.00 KB

    Children see throughput for 2 rewriters = 98565.30 KB/sec
    Parent sees throughput for 2 rewriters = 92376.96 KB/sec
    Min throughput per process = 48818.41 KB/sec
    Max throughput per process = 49746.90 KB/sec
    Avg throughput per process = 49282.65 KB/sec
    Min xfer = 5145088.00 KB
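
    The "almost 10 times slower" figure can be checked against the numbers above. The sketch below pulls the aggregate "Children see throughput" lines out of the iozone output and divides by the 12MB/sec that TSM reaches at best. The helper names are made up for illustration, and it assumes iozone's KB means 1024 bytes.

```python
import re

# Matches the aggregate line iozone prints for each test in throughput mode.
SUMMARY_RE = re.compile(
    r"Children see throughput for\s+(\d+)\s+(.+?)\s*=\s*([\d.]+)\s*KB/sec"
)


def parse_iozone_summary(text):
    """Return {test name: aggregate KB/sec} from iozone throughput output."""
    return {m.group(2): float(m.group(3)) for m in SUMMARY_RE.finditer(text)}


def slowdown_factor(benchmark_kb_per_sec, observed_mb_per_sec):
    """How many times slower the observed rate is than the benchmark rate."""
    return (benchmark_kb_per_sec / 1024) / observed_mb_per_sec


if __name__ == "__main__":
    # The two aggregate lines from the iozone run quoted above.
    output = """
    Children see throughput for 2 initial writers = 121527.27 KB/sec
    Children see throughput for 2 rewriters = 98565.30 KB/sec
    """
    tsm_best_mb_per_sec = 12  # best disk pool rate reported in the first post
    for test, kb in parse_iozone_summary(output).items():
        factor = slowdown_factor(kb, tsm_best_mb_per_sec)
        print(f"{test}: {kb / 1024:.1f} MB/sec, {factor:.1f}x the TSM rate")
```

    For the initial writers this works out to roughly 9.9 times TSM's best rate, which matches the estimate above.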

    A few minutes ago I stopped the server to clean up "dsmserv.opt". It was full of duplicate entries; it seems the server appends a line whenever we do a "setopt", keeping the old values around. I changed nothing else, but when I restarted the server it was fast again (100MB/sec).
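
    A cleanup like this can be scripted. The sketch below is one possible approach and not the procedure actually used here. It assumes one option per line, with the option name as the first word, names compared case-insensitively, lines starting with "*" treated as comments, and the most recently appended value being the one to keep. It also assumes VOLUMEHISTORY and DEVCONFIG may legitimately appear more than once, so those are left alone. Abbreviated option names are not recognised. The sample contents are made up. Review the result by hand before replacing the real file.

```python
def dedupe_options(lines, repeatable=("VOLUMEHISTORY", "DEVCONFIG")):
    """Keep only the last occurrence of each option, preserving line order.

    Assumptions (not confirmed by the thread): one option per line, the
    option name is the first whitespace-separated word, names are
    case-insensitive, and lines starting with '*' are comments.
    """
    def option_name(line):
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            return None  # blank line or comment
        return stripped.split()[0].upper()

    # First pass: remember where each non-repeatable option appears last.
    last_index = {}
    for i, line in enumerate(lines):
        name = option_name(line)
        if name is not None and name not in repeatable:
            last_index[name] = i

    # Second pass: drop every earlier, superseded occurrence.
    kept = []
    for i, line in enumerate(lines):
        name = option_name(line)
        if name is not None and name not in repeatable and last_index[name] != i:
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up sample contents, for illustration only.
    sample = [
        "* server options",
        "COMMMETHOD TCPIP",
        "BUFPOOLSIZE 262144",
        "BUFPOOLSIZE 512288",
        "DIRECTIO NO",
    ]
    print("\n".join(dedupe_options(sample)))
```

    Keeping the last occurrence is a guess based on the observation that "setopt" appends new lines; if the server actually honours the first occurrence, the rule would need to be reversed.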

    I stopped the server again and added "DIRECTIO NO". It kept the 100MB/sec write rate. I then rebooted the server (which has always brought back the miserable performance), and the rate seems to be holding. Let's wait a while before declaring victory, though, because this pattern of being fast for a few days and then slowing down again has been observed before.

  6. #6
    Member
    Join Date
    Jul 2008
    Posts
    286

    Hi CrLf,

    IBM's support site has an entry on a performance problem with TSM server 5.5.2.1: http://www-01.ibm.com/support/docvie...id=swg1IC61839
    The answer is: use the 5.5.2.2, 5.5.2.3, and 5.5.2.4 interim fix packages on platforms on which those interim fixes were shipped.
    Last edited by Samuel; 11-18-2009 at 06:04 AM.

  7. #7
    Member
    Join Date
    Nov 2009
    Posts
    9

    Quote: Originally Posted by Samuel
    The answer is: use the 5.5.2.2, 5.5.2.3, and 5.5.2.4 interim fix packages on platforms on which those interim fixes were shipped.

    On the client side I'm already using the latest versions within the 5.5.x.x branch (5.5.2.6 for BA, 5.5.3.0 for TDPExchange). On the server side, 5.5.2.1 is the latest of the 5.5.2.x branch (there is a 5.5.3.0 server, but I will upgrade only if necessary).

  8. #8
    Senior Member (javajockey)
    Join Date
    Dec 2007
    Location
    Yorktown
    Posts
    265

    OK, just so I have this straight:

    disk (backuppool) to tape OK
    client to tape OK
    client to disk (backuppool) SLOW


    Does this sound about right?

  9. #9
    Member
    Join Date
    Nov 2009
    Posts
    9

    Quote: Originally Posted by javajockey
    disk (backuppool) to tape OK
    client to tape OK
    client to disk (backuppool) SLOW

    Does this sound about right?

    Yes. Right now, though, everything is OK. "DIRECTIO NO" appears to be working, but I'm waiting before jumping to conclusions.
