Appalling FILE dev class performance

jabuzzard

I have recently upgraded our server and I am getting appalling performance compared to the old server, which is bizarre given that everything has been upgraded.

The system consists of an x86 server running RHEL 8.4 and 8.12.1. Attached to this server are 10 old Sun J4400 storage shelves recycled from a previous HPC system. I am using five LSI 9206-16e SAS cards to hook them up, with dm-multipath and software RAID to create 23 RAID6 arrays, each using one disk per shelf. The 24th slot in each shelf is reserved for a hot spare. The server has dual Xeon E5-2637 v3 CPUs with 128GB of RAM. The system disks are 600GB 15k RPM SAS in a RAID1, and the database is on a RAID1 of 1TB M.2 NVMe drives on a HighPoint SSD6202 card. More specifically, /opt is on the NVMe drives and this is where the TSM instance is. This is all way snazzier than the old system, which used the same J4400s but only single leg.

Each of the 23 RAID6 arrays has its own file system mounted under /backup/diskXX. I then created an appropriate device class and preallocated sequential files to fill up the RAID6 arrays, with the mount limit set at 23. Performance on the old system was good and I was expecting this to be even better.
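For a rough sense of scale, the layout described above (23 RAID6 arrays, each with one disk from each of the 10 shelves) can be turned into a usable-capacity estimate. This is only a sketch: it assumes every disk in an array is the same size, counts raw decimal TB, and ignores file system overhead. The function name `raid6_usable_tb` is made up for illustration.

```python
def raid6_usable_tb(disks_per_array, disk_tb):
    """Usable capacity of one RAID6 array: two disks' worth goes to parity."""
    if disks_per_array < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks_per_array - 2) * disk_tb


SHELVES = 10   # one disk per shelf, so 10 disks per array
ARRAYS = 23    # the 24th slot in each shelf is the hot spare

for disk_tb in (1, 4):  # original 1TB drives vs. the replacement 4TB drives
    per_array = raid6_usable_tb(SHELVES, disk_tb)
    print(f"{disk_tb}TB disks: {per_array}TB per array, "
          f"{ARRAYS * per_array}TB across all arrays")
```

Under those assumptions, going from 1TB to 4TB drives takes each array from 8TB to 32TB usable.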

We have been slowly replacing the original 1TB SATA drives in the J4400s with larger SAS drives as we have run short on space for the backup of the HPC. However, at the same time as upgrading the server we also came into a large supply of 4TB SAS drives for free (over 200 of them). I am now in the process of shuffling data around to free up the 1TB drives so I can install the 4TB drives in their place and create new RAID6 arrays.

This is where I have noticed that performance is bad. I am getting maybe 10MB/s when I issue a "move data" command, yet a cp from the OS of a file from one RAID6 array to another gets ~250MB/s. On the old system, which was slower in every respect and ran RHEL 7, the same operation would peg the arrays.
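To put a repeatable number on the OS-level baseline, something like the following could be used in place of an ad hoc cp. It is a minimal sketch, not a proper benchmark: `timed_copy` is a made-up helper, the block size is an arbitrary choice, and the source file may still be served from the page cache unless it is larger than RAM or the cache is dropped first.

```python
import os
import time


def timed_copy(src, dst, block_size=1024 * 1024):
    """Copy src to dst in block_size chunks.

    Returns (bytes_copied, throughput in MB/s, decimal megabytes).
    """
    copied = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        # Wait for the data to reach the disk so the timing is not
        # just measuring writes into the page cache.
        os.fsync(fout.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return copied, copied / elapsed / 1e6
```

Calling `timed_copy("/backup/disk01/testfile", "/backup/disk02/testfile")` (both paths hypothetical) with a few different block sizes would show whether the arrays themselves are sensitive to I/O size, which is one of the things that can differ between a plain cp and an application writing to preallocated volumes.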

If I issue lots of "move data" commands at once I can get the aggregate speed up, but on closer examination the slowdown is affecting backup performance too. It's like there is some magic setting somewhere that is limiting the speed at which TSM can write to disk.

The only place I can think of where I might have made a mistake, in hindsight, is that I only specified a single file for the DB2 database. I was probably thinking that with such a fast disk, and with everything on a single disk, there was no point in having multiple files. On the old system the database was on a RAID1 of SATA SSDs on a 3Gbps SAS controller. That said, watching the DB backups, the database disk goes full tilt at over 2000MB/s and then stops while the data gets squirted out from RAM onto the RAID6 arrays. The database looks to be about 74GB in size, with the DB backups at nearly half that size.

Does anyone have any ideas as to what the issue might be? If it is the DB I can always scrub everything, do a fresh install and restore from backup.
 

marclant

That's almost impossible to troubleshoot in a forum like this. The first step is identifying the bottleneck; once you know what it is, you can address it.

There are a lot of moving parts even for a task as simple as a move data. The database needs to be queried and updated in addition to the actual data movement.
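One way to start narrowing down the bottleneck is to watch which block devices are actually busy while a move data is running, for example whether the database device is saturated while the storage pool arrays sit idle. The sketch below is an illustration only: it assumes a Linux server, reads `/proc/diskstats` twice and reports per-device read and write rates. The helper names are made up, and tools such as iostat report the same counters with more detail.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # fields: major, minor, name, reads, reads merged, sectors read,
        # ms reading, writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, interval_s):
    """Per-device (read MB/s, write MB/s) between two snapshots."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / interval_s / 1e6,
            (w1 - w0) * SECTOR_BYTES / interval_s / 1e6,
        )
    return rates


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return the per-device rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return throughput_mb_s(before, after, interval_s)
```

Running `sample()` once during a move data and once during a plain OS-level copy between the same two arrays gives a like-for-like view of where the I/O goes in each case.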

The only thing I can suggest is to follow the best practices for the DB disks:

And the best practices for a file storage pool:
 
