Appalling FILE dev class performance

jabuzzard

ADSM.ORG Member
I have recently upgraded our server and I am getting appalling performance compared to the old server, which, given that everything has been upgraded, is bizarre.

The system consists of an x86 server running RHEL 8.4 and TSM 8.12.1. Attached to this server are ten old Sun J4400 storage shelves recycled from a previous HPC system. I am using five LSI 9206-16e SAS cards to hook them up, and using dm-multipath and software RAID to create 23 RAID6 arrays, each with one disk per shelf. The 24th slot in each shelf is reserved for a hot spare. The server has dual Xeon E5-2637 v3 CPUs with 128GB of RAM. The system disks are 600GB 15k RPM SAS drives in a RAID1, and the database is on a RAID1 of 1TB M.2 NVMe drives on a HighPoint SSD6202 card; more specifically, /opt is on the NVMe drives, and that is where the TSM instance lives. This is all way snazzier than the old system, which used the same J4400s but with only a single leg.
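
For reference, each array was assembled roughly along these lines; the device names are placeholders for the real multipath names, and XFS is just an example filesystem:

    # One 10-disk RAID6, one leg per shelf, built over dm-multipath devices
    mdadm --create /dev/md/backup01 --level=6 --raid-devices=10 \
        /dev/mapper/mpatha /dev/mapper/mpathb /dev/mapper/mpathc \
        /dev/mapper/mpathd /dev/mapper/mpathe /dev/mapper/mpathf \
        /dev/mapper/mpathg /dev/mapper/mpathh /dev/mapper/mpathi \
        /dev/mapper/mpathj
    mkfs.xfs /dev/md/backup01
    mount /dev/md/backup01 /backup/disk01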

Each of the 23 RAID6 arrays has its own file system mounted under /backup/diskXX. I then created an appropriate device class and preallocated sequential FILE volumes to fill up the RAID6 arrays, with the mount limit set at 23. Performance on the old system was good and I was expecting this to be even better.
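
For the curious, the setup was along these lines; the class, pool and volume names and the sizes are placeholders rather than my exact values:

    # FILE device class spanning the 23 file systems (list runs on to disk23)
    dsmadmc -id=admin -password=xxx "define devclass backupfile \
        devtype=file mountlimit=23 maxcapacity=100G \
        directory=/backup/disk01,/backup/disk02,/backup/disk03"
    # Preallocate a volume on one of the arrays (formatsize is in MB)
    dsmadmc -id=admin -password=xxx "define volume backuppool \
        /backup/disk01/vol01.dsm formatsize=102400"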

We have been slowly replacing the original 1TB SATA drives in the J4400s with larger SAS drives as we have run short on space for the backup of the HPC. However, at the same time as upgrading the server, we also came into a large supply of free 4TB SAS drives (over 200 of them). I am now in the process of shuffling data around to free up the 1TB drives so I can install the 4TB drives in their place and create new RAID6 arrays.

This is where I noticed that performance is bad. Basically, I am getting maybe 10MB/s when I issue a "move data" command, yet doing a cp from the OS of a file from one RAID6 array to another I get ~250MB/s. Previously, on the old system, which was slower in every respect and running RHEL 7, doing the same thing would peg the arrays.

If I issue lots of "move data" commands I can get the aggregate speed up, but on closer examination it is impacting backup performance too. It's like there is some magic setting somewhere that is limiting the speed at which TSM can write to disk.
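
For what it is worth, when I say lots of moves, I mean driving them in parallel with something like this (a rough sketch, with placeholder volume names):

    # One dsmadmc session per move, run in parallel
    for vol in /backup/disk01/vol01.dsm /backup/disk02/vol01.dsm; do
        dsmadmc -id=admin -password=xxx "move data $vol wait=yes" &
    done
    wait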

In hindsight, the only place I can think of where I might have made a mistake is that I only specified a single file for the DB2 database. I was probably thinking that, with such a fast disk, and with it all on a single disk anyway, what was the point of having multiple files? On the old system it was on a RAID1 of SATA SSDs on a 3Gbps SAS controller. That said, watching a DB backup, the database disk goes full tilt at over 2000MB/s, then stops as the data gets squirted out from RAM onto the RAID6 arrays. The database looks to be about 74GB in size, with the DB backups at nearly half that size.
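
If that is the mistake, my understanding is that directories can be added on the fly with "extend dbspace" and the server will redistribute the database across them, something like this (the paths are made up):

    # Add extra database directories; the server rebalances onto them
    dsmadmc -id=admin -password=xxx \
        "extend dbspace /tsmdb/dir2,/tsmdb/dir3,/tsmdb/dir4"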

Anyone any ideas as to what the issue might be? If it is the DB, I can always scrub everything, do a fresh install, and restore from backup.
 
That's almost impossible to troubleshoot in a forum like this. The first step is identifying the bottleneck; once you know the bottleneck, you can address it.

There are a lot of moving parts even for a task as simple as a move data. The database needs to be queried and updated in addition to the actual data movement.
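
As a first pass, I would watch the disks and the server while a move data is running, for example (purely illustrative):

    # Per-device throughput, latency and utilisation, sampled every 5s
    iostat -xm 5
    # What the server thinks its processes are doing
    dsmadmc -id=admin -password=xxx "query process"
    # Raw sequential write speed of one array, bypassing the page cache
    dd if=/dev/zero of=/backup/disk01/ddtest bs=1M count=4096 oflag=direct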

The only thing I can suggest is to follow the best practices for the DB disks and the best practices for a FILE storage pool.
 