AIX client v. 18.104.22.168
AIX server v. 22.214.171.124
AIX version 126.96.36.199
We have a customer who is experiencing discrepancies in performance
between large-file cold backups of Oracle data and LV image backups, both
sent over the SAN to LTO-2 tape drives. On the LV image backups, they are
getting about 70 MB/s sustained. On the B/A client backups, they are
getting less than 30 MB/s.
Both backups are made using the same server stanza, and data is sent to the
same stgpool on the server, so the TSM configuration should be
identical. (See below.) In investigating this issue, the customer
determined that the block size used when reading the data for the image
backup was 256K, whereas for the B/A client backup it was about half of that. This
is what they reported:
the size of disk IO for the Oracle filesystems was close to 128k,
whereas the size of disk IO of the image backup was fixed at 256k.
Also, the image backup had a lot less disk seek than the
regular Oracle cold backup through SAN.
The customer believes that the read block size is contributing to, if not
entirely responsible for, the performance discrepancies. I would expect
that the image backup would have less disk seek, since it is processing the
entire LV, but the discrepancy in read block size is puzzling.
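
One way to check the read-size theory independently of TSM would be to time plain sequential reads of one of the large Oracle files at 128K and at 256K. What follows is only a minimal sketch, assuming Python is available on the client; the function names and the command-line usage are my own illustration and have nothing to do with how the TSM client reads data. Filesystem caching and read-ahead will skew the numbers, so it is best run against a file that has not been read recently.

```python
import sys
import time


def read_throughput(path, block_size, limit=None):
    """Read `path` sequentially in `block_size` chunks.

    Returns (bytes_read, elapsed_seconds). If `limit` is given, stop
    once at least that many bytes have been read.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 so each read() maps to one read request of block_size
    with open(path, "rb", buffering=0) as f:
        while limit is None or total < limit:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def mb_per_sec(total_bytes, seconds):
    """Convert a byte count and elapsed time to MB/s (1 MB = 1024*1024 bytes)."""
    if seconds <= 0:
        return float("inf")
    return total_bytes / (1024 * 1024) / seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python readtest.py /path/to/large/datafile
    for size in (128 * 1024, 256 * 1024):
        nbytes, secs = read_throughput(sys.argv[1], size)
        print("%4dK reads: %.1f MB/s" % (size // 1024, mb_per_sec(nbytes, secs)))
```

If the two read sizes give similar numbers on the Oracle filesystems, the read size alone is probably not the whole story, and disk seek or file layout becomes the more likely suspect.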
Other systems that we have tested with similar data have achieved equally
high throughput for the LV image backup and the B/A client backup of large
files, so it's not clear why this environment (mission-critical production,
of course) would get such dramatically different throughput.
I have a PMR open with IBM regarding this issue. Is anyone aware of a way
to control the block size used for reads by the TSM client?
I'd like to do some client-side tracing to see if we can turn up the reason
for the performance differences, but I would like to be selective about the
tracing, so as to not unduly impact the client machine. Any suggestions
regarding what traceflags would be best to use?
If anyone has any suggestions regarding how to approach this, I'm all ears.
ERRORLOGR 30 D
SCHEDLOGR 30 D
DOMAIN.IMAGE /dev/redo1lv /dev/redo2lv /dev/redo3lv /dev/redo4lv