Howdy, all.
I'm giving my offsites a little bit of a workout, and am trying to
identify a bottleneck in the remote-volume access path. I'm hoping
someone else has messed with this too.
My offsites live in a machine room 300-some miles from my main site.
This led to a variety of TCP tuning experiments as I tried to get it
right. After setting the TCP windows to ~2M, I get essentially local
performance out of my tape drives (peaks at ~100MB/s, reading from
Gainesville tape and writing to Atlanta tape). I also get good speed
on the way back.
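As a sanity check on why ~2M was the magic number: the window you need is just the bandwidth-delay product. The ~15 ms RTT below is an assumption for a ~300-mile WAN path, not a measurement; plug in your own ping time.

```python
# Back-of-the-envelope bandwidth-delay product check.
# RTT is an assumed ~15 ms for a ~300-mile path; substitute
# a measured round-trip time for your link.

def required_window(bandwidth_bytes_per_s, rtt_s):
    """Minimum TCP window (bytes) needed to keep the pipe full."""
    return bandwidth_bytes_per_s * rtt_s

rtt = 0.015                      # assumed round-trip time, seconds
target = 100 * 1024 * 1024      # ~100 MB/s, i.e. tape speed
window = required_window(target, rtt)
print(f"window needed: {window / 1024 / 1024:.2f} MB")
# → window needed: 1.50 MB
```

So a ~2M window comfortably covers a 100 MB/s goal at that latency, which matches what I'm seeing on the tape-to-tape path.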
But when I restore from one of those copy storage pool (copystg)
volumes, my throughput is about 2.5 MB/s, which is suspiciously close
to the throughput I was getting before I tuned the TCP window.
So I've been doing some experiments. I can get a client to back up and
restore directly to Atlanta at 16-20 MB/s, but if I insert the local
TSM server into the path I see no improvement, even though each
individual leg goes MUCH faster by itself.
I'm thinking I've got a TSM protocol-level analogue of the TCP-level
window problem: I can only have so much data in-flight before someone
wants an ACK, which limits the total throughput. But I think it's in
the TSM-level command stream.
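Working the window arithmetic backwards supports that hunch: if throughput is capped stop-and-wait style at window / RTT, the observed rate implies a particular amount of in-flight data. The ~13 ms RTT here is again an assumption, not a measurement.

```python
# If throughput is capped by a windowed, ACK-gated exchange, then
#   throughput = window / RTT.
# Work backwards from the observed restore rate with an assumed RTT.

rtt = 0.013                       # assumed round-trip time, seconds
observed = 2.5 * 1024 * 1024      # observed restore throughput, B/s
implied_window = observed * rtt   # data in flight per round trip
print(f"implied in-flight data: {implied_window / 1024:.1f} KB")
# → implied in-flight data: 33.3 KB
```

With those assumed numbers the implied window lands right around 32K, which is suggestively close to the documented TCPBUF ceiling mentioned below; a different real RTT would shift it, of course.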
I've dodged questions of file count: I could understand the slowdown
if objects were arriving faster than DB commits could keep up, but my
current test case is a single ~1 GB file.
Now, TCPBUF at the server level would seem a tempting knob, but it's
too small (32K documented max), and the documentation specifically
disavows any relationship with TCPWindow. No other options look
suggestive.
TCPBuff at the client level isn't documented to go as high as 2M, but
when I moved it from the default to 512K I saw zero difference in
speed, so I don't think that's it.
Ideally, I should be able to restore from the offsite datastore with
only the interference of non-collocated, tiny volumes (as if that's
not plenty). It'd be nice if at least the transfer speed were
better.
Any insight, experience, whatever?
- Allen S. Rout