Howdy, all.
I'm giving my offsites a little bit of a workout, and am trying to
identify a bottleneck in the remote-volume access path. I'm hoping
someone else has messed with this too.
My offsites live in a machine room 300-some miles from my main site.
This led to a variety of TCP tuning experiments as I tried to get it
right. After setting the TCP windows to ~2M, I get essentially local
performance out of my tape drives (peaks at ~100MB/s, reading from
Gainesville tape and writing to Atlanta tape). I also get good speed
on the way back.
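As a sanity check on why ~2M was the magic number: the window you need is just the bandwidth-delay product. The ~15 ms RTT below is an assumption for a ~300-mile WAN path, not a measurement; plug in your own ping time.

```python
# Back-of-the-envelope bandwidth-delay product check.
# RTT is an assumed ~15 ms for a ~300-mile path; substitute
# a measured round-trip time for your link.

def required_window(bandwidth_bytes_per_s, rtt_s):
    """Minimum TCP window (bytes) needed to keep the pipe full."""
    return bandwidth_bytes_per_s * rtt_s

rtt = 0.015                      # assumed round-trip time, seconds
target = 100 * 1024 * 1024      # ~100 MB/s, i.e. tape speed
window = required_window(target, rtt)
print(f"window needed: {window / 1024 / 1024:.2f} MB")
# → window needed: 1.50 MB
```

So a ~2M window comfortably covers a 100 MB/s goal at that latency, which matches what I'm seeing on the tape-to-tape path.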
But when I restore from one of those copy storage pool (copystg)
volumes, my throughput is about 2.5 MB/s, which is suspiciously close
to the throughput I was getting before I tuned the TCP window.
So I've been doing some experiments. I can get a client to back up and
restore directly to Atlanta at 16-20 MB/s, but if I insert the local
TSM server into the path I see no improvement, even though each
individual leg goes MUCH faster by itself.
I'm thinking I've got a TSM protocol-level analogue of the TCP-level
window problem: I can only have so much data in-flight before someone
wants an ACK, which limits the total throughput. But I think it's in
the TSM-level command stream.
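Working the window arithmetic backwards supports that hunch: if throughput is capped stop-and-wait style at window / RTT, the observed rate implies a particular amount of in-flight data. The ~13 ms RTT here is again an assumption, not a measurement.

```python
# If throughput is capped by a windowed, ACK-gated exchange, then
#   throughput = window / RTT.
# Work backwards from the observed restore rate with an assumed RTT.

rtt = 0.013                       # assumed round-trip time, seconds
observed = 2.5 * 1024 * 1024      # observed restore throughput, B/s
implied_window = observed * rtt   # data in flight per round trip
print(f"implied in-flight data: {implied_window / 1024:.1f} KB")
# → implied in-flight data: 33.3 KB
```

With those assumed numbers the implied window lands right around 32K, which is suggestively close to the documented TCPBUF ceiling mentioned below; a different real RTT would shift it, of course.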
I've dodged questions of file count: I could understand the slowdown
if objects were arriving faster than DB commits could keep up, but my
current test case is a single ~1 GB file.
Now, TCPBUF at the server level would seem a tempting knob, but it's
too small (32K documented max), and the documentation specifically
disavows any relationship with TCPWindow. No other options look
suggestive.
TCPBuff at the client level isn't documented to go as high as 2M, but
when I moved it from the default to 512K I saw zero difference in
speed, so I don't think that's it.
Ideally, I should be able to restore from the offsite datastore with
only the interference of non-collocated, tiny volumes (as if that's
not plenty). It'd be nice if at least the transfer speed were
better.
Any insight, experience, whatever?
- Allen S. Rout