Hi,
I have done quite similar restores on our mail server.
You may also want to watch what happens to the restore process on the
client: it can happen that the CPU sits at 100% during 'dsmc restore ...'.
Another thing is the file system on the client; check the file system and
the disk activity/service times for any weakness that may result from
creating that many inodes.
I have recently done a lot of mail server restores (always 3.5 million
files/140 GB) using an old TSM server (v5.1.9.5 with K tapes and the same
configuration as you: 10 tapes) and observed that this old TSM server in
particular was at its limit. The I/O configuration of that old TSM server
was especially bad: database, log, and disk cache are mixed together. This
degrades restore performance, especially when other activity (backups at
night) is going on.
So we used
dsmc restore -quiet /mail/ /data2/mail/
(tcpwindowsize 64, tcpbuffsize 32, largecommbuffers no, txnbytelimit 25600,
resourceutilization 3)
and finally received the 3.5 million files/140 GB in 09:53:34.
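For reference, those values are not command-line flags but client options; on
Unix they would go into the client options file (dsm.sys). A sketch of the
stanza we used, where the server name is just a placeholder for your own:

```
* Client option settings for the restore (values from above);
* SERVERNAME_PLACEHOLDER stands in for your own server stanza.
SErvername          SERVERNAME_PLACEHOLDER
  TCPWindowsize       64
  TCPBuffsize         32
  LARGECOMmbuffers    no
  TXNBytelimit        25600
  RESOURceutilization 3
```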
For me that was acceptable, because I know about the server's poor
condition. The restore time would be much worse if the restore fell into a
period when the TSM database was handling a lot of other transactions, like
nightly backups.
... restoring the same data with only one drive takes 51 hours.
Running the same mail restore test on new hardware (new database, TSM 5.3,
with 3592 drives), using the same restore client, we finally got the
3.5 million files/150 GB restored in 04:52:00
... using just one drive, because the data fits on one 3599 tape.
But here I have experienced a reproducible bug/behaviour (the report is
currently 'closed' because Solaris 10 is not yet supported): when the
restore starts, everything runs fine and fast (with a restore performance of
about 1 million files/hour). After some time, maybe 40% of the way through
the total restore, the client CPU rises to 100% and the restore performance
(data/files) drops accordingly; no reason for this was found on either the
server or the client.
... maybe it happens when a very big directory containing a lot of
directories is being processed ...
In the end I found a 'workaround': I cancelled the slowed-down restore
process running at 100% CPU
( 'dsmc restore -quiet /mail/ /data2/mail/' )
with Control-C, let it shut down, and then simply restarted the restore with
'dsmc restart restore -quiet'. This 'restarted restore' runs fast again and
finally finishes with the 04:52:00 total time.
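To make the workaround explicit, the command sequence looks like this (paths
as in the example above; the client keeps enough state that 'restart restore'
resumes an interrupted restartable restore instead of starting over):

```shell
# Start the restore; partway through, the client CPU pins at 100%
# and throughput drops:
dsmc restore -quiet /mail/ /data2/mail/

# When the slowdown hits, interrupt with Control-C (SIGINT) and let
# the client shut the session down cleanly.

# Resume the interrupted, restartable restore where it left off:
dsmc restart restore -quiet
```

In our case this stop/restart brought the total restore time from 06:49:09
down to 04:52:00.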
If I do not stop and restart the client restore session, the restore
finishes in 06:49:09.
That is reproducible, and it is quite a big difference
( 30% faster with interrupting and restarting ),
but maybe it is due to our unsupported TSM version
... or has someone else seen this "cpu-crunching" behaviour ?
Greetings
Rainer
Thomas Denier wrote:
>
> We recently restored a large mail server. We restored about nine million
> files with a total size of about ninety gigabytes. These were read from
> nine 3490 K tapes. The node we were restoring is the only node using the
> storage pool involved. We ran three parallel streams. The restore took
> just over 24 hours.
>
> The client is Intel Linux with 5.2.3.0 client code. The server is mainframe
> Linux with 5.2.2.0 server code.
>
> 'Query session' commands run during the restore showed the sessions in 'Run'
> status most of the time. Accounting records reported the sessions in media
> wait most of the time. We think most of this time was spent waiting for
> movement of tape within a drive, not waiting for tape mounts.
>
> Our analysis has so far turned up only two obvious problems: the
> movebatchsize and movesizethreshold options were smaller than IBM
> recommends. On the face of it, these options affect server housekeeping
> operations rather than restores. Could these options have any sort of
> indirect impact on restore performance? For example, one of my co-workers
> speculated that the option values might be forcing migration to write
> smaller blocks on tape, and that the restore performance might be
> degraded by reading a larger number of blocks.
>
> We are thinking of running a test restore with tracing enabled on the
> client, the server, or both. Which trace classes are likely to be
> informative without adding too much overhead? We are particularly
> interested in information on the server side. The IBM documentation for
> most of the server trace classes seems to be limited to the names of the
> trace classes.
--
------------------------------------------------------------------------
Rainer Wolf eMail: rainer.wolf AT uni-ulm DOT de
kiz - Infrastructure Dept. Tel/Fax: ++49 731 50-22482/22471
Universität Ulm wwweb: http://kiz.uni-ulm.de