ADSM-L

Restore performance problem

2005-03-31 16:32:54
Subject: Restore performance problem
From: Thomas Denier <Thomas.Denier AT MAIL.TJU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 31 Mar 2005 16:32:30 -0500
We recently restored a large mail server: about nine million files with a
total size of about ninety gigabytes, read from nine 3490 K tapes. The node
we were restoring is the only node using the storage pool involved. We ran
three parallel restore streams. The restore took just over 24 hours.
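For context on the three streams: the number of parallel client sessions in
a restore is bounded by the client's RESOURCEUTILIZATION option and by the
node's MAXNUMMP mount-point limit on the server. Something along these lines
in dsm.sys controls the client side (the values here are illustrative, not
our actual configuration):

```text
SErvername  tsmserver
   RESOURCEUTILIZATION  5     * allow several parallel sessions
   TCPBUFFSIZE          32
   TCPWINDOWSIZE        63
```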

The client is Intel Linux with 5.2.3.0 client code. The server is mainframe
Linux with 5.2.2.0 server code.

'Query session' commands issued during the restore showed the sessions in
'Run' status most of the time, but the accounting records attribute most of
the session time to media wait. We believe most of that time was spent
waiting for tape movement within a drive, not waiting for tape mounts.
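In case it helps anyone reproduce the analysis: the accounting log is one
comma-separated record per session, so the media-wait fraction can be
totaled with a short script. The field positions below are placeholders,
not the documented layout -- look up the media-wait and elapsed-time
offsets for your server level before trusting the numbers.

```python
# Sketch: total media-wait vs. elapsed time from TSM accounting records.
# Field indexes are HYPOTHETICAL placeholders; check your server level's
# documented accounting-record layout before using this on real data.
import csv
import io

MEDIA_WAIT_FIELD = 22   # placeholder index of "media wait seconds"
ELAPSED_FIELD = 23      # placeholder index of "elapsed seconds"

def summarize(log_text):
    """Return (total_media_wait_seconds, total_elapsed_seconds)."""
    media_wait = elapsed = 0
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) <= max(MEDIA_WAIT_FIELD, ELAPSED_FIELD):
            continue  # skip short or malformed records
        media_wait += int(row[MEDIA_WAIT_FIELD])
        elapsed += int(row[ELAPSED_FIELD])
    return media_wait, elapsed
```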

Our analysis has so far turned up only two obvious problems: the
MOVEBATCHSIZE and MOVESIZETHRESH options were both smaller than IBM
recommends. On the face of it, these options affect server housekeeping
operations rather than restores. Could they have any sort of indirect
impact on restore performance? For example, one of my co-workers
speculated that the option values might be forcing migration to write
smaller blocks on tape, and that the restore might be degraded by having
to read a larger number of blocks.
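To put that speculation in rough numbers: halving the block size roughly
doubles the number of blocks the drives must deliver for the same data.
The block sizes below are illustrative only; what migration actually wrote
depends on the server's tape settings.

```python
# Back-of-the-envelope: tape blocks read for a ~90 GB restore at two
# illustrative block sizes (not measured values from our system).
def blocks_needed(total_bytes, block_kib):
    """Number of full tape blocks covering total_bytes."""
    return total_bytes // (block_kib * 1024)

total = 90 * 10**9  # roughly ninety gigabytes restored

for kib in (32, 256):
    print(f"{kib:>3} KiB blocks: {blocks_needed(total, kib):,}")
```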

We are thinking of running a test restore with tracing enabled on the
client, the server, or both. Which trace classes are likely to be
informative without adding too much overhead? We are particularly
interested in information on the server side. The IBM documentation for
most of the server trace classes seems to be limited to the names of the
trace classes.
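For anyone who has run server traces at this level: the workflow we expect
to use is driven from the admin command line. The trace class name below is
only an example to confirm against your server level, not a tested list:

```text
trace enable session          (substitute the classes you want)
trace begin /tmp/restore.trc
  ... run the test restore ...
trace end
trace disable
```

On the client side, the TRACEFILE and TRACEFLAGS options serve the same
purpose.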
