ADSM-L

Re: restore speed question

2005-12-22 08:09:33
Subject: Re: restore speed question
From: Alexander Lazarevich <alazarev AT ITG.UIUC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 22 Dec 2005 07:09:23 -0600
Thanks for the responses all, but it's not a tape mounting issue. I wasn't
clear enough in my original post, but I am watching the actlog while the
restore is taking place, and I'm sitting next to the library, so I can
tell when it's doing anything: remounting, rewinding, etc. What I'm
saying is this:

The server is restoring a single 32GB file, and starts doing so at
30+MB/sec. At some point, DURING the restore of that SAME 32GB file, the
server suddenly slows down the restore, to 200-300K/sec. The server has
NOT switched tapes, and is NOT rewinding even the SAME tape. It is still
restoring that same 32GB file, but suddenly does so at a slower speed.

I know the drives have some kind of burst speed and normal speed. Maybe
something is wacked out with that function?

Any other ideas?

Alex

On Thu, 22 Dec 2005, Leigh Reed wrote:

Alex

I hate restores that don't go as fast as I want them to, especially when
it's 3 o'clock in the morning, so I'll have a stab at what might be
wrong. The nature of your problem does seem very intermittent, and the
fact that you sometimes do achieve an acceptable speed makes it
difficult to diagnose.

Firstly, I think you need to know what primary pool tapes your data is
across. As Troy mentioned, if you are not collocating (or collocating by
group), then the data is going to be spread across a large number of
tapes. Even if you are collocating (all data on one tape), remember that
you are restoring the active data only, whereas the tape will contain all
the previous and deleted versions as well (depending upon your backup copy
group parameters). During the restore, the tape will have to skip between
these; while this is happening, your aggregate network performance will
decrease, as nothing is being restored.
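
The effect of that skipping can be sketched with a little arithmetic. The
Python below is only an illustration: the 30 MB/sec streaming rate comes
from the figures quoted in this thread, while the amount of active data and
the time spent positioning are made-up assumptions, and `effective_rate` is
a hypothetical helper, not anything TSM provides.

```python
def effective_rate(active_mb: float, stream_mb_per_s: float, skip_seconds: float) -> float:
    """Average MB/sec when the drive streams active data at stream_mb_per_s
    but also spends skip_seconds positioning past inactive versions."""
    transfer_seconds = active_mb / stream_mb_per_s
    return active_mb / (transfer_seconds + skip_seconds)


# Assumed numbers: 3000 MB of active data, 30 MB/sec while streaming,
# and 400 seconds in total spent skipping over inactive data.
print(effective_rate(3000, 30, 400))  # 3000 / (100 + 400) = 6.0
```

With these assumed numbers the drive still streams at full speed whenever it
is actually reading, yet the average seen by the client drops to a fifth of
that.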

The following command lists the primary volumes that the node's data is
spread across:

select volume_name from volumeusage where node_name='xxxxxxx' and
copy_type='backup' and stgpool_name='PRIMARY_TAPE_POOL' group by
volume_name

If this returns a large number of tapes, then you have 2 options
available to you. Use a 'multi-thread' restore, by increasing the
resourceutilization setting in the client dsm.opt file and also
increasing the MAXNUMMP parameter. This will enable you to restore
multiple tapes concurrently (depending on the number of drives that you
have available). Please note that multi-threading only works with No
Query Restores.

The second option is, as Troy alluded to, a MOVE NODEDATA, but if
memory serves me right, the elusive 'Active only' switch is still not
available, so the tape restore will still have to skip through
the data that is not active.

If all of the above is completely evident to you, then we are back to
the old favourite: try FTP'ing a large directory of files from the TSM
server to the target restore server. This should test out your network
and filesystem performance.

The only other suggestion would be to take a look at what your TSM
server is doing at the time of the restore.
- are you doing the restore at night when a large number of backups are
occurring
- is expiration running at the time of the restore
- during the restore, keep issuing 'q sess' commands and see if the
restore is 'clocking up' recw, sendw, commw time.

One other thing I have just remembered: if you are doing a full BMR and
you have restored the OS first and rebooted, your restored OS may have
virus scanning enabled. If it is set to scan on write, then when you
restore the remaining drive(s) every file will be scanned before it is
written, which will definitely slow down your restore. Task Manager
should show the virus scanner chewing up CPU.

HTH
Merry Xmas One and All.
Leigh


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Alexander Lazarevich
Sent: 21 December 2005 19:47
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] restore speed question

TSM server 5.3.1 on 2K server. Libraries are one Overland Neo 4100 with 2
LTO2 drives, and an Overland Neo 4100 with 2 LTO3 drives.

I'm restoring a Windows client workspace. Client is running TSM backup
client version 5.3.0. Originally, the client was 5.1.9.0, and it was with
this version that we first created the backup of the workspace drive on
the client.

Now I'm trying to restore that workspace filespace to the new system. The
restore started fine, 30MB+/sec in our GigE network. But at times the
restore speed slows to a halt, and restore speeds are less than 1MB/sec,
sometimes only 200-300K/sec. Then a little later it will start to go
30MB/sec again. It is switching back and forth, sometimes in the middle of
a file! The server is currently trying to restore an 86MB file, but it's
doing so only at 300K/sec. Being that the workspace is 244GB, this is
unacceptable speed.
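
To put the two rates in perspective, here is a rough back-of-the-envelope
sketch in Python. It assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 KB)
and a constant transfer rate for the whole restore; the function name is
purely illustrative.

```python
def restore_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a constant rate (decimal units)."""
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


fast = restore_hours(244, 30)    # 244 GB workspace at 30 MB/sec
slow = restore_hours(244, 0.3)   # same workspace at 300 KB/sec
print(f"30 MB/sec: {fast:.1f} h, 300 KB/sec: {slow:.1f} h ({slow / 24:.1f} days)")
```

Under those assumptions the full workspace takes roughly 2.3 hours at
30 MB/sec, but about 226 hours (over nine days) if it stayed at 300 KB/sec.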

The client data is on the LTO2 library. There is absolutely nothing
unusual in the logs that would indicate any kind of problem on the drive or
the tape. No errors are being reported whatsoever.

The client filespace is NOT compressed on the client side. Compression
happens on the drives (HP).

The client hardware is excellent, dual AMD Opteron, with 3G SATA drives in
striped RAID, XP Pro. Plus the client at times restores at 30+MB/sec so I
know it can do it.

Network on the client is not busy, and the network switch is not
saturated, in fact there is very little network activity.

It just seems like the server decides to go fast at times and then
sometimes very slow. But with nothing in the logs I don't know what to
troubleshoot. Any idea where to start troubleshooting this problem? Anyone
seen this type of behavior before?

Thanks!

Alex

