ADSM-L

Re: restore speed question

2005-12-22 09:09:27
Subject: Re: restore speed question
From: Sung Y Lee <sunglee AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 22 Dec 2005 09:07:23 -0500
> Any other ideas?

To isolate a possible networking issue, I have in the past FTPed some
files between the TSM server and the client, just to verify that the
network is OK.
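
If FTP is not handy, the same check can be roughed out with a small
script. What follows is only a sketch, assuming Python 3.8 or later on
both machines. The names run_sink and measure_throughput, the port
handling, and the payload size are all made up for illustration and have
nothing to do with TSM itself. It pushes a fixed amount of data over one
TCP connection and reports the observed rate, which can then be compared
with the 30MB/sec and 200-300K/sec figures seen during the restore.

```python
import socket
import threading
import time

CHUNK = b"\0" * (1024 * 1024)  # 1 MiB of zeros per send (illustrative payload)


def run_sink(listener: socket.socket) -> None:
    """Accept one connection and discard everything it sends."""
    conn, _addr = listener.accept()
    with conn:
        while conn.recv(65536):
            pass  # keep draining until the sender shuts down its side


def measure_throughput(host: str, port: int, payload_mib: int = 64) -> float:
    """Send payload_mib MiB to host:port and return the observed MiB/sec."""
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        for _ in range(payload_mib):
            sock.sendall(CHUNK)
        sock.shutdown(socket.SHUT_WR)  # tell the sink there is no more data
        sock.recv(1)  # returns b"" once the sink has read it all and closed
    return payload_mib / (time.perf_counter() - start)


if __name__ == "__main__":
    # Loopback demo only. On real hardware, run the sink on one machine
    # and point measure_throughput at it from the other.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    threading.Thread(target=run_sink, args=(listener,), daemon=True).start()
    print(f"{measure_throughput('127.0.0.1', port, 16):.1f} MiB/sec")
```

Running it in each direction, and for long enough to span one of the slow
periods, should show whether the network by itself ever drops to the slow
rate. Zeros compress very well, so if anything in the path compresses
traffic the number will be optimistic.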

Sung Y. Lee

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 12/22/2005
08:09:23 AM:

> Thanks for the responses all, but it's not a tape mounting issue. I
> wasn't clear enough in my original post, but I am watching the actlog
> while the restore is taking place, and I'm sitting next to the library,
> so I can tell when it's doing anything: remounting, rewinding, etc.
> What I'm saying is this:
>
> The server is restoring a single 32GB file, and starts doing so at
> 30+MB/sec. At some point, DURING the restore of that SAME 32GB file, the
> server suddenly slows down the restore, to 200-300K/sec. The server has
> NOT switched tapes, and is NOT rewinding even the SAME tape. It is still
> restoring that same 32GB file, but suddenly does so at a slower speed.
>
> I know the drives have some kind of burst speed and normal speed. Maybe
> something is wacked out with that function?
>
> Any other ideas?
>
> Alex
>
> On Thu, 22 Dec 2005, Leigh Reed wrote:
>
> > Alex
> >
> > I hate restores that don't go as fast as I want them to, especially
> > when it's 3 o'clock in the morning, so I'll have a stab at what might
> > be wrong. Your problem does seem very intermittent, and the fact that
> > you sometimes do achieve an acceptable speed makes it difficult to
> > pin down.
> >
> > Firstly, I think you need to know which primary pool tapes your data
> > is spread across. As Troy mentioned, if you are not collocating (or
> > collocating by group), then the data is going to be spread across a
> > large number of tapes. Even if you are collocating (all data on one
> > tape), remember that you are restoring the active data only, while the
> > tape also contains all the previous and deleted versions (depending
> > upon your backup copy group parameters). During the restore, the tape
> > will have to skip over these; while this is happening, your aggregate
> > network performance will decrease, as nothing is being restored.
> >
> > The following command will list the primary volumes that the node's
> > data is spread across:
> >
> > select volume_name from volumeusage where node_name='xxxxxxx' and
> > copy_type='BACKUP' and stgpool_name='PRIMARY_TAPE_POOL' group by
> > volume_name
> >
> > If this returns a large number of tapes, then you have 2 options
> > available to you. The first is a multi-threaded restore: increase the
> > resourceutilization setting in the client dsm.opt file and also
> > increase the node's MAXNUMMP parameter on the server. This will enable
> > you to restore from multiple tapes concurrently (depending on the
> > number of drives that you have available). Please note that
> > multi-threading only works with No Query Restores.
> >
> > The second option is, as Troy alluded to, a MOVE NODEDATA, but if
> > memory serves me right, the elusive 'Active only' switch is still not
> > available, so the tape restore will still have to skip through
> > the data that is not active.
> >
> > If all of the above is completely evident to you, then we are back to
> > the old favourite: try FTP'ing a large directory of files from the TSM
> > server to the target restore server. This should test out your network
> > and filesystem performance.
> >
> > The only other suggestion would be to take a look at what your TSM
> > server is doing at the time of the restore.
> > - are you doing the restore at night, when a large number of backups
> > are occurring?
> > - is expiration running at the time of the restore?
> > - during the restore, keep issuing 'q sess' commands and see if the
> > restore is 'clocking up' recw, sendw, commw time.
> >
> > One other thing I have just remembered: if you are doing a full BMR and
> > you have restored the OS first and rebooted, your restored OS may have
> > virus scanning enabled. If it is set to scan on write, then when you
> > restore the remaining drive(s), every file will be scanned before it is
> > written, which will definitely slow down your restore. Task Manager
> > should show the virus scanner chewing up CPU.
> >
> > HTH
> > Merry Xmas One and All.
> > Leigh
> >
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > Behalf Of Alexander Lazarevich
> > Sent: 21 December 2005 19:47
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: [ADSM-L] restore speed question
> >
> > TSM server 5.3.1 on 2K server. Libraries are one Overland Neo 4100 with
> > 2 LTO2 drives, and an Overland Neo 4100 with 2 LTO3 drives.
> >
> > I'm restoring a Windows client workspace. Client is running TSM backup
> > client version 5.3.0. Originally, the client was 5.1.9.0, and it was
> > with this version that we first created the backup of the workspace
> > drive on the client.
> >
> > Now I'm trying to restore that workspace filespace to the new system.
> > The restore started fine, 30MB+/sec on our GigE network. But at times
> > the restore slows almost to a halt, and restore speeds are less than
> > 1MB/sec, sometimes only 200-300K/sec. Then a little later it will start
> > to go 30MB/sec again. It is switching back and forth, sometimes in the
> > middle of a file! The server is currently trying to restore an 86MB
> > file, but it's doing so only at 300K/sec. Given that the workspace is
> > 244GB, this is an unacceptable speed.
> >
> > The client data is on the LTO2 library. There is absolutely nothing
> > unusual in the logs that would indicate any kind of problem on the
> > drive or the tape. No errors are being reported whatsoever.
> >
> > The client filespace is NOT compressed on the client side. Compression
> > happens on the drives (HP).
> >
> > The client hardware is excellent, dual AMD Opteron, with 3G SATA drives
> > in striped RAID, XP Pro. Plus the client at times restores at
> > 30+MB/sec so I know it can do it.
> >
> > Network on the client is not busy, and the network switch is not
> > saturated, in fact there is very little network activity.
> >
> > It just seems like the server decides to go fast at times and then
> > sometimes very slow. But with nothing in the logs I don't know what to
> > troubleshoot. Any idea where to start troubleshooting this problem?
> > Anyone seen this type of behavior before?
> >
> > Thanks!
> >
> > Alex
> >
