ADSM-L

Re: Very slow restores (days), hours to locate files

2005-07-07 16:39:56
Subject: Re: Very slow restores (days), hours to locate files
From: "Connor, Jeffrey P." <Jeffrey.Connor AT US.NGRID DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 7 Jul 2005 16:39:40 -0400
Robin,

I hope the LTO firmware resolves your problem.  However, I have seen a
similar situation with Windows clients in our shop, and it was not a tape
drive issue. In our case, a tape stgpool (3590K tapes in 3590E1A drives),
collocated by node, reached its MAXSCRATCH value. This led to what some
folks call imperfect collocation: even though the stgpool is set up to
collocate by node, data for more than one node can end up on the same
tape. The node intermixing in our collocated-by-node pool produced
symptoms that sound similar to yours.

We attempted to restore an 8GB Win2k C: drive, about 50% full, and saw
very long delays where nothing appeared to be happening. A tape change
would occur, some data would transfer, and then there was a VERY long
pause before the mount request for the next tape. Query Session while a
tape was mounted showed what your Q SE shows below: session in Run state,
zero seconds wait time, but the sent and received byte counts unchanged.
As in your case, though with fewer tapes, our incremental backups of the
server's C: drive had files spread across a number of tapes. We never
determined the root cause of the "think" time between tape mount
requests. We resolved the issue by moving the data on our stgpool's tapes
to a new pool with a high MAXSCRATCH value, effectively re-collocating
the data. All restores ran very happily after that.
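The symptom above (Run state, 0 S wait, byte counts frozen) can be checked mechanically by comparing two captures of the same session. This is a minimal sketch, assuming you save the text of "q se f=d" for one session twice, a few minutes apart; the function names parse_session and looks_stalled are hypothetical, not part of TSM.

```python
# Hypothetical helpers: compare two captures of "q se f=d" output for one session.

def parse_session(text: str) -> dict[str, str]:
    """Parse a 'q se f=d' block of 'Label: value' lines into a dict.

    Lines without a colon (e.g. a wrapped 'Seconds)' fragment) are appended
    to the value of the previous field.
    """
    fields: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if ":" in stripped:
            # Split on the first colon only; values may themselves contain colons.
            key, _, value = stripped.partition(":")
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            fields[last_key] = f"{fields[last_key]} {stripped}".strip()
    return fields


def looks_stalled(before: dict[str, str], after: dict[str, str]) -> bool:
    """True if the session is in Run state with 0 S wait time and neither
    byte counter changed between the two snapshots.

    Counters are compared as displayed (e.g. '670.9 M'), so growth smaller
    than the display precision goes unnoticed.
    """
    return (
        after.get("Sess State") == "Run"
        and after.get("Wait Time") == "0 S"
        and before.get("Bytes Sent") == after.get("Bytes Sent")
        and before.get("Bytes Recvd") == after.get("Bytes Recvd")
    )


if __name__ == "__main__":
    snapshot = """
               Sess State: Run
                Wait Time: 0 S
               Bytes Sent: 670.9 M
              Bytes Recvd: 58.2 K
      Media Access Status: Current input volume(s):  200658,(2279
Seconds)
"""
    s = parse_session(snapshot)
    print(s["Media Access Status"])
    print(looks_stalled(s, s))
```

Comparing the displayed strings keeps the sketch simple; if you need finer resolution, the lsof offset approach in Robin's message is more sensitive.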

Sorry I could not provide a root cause for our situation, but that's how
we addressed it. Just curious: do the 310 tapes you identified also
contain data for other nodes, and are you using collocation?
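The volumeusage query Robin mentions in the quoted message can answer that question. This is a minimal sketch, assuming the (node name, volume name) pairs from a select against the VOLUMEUSAGE table have already been exported into a list of tuples; the function name shared_volumes and the OTHERNODE sample row are hypothetical. It returns the volumes holding a node's data and the subset also holding other nodes' data, which in a collocated-by-node pool points to imperfect collocation.

```python
from collections import defaultdict
from collections.abc import Iterable


def shared_volumes(
    pairs: Iterable[tuple[str, str]], node: str
) -> tuple[set[str], set[str]]:
    """Given (node_name, volume_name) rows, return the volumes holding data
    for `node` and the subset of those that also hold other nodes' data."""
    nodes_on_volume: defaultdict[str, set[str]] = defaultdict(set)
    for node_name, volume_name in pairs:
        # A volume may be listed several times for the same node
        # (e.g. once per filespace); the set drops the repeats.
        nodes_on_volume[volume_name].add(node_name)
    mine = {vol for vol, nodes in nodes_on_volume.items() if node in nodes}
    shared = {vol for vol in mine if len(nodes_on_volume[vol]) > 1}
    return mine, shared


if __name__ == "__main__":
    # Hypothetical rows; OTHERNODE stands in for any second client.
    rows = [("WANO01", "200658"), ("WANO01", "200659"), ("OTHERNODE", "200659")]
    mine, shared = shared_volumes(rows, "WANO01")
    print(len(mine), sorted(shared))
```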

Jeff Connor
National Grid USA


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Robin Sharpe
Sent: Wednesday, July 06, 2005 10:10 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Very slow restores (days), hours to locate files


Hi guys,

We're having problems restoring some Windows servers (W2K). The
servers in question had some disk problems and are being rebuilt, so the
Windows admins are restoring the C: drive. It is an 8GB drive and less
than 50% used, so only 4GB to restore, yet it has taken several days. I
know one of our problems is that the data is spread over hundreds of
volumes (literally: I counted 310 from a volumeusage query). Another
problem is that we have an overflowed library, but we have loaded all of
the tapes from the Windows storage pool. What I don't understand is why
it takes so long to locate a file once the tape is mounted. We have seen
the same tape mounted for hours before any data is transferred. Here is
an excerpt from a "q se f=d" of a restore that is running right now:

               Sess Number: 1,143
              Comm. Method: TCP/IP
                Sess State: Run
                 Wait Time: 0 S
                Bytes Sent: 670.9 M
               Bytes Recvd: 58.2 K
                 Sess Type: Node
                  Platform: WinNT
               Client Name: WANO01
       Media Access Status: Current input volume(s):  200658,(2279 Seconds)
                 User Name:
 Date/Time First Data Sent:
    Proxy By Storage Agent:

This restore has been running for almost 12 hours now (they have been
restarting them periodically). NO DATA has been transferred from that
tape in the 38 minutes it has been mounted. I know this from running lsof
and looking at the file offset, which indicates the number of bytes
transferred.
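The lsof check can be automated in the same spirit. This is a minimal sketch, assuming you record (elapsed seconds, file offset) samples for the tape device yourself at a fixed interval; how the samples are collected, and the function name longest_stall, are hypothetical. It reports the longest stretch during which the offset did not advance.

```python
from collections.abc import Sequence


def longest_stall(samples: Sequence[tuple[float, int]]) -> float:
    """Return the longest time (seconds) during which the offset did not advance.

    samples: (elapsed_seconds, offset_bytes) pairs in time order, e.g. offsets
    read from lsof for the tape device at a fixed interval (hypothetical).
    """
    longest = 0.0
    stall_start = 0.0
    prev_offset = None
    for t, offset in samples:
        if prev_offset is not None and offset == prev_offset:
            # Offset unchanged: the stall that began at stall_start continues.
            longest = max(longest, float(t - stall_start))
        else:
            # First sample, or the offset moved: a new potential stall starts here.
            stall_start = t
        prev_offset = offset
    return longest


if __name__ == "__main__":
    # Hypothetical samples taken once a minute; the offset sits at 200 for 2 minutes.
    print(longest_stall([(0, 100), (60, 200), (120, 200), (180, 200), (240, 300)]))
```

The stall is measured from the first sample at which the offset reached its current value, so its resolution is limited to the sampling interval.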

I know that when I restore a single file, it can be found within seconds
of mounting a tape (these are all LTO-2), so why does it take so long in
this case? Is TSM actually reading the entire tape? If so, wouldn't I see
lots of data being transferred? Or is there some kind of SCSI command
that lets the drive read and compare the data itself? I thought TSM
stored the actual locations of files in the DB, so it could quickly find
any file (or aggregate) without reading the whole tape. I've been
searching the literature, and I can't find any details on this.

The TSM server is on HP-UX 11i, with IBM LTO-2 drives, fibre attached, in
an STK L700 library. Also, my DB is huge (314GB), and for the last year we
have been unable to delete anything, so we have many versions of volatile
files. We are planning to split our environment across several TSM
servers, and in the short term, our Windows admins will start doing
weekly selective backups of the C: drives to consolidate active versions
onto fewer tapes.

Thanks for any thoughts on this....

Robin Sharpe
Berlex Labs

