Subject: Re: client session stops with 'no space available in storage... and all successor pools'
From: David le Blanc <david.leblanc AT IDENTITY-SOLUTIONS.COM DOT AU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 30 Aug 2006 22:51:21 +1000
I have a customer who was in the same situation.  I didn't believe it
when IBM support described it as a race condition resulting in the FILE
storage pool being unable to destage data to the tape pool (as
mentioned by Richard Sims).

I would personally expect a full FILE storage pool to queue for a tape
mount point, but it seems it will not queue for a mount point of its
own during migration... (not sure if that makes sense)

IBM suggested increasing the mountlimit to the maximum possible to
reduce the possibility of this happening.  It did not stop the problem,
but it certainly reduced the incidence from a daily crisis to an
occasional annoyance.
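
Something along these lines, if you want to try it -- just a sketch,
with 4096 being the documented maximum for a FILE device class and the
device class name taken from Rainer's listing further down:

   update devclass file8 mountlimit=4096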

The final solution was to drop the FILE storage pool from the chain and
make the disk storage pool as big as it could be.
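
In Rainer's setup that would amount to something like the following
(again only a sketch, not verified here; the volume path and size are
made up for illustration):

   update stgpool backuppool8 nextstgpool=tape_o262
   define volume backuppool8 /tsmdata3/tsm1/disk/bp8vol01.dsm formatsize=20480

and then draining FILEPOOL8 via 'update stgpool filepool8 highmig=0
lowmig=0' before removing it.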

David

> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] 
> On Behalf Of Rainer Wolf
> Sent: Wednesday, 30 August 2006 10:21 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] client session stops with 'no space 
> available in storage... and all successor pools'
> 
> David,
> the second pool is of type FILE, and the mountlimit is set to 20 in
> the FILE device class.
> The third and last pool is a tape pool with a maximum of 4 drives
> available.
> 
> From the help of...
> 
> UPDATE DEVCLASS -- FILE
> ...
> MOUNTLimit
>       Specifies the maximum number of files that can be simultaneously
>       open for input/output. This parameter is optional. You can
>       specify a number from 1 to 4096.
> 
>       If you plan to use the simultaneous write function, ensure that
>       sufficient drives are available for the write operation. If the
>       number of drives needed for a simultaneous write operation is
>       greater than the value of the MOUNTLIMIT parameter for a device
>       class, the transaction will fail. For details about the
>       simultaneous write function, refer to the Administrator's Guide.
> ...
> ... I don't understand what the 'drives' mentioned there are about.
> So I'm confused now: should I increase the mountlimit to e.g. 40, or
> rather decrease it to the number of drives available at the tape
> destination that comes after the file pool?
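> 
> (For reference: the current value can be checked with 'q devclass
> file8 f=d' -- FILE8 being the device class name from the pool listing
> below.)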
> 
> Cheers
> Rainer
> 
> David le Blanc wrote:
> > I believe this can happen with 5.3 clients....
> >
> > Are any of your pools (in the chain of pools the client writes to)
> > of type FILE?
> >
> > Try increasing the number of mount points for the device class for
> > that pool.
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> >>On Behalf Of Rainer Wolf
> >>Sent: Wednesday, 30 August 2006 7:34 PM
> >>To: ADSM-L AT VM.MARIST DOT EDU
> >>Subject: Re: [ADSM-L] client session stops with 'no space
> >>available in storage... and all successor pools'
> >>
> >>Arnaud,
> >>no, that's not hurting us, because we have no cache enabled on the
> >>disk storage pools ... I checked the other things in the technote,
> >>and still nothing applies to this problem. In particular, there are
> >>scratch tapes available, and the number of available tapes in the
> >>tape pool is high enough.
> >>I think I'll look through the list of problems fixed in the 5.3
> >>client ... it seems to only happen with the 5.2 clients.
> >>
> >>Cheers
> >>Rainer
> >>
> >>
> >>PAC Brion Arnaud wrote:
> >>
> >>
> >>>Rainer,
> >>>
> >>>Just found this technote:
> >>>http://www-1.ibm.com/support/docview.wss?uid=swg21079391
> >>>which refers to ANS1311E and ANR0522W problems, and states this
> >>>possible reason:
> >>>
> >>>- Cache is enabled on the disk storage pool that the TSM Client
> >>>backup data is being sent to, and cached data cannot be deleted
> >>>quickly enough to allow the backup data to be written to the disk
> >>>storage pool. In this case, update the storage pool so that cache
> >>>is not enabled.
> >>>
> >>>Couldn't this be hurting you ?
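> >>>
> >>>(If so, turning it off should just be a matter of 'update stgpool
> >>><poolname> cache=no' for whichever disk pool the client writes to.)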
> >>>
> >>>Cheers
> >>>
> >>>
> >>>Arnaud
> >>>
> >>>
> >>>**********************************************************************
> >>>Panalpina Management Ltd., Basle, Switzerland,
> >>>CIT Department Viadukstrasse 42, P.O. Box 4002 Basel/CH
> >>>Phone:  +41 (61) 226 11 11, FAX: +41 (61) 226 17 01
> >>>Direct: +41 (61) 226 19 78
> >>>e-mail: arnaud.brion AT panalpina DOT com
> >>>**********************************************************************
> >>>
> >>>-----Original Message-----
> >>>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]
> >>>On Behalf Of Rainer Wolf
> >>>Sent: Wednesday, 30 August, 2006 09:31
> >>>To: ADSM-L AT VM.MARIST DOT EDU
> >>>Subject: client session stops with 'no space available in
> >>>storage... and all successor pools'
> >>>
> >>>Dear TSMers,
> >>>
> >>>this happens on TSM server 5.3.3.2 / Solaris, 3494,
> >>>
> >>>and clients: linux86 5.2.3.1, linux86 5.2.3.0, solaris 5.2.5.0,
> >>>solaris 5.2.2.6, winnt 5.2.3.11
> >>>
> >>>we have a strange problem with occasionally stopped client sessions
> >>>with the message 'no space available in storage pool BACKUPPOOL and
> >>>all successor pools'.
> >>>When it happens, it happens with clients running bigger transfers
> >>>in time and data - mostly on initial backups.
> >>>The data flow is set up as
> >>>random access disk pool --> sequential file pool --> sequential
> >>>tape pool
> >>>
> >>>It may happen that the first 2 stages fill up, but the tape pool
> >>>always has free and usable scratch volumes available.
> >>>
> >>>The question is: is this a bug in the server, or do I have to
> >>>change something in the setup of the pools?
> >>>The space in the random-access pools normally migrates down to
> >>>about 50% -- is it better to bring this down to 0% usage as a
> >>>daily task?
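> >>>
> >>>(I guess on server 5.3 that could be done with a daily admin
> >>>schedule running something like 'migrate stgpool backuppool8
> >>>lowmig=0' -- untested.)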
> >>>
> >>>I thought that sessions that don't have enough space in the
> >>>backup/file pools would write directly to tape if needed.
> >>>But when this stopping happens, it seems to happen just on those
> >>>long and large running sessions that start writing to the backup
> >>>pool and then switch to the file pool ... it seems that a second
> >>>switch to the tape pool is not possible?
> >>>
> >>>I just checked the client versions of all nodes where this happens,
> >>>and all of them are 5.2.X.X ... so is it just a client problem with
> >>>the old 5.2.X.X clients?
> >>>
> >>>Thanks a lot in advance for any hints!
> >>>Rainer
> >>>
> >>>
> >>>tsm: TSM1>q actlog begint=-20 search=94090
> >>>
> >>>
> >>>Date/Time                Message
> >>>--------------------     ----------------------------------------------
> >>>08/29/06   08:13:01      ANR0406I Session 94090 started for node
> >>>                           ULLI187.CHEMIE (Linux86) (Tcp/Ip
> >>>                           134.60.42.187(1039)).(SESSION: 94090)
> >>>08/29/06   20:17:08      ANR8340I FILE volume
> >>>                           /tsmdata3/tsm1/file8/00006B4D.BFS
> >>>                           mounted.(SESSION: 94090)
> >>>08/29/06   20:17:08      ANR0511I Session 94090 opened output volume
> >>>                           /tsmdata3/tsm1/file8/00006B4D.BFS.(SESSION: 94090)
> >>>08/29/06   20:17:24      ANR8341I End-of-volume reached for FILE volume
> >>>                           /tsmdata3/tsm1/file8/00006B4D.BFS.(SESSION: 94090)
> >>>08/29/06   20:17:24      ANR0514I Session 94090 closed volume
> >>>                           /tsmdata3/tsm1/file8/00006B4D.BFS.(SESSION: 94090)
> >>>08/29/06   20:17:24      ANR0522W Transaction failed for session 94090
> >>>                           for node ULLI187.CHEMIE (Linux86) - no space
> >>>                           available in storage pool BACKUPPOOL8 and all
> >>>                           successor pools.(SESSION: 94090)
> >>>08/29/06   20:17:53      ANR0403I Session 94090 ended for node
> >>>                           ULLI187.CHEMIE (Linux86).(SESSION: 94090)
> >>>
> >>>
> >>>
> >>>tsm: TSM1>q actlog search=94086 begind=-2
> >>>
> >>>Date/Time                Message
> >>>--------------------     ----------------------------------------------
> >>>08/29/06   08:10:22      ANR0406I Session 94086 started for node
> >>>                           ULLI187.CHEMIE (Linux86) (Tcp/Ip
> >>>                           134.60.42.187(1038)).(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4952I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects inspected:
> >>>                           1,458,833(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4954I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects backed up:
> >>>                           1,457,166(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4958I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects updated: 0(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4960I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects rebound: 0(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4957I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects deleted: 0(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4970I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects expired: 0(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4959I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of objects failed: 1(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4961I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Total number of bytes transferred: 245.28
> >>>                           GB(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4963I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Data transfer time: 112,550.97 sec(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4966I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Network data transfer rate: 2,285.17
> >>>                           KB/sec(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4967I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Aggregate data transfer rate: 5,913.70
> >>>                           KB/sec(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4968I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Objects compressed by: 0%(SESSION: 94086)
> >>>08/29/06   20:17:54      ANE4964I (Session: 94086, Node: ULLI187.CHEMIE)
> >>>                           Elapsed processing time: 12:04:52(SESSION: 94086)
> >>>08/29/06   20:48:39      ANR0482W Session 94086 for node ULLI187.CHEMIE
> >>>                           (Linux86) terminated - idle for more than 30
> >>>                           minutes.(SESSION: 94086)
> >>>
> >>>The storage pools look like:
> >>>
> >>>Storage       Device       Estimated    Pct    Pct    High   Low   Next Stora-
> >>>Pool Name     Class Name   Capacity     Util   Migr   Mig    Mig   ge Pool
> >>>                                                      Pct    Pct
> >>>-----------   ----------   ----------   ----   ----   ----   ---   -----------
> >>>BACKUPPOOL8   DISK            160 G     63.9   56.2     90    50   FILEPOOL8
> >>>FILEPOOL8     FILE8           405 G     76.2   77.5     90    70   TAPE_O262
> >>>TAPE_O262     3592         22,288 G     24.6   60.0    100    70
> >>>
> >>>
> >>>
> >>>tsm: TSM1>q stg BACKUPPOOL8 f=d
> >>>
> >>>                Storage Pool Name: BACKUPPOOL8
> >>>                Storage Pool Type: Primary
> >>>                Device Class Name: DISK
> >>>               Estimated Capacity: 160 G
> >>>               Space Trigger Util: 63.9
> >>>                         Pct Util: 63.9
> >>>                         Pct Migr: 56.2
> >>>                      Pct Logical: 99.4
> >>>                     High Mig Pct: 90
> >>>                      Low Mig Pct: 50
> >>>                  Migration Delay: 0
> >>>               Migration Continue: Yes
> >>>              Migration Processes: 1
> >>>            Reclamation Processes:
> >>>                Next Storage Pool: FILEPOOL8
> >>>             Reclaim Storage Pool:
> >>>           Maximum Size Threshold: No Limit
> >>>                           Access: Read/Write
> >>>                      Description:
> >>>                Overflow Location:
> >>>            Cache Migrated Files?: No
> >>>                       Collocate?:
> >>>            Reclamation Threshold:
> >>>        Offsite Reclamation Limit:
> >>>  Maximum Scratch Volumes Allowed:
> >>>   Number of Scratch Volumes Used:
> >>>    Delay Period for Volume Reuse:
> >>>           Migration in Progress?: No
> >>>             Amount Migrated (MB): 176,075.38
> >>>Elapsed Migration Time (seconds): 16,556
> >>>         Reclamation in Progress?:
> >>>   Last Update by (administrator): xx
> >>>            Last Update Date/Time: 08/29/06   17:00:14
> >>>         Storage Pool Data Format: Native
> >>>             Copy Storage Pool(s):
> >>>          Continue Copy on Error?:
> >>>                         CRC Data: No
> >>>                 Reclamation Type:
> >>>
> >>>
> >>>tsm: TSM1>
> >>>
> >>>tsm: TSM1>q stg filePOOL8 f=d
> >>>
> >>>                Storage Pool Name: FILEPOOL8
> >>>                Storage Pool Type: Primary
> >>>                Device Class Name: FILE8
> >>>               Estimated Capacity: 405 G
> >>>               Space Trigger Util: 98.4
> >>>                         Pct Util: 76.2
> >>>                         Pct Migr: 77.5
> >>>                      Pct Logical: 99.9
> >>>                     High Mig Pct: 90
> >>>                      Low Mig Pct: 70
> >>>                  Migration Delay: 0
> >>>               Migration Continue: Yes
> >>>              Migration Processes: 1
> >>>            Reclamation Processes: 1
> >>>                Next Storage Pool: TAPE_O262
> >>>             Reclaim Storage Pool:
> >>>           Maximum Size Threshold: No Limit
> >>>                           Access: Read/Write
> >>>                      Description:
> >>>                Overflow Location:
> >>>            Cache Migrated Files?:
> >>>                       Collocate?: Group
> >>>            Reclamation Threshold: 100
> >>>        Offsite Reclamation Limit:
> >>>  Maximum Scratch Volumes Allowed: 200
> >>>   Number of Scratch Volumes Used: 155
> >>>    Delay Period for Volume Reuse: 1 Day(s)
> >>>           Migration in Progress?: No
> >>>             Amount Migrated (MB): 91,059.84
> >>>Elapsed Migration Time (seconds): 2,587
> >>>         Reclamation in Progress?: No
> >>>   Last Update by (administrator): xx
> >>>            Last Update Date/Time: 08/30/06   08:08:01
> >>>         Storage Pool Data Format: Native
> >>>             Copy Storage Pool(s):
> >>>          Continue Copy on Error?:
> >>>                         CRC Data: No
> >>>                 Reclamation Type: Threshold
> >>>
> >>>
> >>>tsm: TSM1>q stg TAPE_O262 f=d
> >>>
> >>>                Storage Pool Name: TAPE_O262
> >>>                Storage Pool Type: Primary
> >>>                Device Class Name: 3592
> >>>               Estimated Capacity: 22,288 G
> >>>               Space Trigger Util:
> >>>                         Pct Util: 24.6
> >>>                         Pct Migr: 60.0
> >>>                      Pct Logical: 97.1
> >>>                     High Mig Pct: 100
> >>>                      Low Mig Pct: 70
> >>>                  Migration Delay: 0
> >>>               Migration Continue: Yes
> >>>              Migration Processes: 1
> >>>            Reclamation Processes: 1
> >>>                Next Storage Pool:
> >>>             Reclaim Storage Pool:
> >>>           Maximum Size Threshold: No Limit
> >>>                           Access: Read/Write
> >>>                      Description:
> >>>                Overflow Location:
> >>>            Cache Migrated Files?:
> >>>                       Collocate?: Group
> >>>            Reclamation Threshold: 100
> >>>        Offsite Reclamation Limit:
> >>>  Maximum Scratch Volumes Allowed: 50
> >>>   Number of Scratch Volumes Used: 32
> >>>    Delay Period for Volume Reuse: 8 Day(s)
> >>>           Migration in Progress?: No
> >>>             Amount Migrated (MB): 0.00
> >>>Elapsed Migration Time (seconds): 0
> >>>         Reclamation in Progress?: No
> >>>   Last Update by (administrator): xx
> >>>            Last Update Date/Time: 08/29/06   16:33:34
> >>>         Storage Pool Data Format: Native
> >>>             Copy Storage Pool(s):
> >>>          Continue Copy on Error?:
> >>>                         CRC Data: No
> >>>                 Reclamation Type: Threshold
> >>>
> >>>
> >>>
> >>>
> 
> --
> --------------------------------------------------------------------------
> Rainer Wolf                  eMail:   rainer.wolf AT uni-ulm DOT de
> kiz - Abt. Infrastruktur     Tel/Fax: ++49 731 50-22482/22471
> Universitaet Ulm             wwweb:   http://kiz.uni-ulm.de