ADSM-L

Re: client session stopps with 'no space available in storage... and all successor pools'

2006-08-30 09:05:00
Subject: Re: client session stopps with 'no space available in storage... and all successor pools'
From: David le Blanc <david.leblanc AT IDENTITY-SOLUTIONS.COM DOT AU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 30 Aug 2006 23:00:07 +1000
There is a similar problem caused by a combination of CACHE on a disk
storage pool, COMPRESSION, and incompressible files.

If the TSM client wants to back up a 500MB file, it will estimate a
compression rate, say 50% for the sake of argument, and requisition
250MB from the first storage pool. If unsuccessful, it will go to the
next storage pool, and so on.
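
To make the arithmetic concrete, here is a toy Python model of that
requisition step. It is only a sketch of the behaviour as I described it,
not TSM code: the names Pool, estimate_request_mb and first_pool_with_room
are made up for illustration, the pool sizes are invented, and the 50%
ratio is just the example figure from above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """Hypothetical storage pool: a name, free space, and a successor."""
    name: str
    free_mb: int
    next_pool: Optional["Pool"] = None


def estimate_request_mb(file_mb: int, expected_ratio: float) -> int:
    """Space requested up front, given the guessed compression ratio."""
    return int(file_mb * expected_ratio)


def first_pool_with_room(pool: Optional[Pool], needed_mb: int) -> Optional[Pool]:
    """Walk the pool chain and return the first pool that can hold the request."""
    while pool is not None:
        if pool.free_mb >= needed_mb:
            return pool
        pool = pool.next_pool
    return None


# Invented sizes: both disk-based pools are too full, tape has plenty.
tape = Pool("TAPEPOOL", free_mb=1_000_000)
filepool = Pool("FILEPOOL", free_mb=100, next_pool=tape)
disk = Pool("BACKUPPOOL", free_mb=200, next_pool=filepool)

needed = estimate_request_mb(500, 0.5)   # 500MB file, 50% guess
chosen = first_pool_with_room(disk, needed)
print(needed, chosen.name)               # prints: 250 TAPEPOOL
```

If no pool in the chain has room, first_pool_with_room returns None,
which corresponds to the "no space available ... and all successor
pools" case.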

With caching turned on, it seems that TSM will gladly flush cached data
until the requested capacity is available in the storage pool, and
carry on.

During the backup, the TSM client finds the file is not compressing as
well as it had anticipated, and requests more space from TSM. TSM, it
seems, will only grant the additional space if it can do so
immediately, without flushing additional cached data. If it would
require data to be flushed to complete the request, it returns a
failure, and the client backup fails. Ironically, without caching, if
TSM was genuinely unable to provide the extended space, it would still
return an error, but the client would retry and maybe go to the next
storage pool on the retry. I don't pretend to understand what's going
on at that point.
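
The asymmetry I am describing can be sketched as a toy Python model.
Again, this only mirrors my observation of the behaviour and is not how
the server is actually implemented: the class CachedDiskPool, its two
methods, and all the numbers are hypothetical.

```python
class CachedDiskPool:
    """Toy disk pool: free space plus cached copies that could be flushed."""

    def __init__(self, free_mb: int, cached_mb: int) -> None:
        self.free_mb = free_mb
        self.cached_mb = cached_mb

    def initial_request(self, mb: int) -> bool:
        """Initial allocation: flush cached data as needed to make room."""
        shortfall = mb - self.free_mb
        if shortfall > 0:
            if shortfall > self.cached_mb:
                return False          # not enough even after flushing
            self.cached_mb -= shortfall
            self.free_mb += shortfall
        self.free_mb -= mb
        return True

    def extend_request(self, mb: int) -> bool:
        """Mid-transfer extension: served only from space that is free now."""
        if mb > self.free_mb:
            return False              # would need a cache flush, so it fails
        self.free_mb -= mb
        return True


pool = CachedDiskPool(free_mb=100, cached_mb=900)
ok_initial = pool.initial_request(250)   # flushes 150MB of cache to fit
ok_extend = pool.extend_request(250)     # file did not compress after all
print(ok_initial, ok_extend, pool.free_mb, pool.cached_mb)
# prints: True False 0 750
```

In this model 750MB of flushable cache is still sitting in the pool
when the extension is refused, which is the situation I believe leads
to the failed backup.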

As you said, you are not using caching, and this is one of the many
reasons I always tell people to turn it off.


> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Rainer Wolf
> Sent: Wednesday, 30 August 2006 7:34 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] client session stopps with 'no space available in storage... and all successor pools'
> 
> Arnaud,
> no, that's not hurting us, because we have no cache enabled on the
> disk storage pools. I checked the other items in the technote and
> still nothing applies to this problem. In particular, there are
> scratch tapes available and the number of available tapes in the
> tapepool is high enough.
> I think I will look through the 'fixed things' of the 5.3 client;
> it seems to happen only with the 5.2 clients.
> 
> Cheers
> Rainer
> 
> 
> PAC Brion Arnaud schrieb:
> 
> > Rainer,
> >
> > Just found this technote:
> > http://www-1.ibm.com/support/docview.wss?uid=swg21079391
> > which refers to ANS1311E and ANR0522W problems, and states this
> > possible reason:
> >
> > - Cache is enabled on the disk storage pool that the TSM Client backup
> > data is being sent to, and cached data cannot be deleted quickly enough
> > to allow the backup data to be written to the disk storage pool. In this
> > case, update the storage pool so that cache is not enabled.
> >
> > Couldn't this be hurting you ?
> >
> > Cheers
> >
> >
> > Arnaud
> >
> > ************************************************************************
> > Panalpina Management Ltd., Basle, Switzerland,
> > CIT Department Viadukstrasse 42, P.O. Box 4002 Basel/CH
> > Phone:  +41 (61) 226 11 11, FAX: +41 (61) 226 17 01
> > Direct: +41 (61) 226 19 78
> > e-mail: arnaud.brion AT panalpina DOT com
> > ************************************************************************
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Rainer Wolf
> > Sent: Wednesday, 30 August, 2006 09:31
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: client session stopps with 'no space available in storage... and all successor pools'
> >
> > Dear TSMers,
> >
> > this happens on TSM server 5.3.3.2 / Solaris, 3494
> >
> > and clients: linux86 5.2.3.1, linux86 5.2.3.0, solaris 5.2.5.0,
> > solaris 5.2.2.6, winnt 5.2.3.11
> >
> > We have a strange problem with occasionally stopped client sessions
> > with the message 'no space available in storage pool BACKUPPOOL and all
> > successor pools'.
> > When this happens, it happens with clients running bigger transfers in
> > time and data, mostly on initial backups.
> > The data flow is set up as
> > random access disk pool --> sequential file pool --> sequential tape pool
> >
> > It may happen that the first 2 stages fill up, but the tapepool
> > always has free and usable scratch volumes available.
> >
> > The question is: is this a bug in the server, or do I have to change
> > something in the setup of the pools?
> > The random-access pools normally migrate down to about 50%. Is it
> > better to bring this down to 0% usage as a daily task?
> >
> > I thought that sessions that don't have enough space in the
> > backup/filepools would write directly to tape if needed.
> > But when this stopping happens, it seems to happen only on those
> > long-running, large sessions that start writing to the backuppool and
> > then switch to the filepool. It seems that a second switch, to the
> > tapepool, is not possible?
> >
> > I just checked the client versions of all nodes where this happens and
> > all of them have 5.2.X.X, so is it just a client problem with the old
> > 5.2.X.X clients?
> >
> > Thanks a lot in advance for any hints !
> > Rainer
> >
> >
> > tsm: TSM1>q actlog begint=-20 search=94090
> >
> >
> > Date/Time                Message
> > --------------------     ----------------------------------------------------------
> > 08/29/06   08:13:01      ANR0406I Session 94090 started for node ULLI187.CHEMIE
> >                            (Linux86) (Tcp/Ip 134.60.42.187(1039)).(SESSION: 94090)
> > 08/29/06   20:17:08      ANR8340I FILE volume /tsmdata3/tsm1/file8/00006B4D.BFS
> >                            mounted.(SESSION: 94090)
> > 08/29/06   20:17:08      ANR0511I Session 94090 opened output volume
> >                            /tsmdata3/tsm1/file8/00006B4D.BFS.(SESSION: 94090)
> > 08/29/06   20:17:24      ANR8341I End-of-volume reached for FILE volume
> >                            /tsmdata3/tsm1/file8/00006B4D.BFS.(SESSION: 94090)
> > 08/29/06   20:17:24      ANR0514I Session 94090 closed volume
> >                            /tsmdata3/tsm1/file8/00006B4D.BFS.(SESSION: 94090)
> > 08/29/06   20:17:24      ANR0522W Transaction failed for session 94090 for node
> >                            ULLI187.CHEMIE (Linux86) - no space available in storage
> >                            pool BACKUPPOOL8 and all successor pools.(SESSION: 94090)
> > 08/29/06   20:17:53      ANR0403I Session 94090 ended for node ULLI187.CHEMIE
> >                            (Linux86).(SESSION: 94090)
> >
> >
> >
> > tsm: TSM1>q actlog search=94086 begind=-2
> >
> > Date/Time                Message
> > --------------------     ----------------------------------------------------------
> > 08/29/06   08:10:22      ANR0406I Session 94086 started for node ULLI187.CHEMIE
> >                            (Linux86) (Tcp/Ip 134.60.42.187(1038)).(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4952I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects inspected: 1,458,833(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4954I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects backed up: 1,457,166(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4958I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects updated: 0(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4960I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects rebound: 0(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4957I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects deleted: 0(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4970I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects expired: 0(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4959I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of objects failed: 1(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4961I (Session: 94086, Node: ULLI187.CHEMIE) Total
> >                            number of bytes transferred: 245.28 GB(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4963I (Session: 94086, Node: ULLI187.CHEMIE) Data
> >                            transfer time: 112,550.97 sec(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4966I (Session: 94086, Node: ULLI187.CHEMIE) Network
> >                            data transfer rate: 2,285.17 KB/sec(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4967I (Session: 94086, Node: ULLI187.CHEMIE) Aggregate
> >                            data transfer rate: 5,913.70 KB/sec(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4968I (Session: 94086, Node: ULLI187.CHEMIE) Objects
> >                            compressed by: 0%(SESSION: 94086)
> > 08/29/06   20:17:54      ANE4964I (Session: 94086, Node: ULLI187.CHEMIE) Elapsed
> >                            processing time: 12:04:52(SESSION: 94086)
> > 08/29/06   20:48:39      ANR0482W Session 94086 for node ULLI187.CHEMIE (Linux86)
> >                            terminated - idle for more than 30 minutes.(SESSION: 94086)
> >
> > The storage pools look like:
> >
> > Storage         Device          Estimated       Pct       Pct     High     Low     Next Stora-
> > Pool Name       Class Name       Capacity      Util      Migr      Mig     Mig     ge Pool
> >                                                                    Pct     Pct
> > -----------     ----------     ----------     -----     -----     ----     ---     -----------
> > BACKUPPOOL8     DISK                160 G      63.9      56.2       90      50     FILEPOOL8
> > FILEPOOL8       FILE8               405 G      76.2      77.5       90      70     TAPE_O262
> > TAPE_O262       3592             22,288 G      24.6      60.0      100      70
> >
> >
> >
> > tsm: TSM1>q stg BACKUPPOOL8 f=d
> >
> >                 Storage Pool Name: BACKUPPOOL8
> >                 Storage Pool Type: Primary
> >                 Device Class Name: DISK
> >                Estimated Capacity: 160 G
> >                Space Trigger Util: 63.9
> >                          Pct Util: 63.9
> >                          Pct Migr: 56.2
> >                       Pct Logical: 99.4
> >                      High Mig Pct: 90
> >                       Low Mig Pct: 50
> >                   Migration Delay: 0
> >                Migration Continue: Yes
> >               Migration Processes: 1
> >             Reclamation Processes:
> >                 Next Storage Pool: FILEPOOL8
> >              Reclaim Storage Pool:
> >            Maximum Size Threshold: No Limit
> >                            Access: Read/Write
> >                       Description:
> >                 Overflow Location:
> >             Cache Migrated Files?: No
> >                        Collocate?:
> >             Reclamation Threshold:
> >         Offsite Reclamation Limit:
> >   Maximum Scratch Volumes Allowed:
> >    Number of Scratch Volumes Used:
> >     Delay Period for Volume Reuse:
> >            Migration in Progress?: No
> >              Amount Migrated (MB): 176,075.38
> >  Elapsed Migration Time (seconds): 16,556
> >          Reclamation in Progress?:
> >    Last Update by (administrator): xx
> >             Last Update Date/Time: 08/29/06   17:00:14
> >          Storage Pool Data Format: Native
> >              Copy Storage Pool(s):
> >           Continue Copy on Error?:
> >                          CRC Data: No
> >                  Reclamation Type:
> >
> >
> > tsm: TSM1>
> >
> > tsm: TSM1>q stg filePOOL8 f=d
> >
> >                 Storage Pool Name: FILEPOOL8
> >                 Storage Pool Type: Primary
> >                 Device Class Name: FILE8
> >                Estimated Capacity: 405 G
> >                Space Trigger Util: 98.4
> >                          Pct Util: 76.2
> >                          Pct Migr: 77.5
> >                       Pct Logical: 99.9
> >                      High Mig Pct: 90
> >                       Low Mig Pct: 70
> >                   Migration Delay: 0
> >                Migration Continue: Yes
> >               Migration Processes: 1
> >             Reclamation Processes: 1
> >                 Next Storage Pool: TAPE_O262
> >              Reclaim Storage Pool:
> >            Maximum Size Threshold: No Limit
> >                            Access: Read/Write
> >                       Description:
> >                 Overflow Location:
> >             Cache Migrated Files?:
> >                        Collocate?: Group
> >             Reclamation Threshold: 100
> >         Offsite Reclamation Limit:
> >   Maximum Scratch Volumes Allowed: 200
> >    Number of Scratch Volumes Used: 155
> >     Delay Period for Volume Reuse: 1 Day(s)
> >            Migration in Progress?: No
> >              Amount Migrated (MB): 91,059.84
> >  Elapsed Migration Time (seconds): 2,587
> >          Reclamation in Progress?: No
> >    Last Update by (administrator): xx
> >             Last Update Date/Time: 08/30/06   08:08:01
> >          Storage Pool Data Format: Native
> >              Copy Storage Pool(s):
> >           Continue Copy on Error?:
> >                          CRC Data: No
> >                  Reclamation Type: Threshold
> >
> >
> > tsm: TSM1>q stg TAPE_O262 f=d
> >
> >                 Storage Pool Name: TAPE_O262
> >                 Storage Pool Type: Primary
> >                 Device Class Name: 3592
> >                Estimated Capacity: 22,288 G
> >                Space Trigger Util:
> >                          Pct Util: 24.6
> >                          Pct Migr: 60.0
> >                       Pct Logical: 97.1
> >                      High Mig Pct: 100
> >                       Low Mig Pct: 70
> >                   Migration Delay: 0
> >                Migration Continue: Yes
> >               Migration Processes: 1
> >             Reclamation Processes: 1
> >                 Next Storage Pool:
> >              Reclaim Storage Pool:
> >            Maximum Size Threshold: No Limit
> >                            Access: Read/Write
> >                       Description:
> >                 Overflow Location:
> >             Cache Migrated Files?:
> >                        Collocate?: Group
> >             Reclamation Threshold: 100
> >         Offsite Reclamation Limit:
> >   Maximum Scratch Volumes Allowed: 50
> >    Number of Scratch Volumes Used: 32
> >     Delay Period for Volume Reuse: 8 Day(s)
> >            Migration in Progress?: No
> >              Amount Migrated (MB): 0.00
> >  Elapsed Migration Time (seconds): 0
> >          Reclamation in Progress?: No
> >    Last Update by (administrator): xx
> >             Last Update Date/Time: 08/29/06   16:33:34
> >          Storage Pool Data Format: Native
> >              Copy Storage Pool(s):
> >           Continue Copy on Error?:
> >                          CRC Data: No
> >                  Reclamation Type: Threshold
> >
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Rainer Wolf                          eMail:       rainer.wolf AT uni-ulm DOT de
> > kiz - Abt. Infrastruktur           Tel/Fax:      ++49 731 50-22482/22471
> > Universitaet Ulm                     wwweb:        http://kiz.uni-ulm.de
> >
> >
> 
> --
> ------------------------------------------------------------------------
> Rainer Wolf                          eMail:       rainer.wolf AT uni-ulm DOT de
> kiz - Abt. Infrastruktur           Tel/Fax:      ++49 731 50-22482/22471
> Universitaet Ulm                     wwweb:        http://kiz.uni-ulm.de
>