ADSM-L

Re: [ADSM-L] TSM Client upgrade on AIX

2016-04-27 19:16:52
Subject: Re: [ADSM-L] TSM Client upgrade on AIX
From: Andrew Raibeck <storman AT US.IBM DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 27 Apr 2016 19:01:35 -0400
Hi Pam,

During incremental backup, the peak memory utilization usually occurs when
the client is processing the directory with the largest number of files. In
this case, unless your machine has only 8 MB of RAM ;-) I do not see how
~15K objects could cause memory to be exhausted. It doesn't pass the "sniff
test".
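
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. The 300-bytes-per-object figure is an assumption (a commonly quoted rule of thumb, not something measured on this system), and the function name is made up for illustration.

```python
# Rough estimate of the memory needed to hold one directory's object list.
# ASSUMPTION: ~300 bytes per object is a rule of thumb, not a measured value.
BYTES_PER_OBJECT = 300  # hypothetical figure, adjust for your client level


def estimate_scan_memory_mib(object_count, bytes_per_object=BYTES_PER_OBJECT):
    """Return the approximate memory, in MiB, for object_count objects."""
    return object_count * bytes_per_object / (1024 * 1024)


if __name__ == "__main__":
    samples = [
        ("largest directory reported (15551 objects)", 15551),
        ("a directory with one million files", 1_000_000),
    ]
    for label, count in samples:
        print(f"{label}: about {estimate_scan_memory_mib(count):.1f} MiB")
```

Under that assumption the largest directory in the SELECT output needs only a few MiB, while a million-file directory would need a few hundred MiB.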

Regarding the 6.4 error where you see the return code 11: This most likely
corresponds to errno EAGAIN, which means there were insufficient system
resources to create a new thread. This is not an insufficient memory issue,
but some other system resource.
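
As a quick way to check this on the affected host, the sketch below looks up what a numeric return code means as an errno on the local platform and prints the current process limits. It assumes a Python interpreter is available on that machine and that rc=11 really is an errno value. Which limits (if any) are relevant to the thread creation failure is also an assumption. Errno numbering is platform-specific, so run it on the AIX system itself.

```python
import errno
import os
import resource  # Unix-only module


def describe_rc(rc):
    """Return (errno name, OS message) for a numeric code on this platform."""
    name = errno.errorcode.get(rc, "UNKNOWN")
    return name, os.strerror(rc)


def current_limits():
    """Return {limit name: (soft, hard)} for limits this platform defines."""
    wanted = ["RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_DATA", "RLIMIT_AS"]
    limits = {}
    for key in wanted:
        res = getattr(resource, key, None)  # not every platform has all of these
        if res is not None:
            limits[key] = resource.getrlimit(res)
    return limits


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    name, message = describe_rc(11)
    print(f"rc=11 -> {name}: {message}")
    for key, (soft, hard) in current_limits().items():
        print(f"{key}: soft={fmt(soft)} hard={fmt(hard)}")
```

Run it as the same user that runs the backup (root, in this thread), since limits are per user and per process.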

A shot in the dark, but... by any chance is the AIX system configured to
use 64 KB page sizes? I ask because of this AIX APAR which *might* be a
match:

http://www.ibm.com/support/docview.wss?uid=isg1IZ27457

(See the Comments section of the APAR to match the actual 6.1 maintenance
level.)

Best regards,

Andy

____________________________________________________________________________

Andrew Raibeck | IBM Spectrum Protect Level 3 | storman AT us.ibm DOT com

IBM Tivoli Storage Manager links:
Product support:
https://www.ibm.com/support/entry/portal/product/tivoli/tivoli_storage_manager

Online documentation:
http://www.ibm.com/support/knowledgecenter/SSGSG7/landing/welcome_ssgsg7.html

Product Wiki:
https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2016-04-27
15:56:42:

> From: "Pagnotta, Pamela (CONTR)" <Pamela.Pagnotta AT HQ.DOE DOT GOV>
> To: ADSM-L AT VM.MARIST DOT EDU
> Date: 2016-04-27 15:59
> Subject: Re: TSM Client upgrade on AIX
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>
> Andy,
>
> Here are the top few entries from the select statement. From what I
> can tell, none of the filesystems have more than 200K objects on this
> client.
>
> FILESPACE_NAME: /gridfs
>        HL_NAME: /oraem/Oracle/admin/emrep/adump/
>  TOTAL_OBJECTS: 15551
>
> FILESPACE_NAME: /gridfs
>        HL_NAME: /oraem/Oracle/middleware/oms/sysman/archives/emgc/deployments/GCDomain/emgc.ear/em.war/cabo/jsLibs/resources/
>  TOTAL_OBJECTS: 2473
>
> FILESPACE_NAME: /gridfs
>        HL_NAME: /oraem/Oracle/middleware/logs/
>  TOTAL_OBJECTS: 2344
>
> Since moving back to version 6.4.2.0 there is a new message
>
> 04/27/16   02:33:22 ANS0361I DIAG: Thread creation failed; rc=11.
> 04/27/16   02:33:24 ANS1999E Incremental processing of '/usr' stopped.
>
> The only Technote I can find that is close to this message and rc is
> TSM server related on a Linux system.  Is there anywhere where these
> diagnostic return codes are defined for TSM administrators?
>
> I have moved this backup to a quieter time of the night to see if
> that helps at all.
>
> I will open a ticket for this new error tomorrow.
>
> Thank you,
>
> Pam Pagnotta
> Sr. System Engineer
> Criterion Systems, Inc./ActioNet
> Contractor to US. Department of Energy
> Office of the CIO/IM-622
> Office: 301-903-5508
> Mobile: 301-335-8177
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> Behalf Of Andrew Raibeck
> Sent: Wednesday, April 27, 2016 2:19 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] TSM Client upgrade on AIX
>
> Hi Pam,
>
> Do any of the file systems happen to have directories that contain large
> numbers of files (say, more than a million)? You could try running this
> SELECT statement from an administrative command line client to assess this.
> Make sure to replace the node name RAIBECK with your node name (in all
> upper case):
>
> select filespace_name, hl_name, count(*) as total_objects from backups
> where node_name='RAIBECK' and state='ACTIVE_VERSION' group by
> filespace_name, hl_name order by 3 desc
>
> You can cancel the output after the first few lines. What you are looking
> for is the top of the list, which would show you which file system and
> directory has the largest number of files. How many objects are there?
>
> If there are millions of files involved, it could be that memory is being
> exhausted (how much memory is available on this system?); though I would
> normally expect a proper "out of memory" message, rather than the more
> cryptic message you are seeing. In the past, I have heard customers say
> that this occurs after upgrading the client, but what really happened was
> that the number of files in the directory was growing continuously, and
> eventually the backup could not allocate enough memory; and the upgrade
> just happened to roughly coincide with the onset of the issue. I cannot say
> whether this is possible in your situation, but I am just sharing some of
> my past experiences with this issue.
>
> From which client version and bit-architecture did you upgrade to 7.1? I
> see you put 6.4 on as a "workaround", but what was the original version?
> Earlier client versions did see an increase in memory usage when the
> clients were changed from 32-bit to 64-bit, as 64-bit software tends to use
> more memory (pointer variables are 8 bytes rather than 4 bytes, and that is
> one chief contributor). But no such change occurred from 6.4 to 7.1, so why
> memory would be exhausted in 7.1 but not 6.4, I have no immediate idea.
>
> If the affected machine does not really have any directories with huge
> numbers of files, then this could be something else... I would invite you
> to reopen your PMR, let me know, and I will have it escalated to our Level
> 2 support for further investigation. As I mentioned earlier, the cryptic
> calloc() error does not seem right.
>
> Best regards,
>
> Andy
>
> "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2016-04-27
> 13:15:50:
>
> > From: "Pagnotta, Pamela (CONTR)" <Pamela.Pagnotta AT HQ.DOE DOT GOV>
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Date: 2016-04-27 13:17
> > Subject: Re: Re: TSM Client upgrade on AIX
> > Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> >
> > Hi Dave,
> >
> > Thank you for the information. I, of course, could not get the
> > person assigned to my ticket to even acknowledge that this might be
> > due to some different memory requirements for the newer TSM clients.
> > The only response I received was that we just must not have enough
> > memory on our system to do the backup despite being told that there
> > was no issue with an older client.
> >
> > Pam
> >
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > Behalf Of David Bronder
> > Sent: Wednesday, April 27, 2016 12:56 PM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: Re: [ADSM-L] Re: TSM Client upgrade on AIX
> >
> > This isn't really helpful for your specific situation, Pam (I don't think
> > I've had the specific errors you've seen).  But I have noticed (with much
> > dismay) that the 7.x clients for AIX have required significantly more memory
> > than earlier versions.  I have clients with 1+ million files in a filesystem
> > that had no problems with 6.x and earlier clients, but consistently required
> > huge data ulimits after upgrading to 7.x (and would fail, often completely
> > silently, if the ulimit wasn't high enough).
> >
> > I don't know what IBM did with the 7.x clients to make them so memory-greedy
> > compared to earlier versions.  Maybe the client-side dedupe support or
> > something, though I'm not using those newer features currently, so I would
> > hope that wouldn't be a factor.  Then again, I would hope IBM would realize
> > that setting the data ulimit to unlimited isn't really a best practice and
> > that having successful backups shouldn't require risking breaking services on
> > the systems those backups are protecting.  (</soapbox>)
> >
> > So far, I've gotten by with a non-unlimited ulimit, but it seems like I do
> > have to keep raising it with each new 7.x client release...
> >
> > =Dave
> >
> >
> > On 04/27/2016 09:09 AM, Pagnotta, Pamela (CONTR) wrote:
> > > Hi Matthew,
> > >
> > > Yes, the root user ulimits are set to unlimited on all the AIX servers.
> > >
> > > Regards,
> > >
> > > Pam Pagnotta
> > >
> > > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > > Behalf Of Matthew McGeary
> > > Sent: Wednesday, April 27, 2016 9:56 AM
> > > To: ADSM-L AT VM.MARIST DOT EDU
> > > Subject: Re: [ADSM-L] TSM Client upgrade on AIX
> > >
> > > Good morning Pam,
> > >
> > > We encountered errors backing up filesystems with large numbers of
> > > files until we set the root user ulimits to unlimited.  That fixed
> > > the problem but can have other consequences, obviously.  Do you know
> > > if your AIX admin tried changing the ulimits?
> > >
> > > Regards,
> > > __________________________
> > >
> > > Matthew McGeary
> > > Senior Technical Specialist - Infrastructure
> > > PotashCorp
> > > T: (306) 933-8921
> > > www.potashcorp.com
> > >
> > > From:        "Pagnotta, Pamela (CONTR)" <Pamela.Pagnotta AT HQ.DOE DOT GOV>
> > > To:        ADSM-L AT VM.MARIST DOT EDU
> > > Date:        04/27/2016 07:47 AM
> > > Subject:        [ADSM-L] TSM Client upgrade on AIX
> > > Sent by:        "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> > >
> > > ________________________________
> > >
> > >
> > >
> > > Hello,
> > >
> > > Recently one of our AIX administrators upgraded the TSM client to
> > > 7.1.4.4 on her servers. Many of them started receiving errors like
> > >
> > > calloc() failed: Size 31496 File ../mem/mempool.cpp Line 1092
> > >
> > > I looked this up and the indication is that the AIX server could
> > > not supply enough memory to TSM to complete the backup. We opened a
> > > ticket and were told to try memoryefficientbackup with
> > > diskcachemethod. This did not fix the issue.
> > >
> > > In frustration the administrator reinstalled a TSM client version
> > > of 6.4.2.0 and is no longer experiencing the memory problems.
> > >
> > > Any thoughts?
> > >
> > > Thank you,
> > >
> > > Pam
> > >
> >
> > --
> > Hello World.                                David Bronder - Systems Architect
> > Segmentation Fault                                      ITS-EI, Univ. of Iowa
> > Core dumped, disk trashed, quota filled, soda warm.   david-bronder AT uiowa DOT edu
> >
>
