Re: [ADSM-L] TSM Client upgrade on AIX

2016-04-27 15:58:41
Subject: Re: [ADSM-L] TSM Client upgrade on AIX
From: "Pagnotta, Pamela (CONTR)" <Pamela.Pagnotta AT HQ.DOE DOT GOV>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 27 Apr 2016 19:56:42 +0000
Andy,

Here are the top few entries from the select statement. From what I can tell,
none of the filesystems have more than 200K objects on this client.

FILESPACE_NAME: /gridfs
       HL_NAME: /oraem/Oracle/admin/emrep/adump/
 TOTAL_OBJECTS: 15551

FILESPACE_NAME: /gridfs
       HL_NAME: /oraem/Oracle/middleware/oms/sysman/archives/emgc/deployments/GCDomain/emgc.ear/em.war/cabo/jsLibs/resources/
 TOTAL_OBJECTS: 2473

FILESPACE_NAME: /gridfs
       HL_NAME: /oraem/Oracle/middleware/logs/
 TOTAL_OBJECTS: 2344
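
For anyone repeating this check, output in this vertical "FIELD: value" layout
can be summarized with a short script rather than read by eye. This is only a
sketch: the names parse_select_output, FIELDS and THRESHOLD are made up for
illustration, the threshold of one million comes from the question quoted
further down in this thread, and the parser assumes blank-line-separated blocks
like the ones above. It also tolerates a long value that wraps onto the
following line.

```python
# Hypothetical helper: parse the vertical "FIELD: value" output of the SELECT.
FIELDS = {"FILESPACE_NAME", "HL_NAME", "TOTAL_OBJECTS"}
THRESHOLD = 1_000_000  # "more than a million" files per directory

SAMPLE = """\
FILESPACE_NAME: /gridfs
       HL_NAME: /oraem/Oracle/admin/emrep/adump/
 TOTAL_OBJECTS: 15551

FILESPACE_NAME: /gridfs
       HL_NAME:
/oraem/Oracle/middleware/oms/sysman/archives/emgc/deployments/GCDomain/emgc.ear/em.war/cabo/jsLibs/resources/
 TOTAL_OBJECTS: 2473

FILESPACE_NAME: /gridfs
       HL_NAME: /oraem/Oracle/middleware/logs/
 TOTAL_OBJECTS: 2344
"""


def parse_select_output(text):
    """Return one dict per blank-line-separated block of output."""
    rows, current, last_key = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current block.
            if current:
                rows.append(current)
            current, last_key = {}, None
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            last_key = key.strip()
            current[last_key] = value.strip()
        elif last_key is not None:
            # Not a known field: treat it as the wrapped tail of the last value.
            current[last_key] += line
    if current:
        rows.append(current)
    for row in rows:
        row["TOTAL_OBJECTS"] = int(row["TOTAL_OBJECTS"])
    return rows


rows = parse_select_output(SAMPLE)
largest = max(rows, key=lambda r: r["TOTAL_OBJECTS"])
over = [r for r in rows if r["TOTAL_OBJECTS"] > THRESHOLD]
print("largest directory:", largest["HL_NAME"], largest["TOTAL_OBJECTS"])
print("directories over threshold:", len(over))
```

Only the first few rows matter because the SELECT already sorts by object count
in descending order.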

Since moving back to version 6.4.2.0, there is a new message:

04/27/16   02:33:22 ANS0361I DIAG: Thread creation failed; rc=11.
04/27/16   02:33:24 ANS1999E Incremental processing of '/usr' stopped.

The only Technote I can find that is close to this message and return code
relates to the TSM server on a Linux system. Is there anywhere these
diagnostic return codes are documented for TSM administrators?
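
One guess, and it is only a guess: if the rc in the ANS0361I message is an
operating system errno value passed through from the failed thread creation
call, it can be looked up on the client host itself. The sketch below assumes
that interpretation, which nothing in this thread confirms; describe_rc is a
made-up helper name. Errno numbering differs between platforms, so the lookup
is only meaningful when run on the affected system.

```python
import errno
import os


def describe_rc(rc):
    """Map a numeric code to its errno name and message on this platform.

    ASSUMPTION: the code is an OS errno value; this is not confirmed for
    the TSM client's DIAG messages.
    """
    name = errno.errorcode.get(rc, "UNKNOWN")
    return name, os.strerror(rc)


name, message = describe_rc(11)
print(f"rc=11 -> {name}: {message}")
```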

I have moved this backup to a quieter time of the night to see if that helps at 
all.

I will open a ticket for this new error tomorrow. 

Thank you,

Pam Pagnotta
Sr. System Engineer
Criterion Systems, Inc./ActioNet
Contractor to U.S. Department of Energy
Office of the CIO/IM-622
Office: 301-903-5508
Mobile: 301-335-8177


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of 
Andrew Raibeck
Sent: Wednesday, April 27, 2016 2:19 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] TSM Client upgrade on AIX

Hi Pam,

Do any of the file systems happen to have directories that contain large
numbers of files (say, more than a million)? You could try running this
SELECT statement from an administrative command line client to assess this.
Make sure to replace the node name RAIBECK with your node name (in all upper case):

select filespace_name, hl_name, count(*) as total_objects from backups
where node_name='RAIBECK' and state='ACTIVE_VERSION' group by
filespace_name, hl_name order by 3 desc

You can cancel the output after the first few lines. What you are looking
for is the top of the list, which would show you which file system and
directory has the largest number of files. How many objects are there?
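
A small helper can do the node-name substitution so that the upper-casing is
not forgotten. This is a sketch only: build_object_count_query is a made-up
name, it simply reproduces the SELECT above with the node name filled in, and
the resulting string still has to be run from the administrative command line
client.

```python
def build_object_count_query(node_name):
    """Return the per-directory object count SELECT for one node.

    The node name is upper-cased because the statement compares it as a
    literal string.
    """
    node = node_name.strip().upper()
    if not node or "'" in node:
        # Refuse empty names and embedded quotes rather than emit broken SQL.
        raise ValueError(f"unusable node name: {node_name!r}")
    return (
        "select filespace_name, hl_name, count(*) as total_objects "
        "from backups "
        f"where node_name='{node}' and state='ACTIVE_VERSION' "
        "group by filespace_name, hl_name order by 3 desc"
    )


# 'mynode' is a placeholder, not a node from this thread.
print(build_object_count_query("mynode"))
```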

If there are millions of files involved, it could be that memory is being
exhausted (how much memory is available on this system?); though I would
normally expect a proper "out of memory" message, rather than the more
cryptic message you are seeing. In the past, I have heard customers say
that this occurs after upgrading the client, but what really happened was
that the number of files in the directory was growing continuously, and
eventually the backup could not allocate enough memory; and the upgrade
just happened to roughly coincide with the onset of the issue. I cannot say
whether this is possible in your situation, but I am just sharing some of
my past experiences with this issue.

From which client version and bit-architecture did you upgrade to 7.1? I
see you put 6.4 on as a "workaround", but what was the original version?
Earlier client versions did see an increase in memory usage when the
clients were changed from 32-bit to 64-bit, as 64-bit software tends to use
more memory (pointer variables are 8 bytes rather than 4 bytes, and that is
one chief contributor). But no such change occurred from 6.4 to 7.1, so why
memory would be exhausted in 7.1 but not 6.4, I have no immediate idea.

If the affected machine does not really have any directories with huge
numbers of files, then this could be something else... I would invite you
to reopen your PMR, let me know, and I will have it escalated to our Level
2 support for further investigation. As I mentioned earlier, the cryptic
calloc() error does not seem right.

Best regards,

Andy

____________________________________________________________________________

Andrew Raibeck | IBM Spectrum Protect Level 3 | storman AT us.ibm DOT com

IBM Tivoli Storage Manager links:
Product support:
https://www.ibm.com/support/entry/portal/product/tivoli/tivoli_storage_manager

Online documentation:
http://www.ibm.com/support/knowledgecenter/SSGSG7/landing/welcome_ssgsg7.html

Product Wiki:
https://www.ibm.com/developerworks/community/wikis/home/wiki/Tivoli%20Storage%20Manager

"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 2016-04-27
13:15:50:

> From: "Pagnotta, Pamela (CONTR)" <Pamela.Pagnotta AT HQ.DOE DOT GOV>
> To: ADSM-L AT VM.MARIST DOT EDU
> Date: 2016-04-27 13:17
> Subject: Re: Re: TSM Client upgrade on AIX
> Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
>
> Hi Dave,
>
> Thank you for the information. I, of course, could not get the
> person assigned to my ticket to even acknowledge that this might be
> due to some different memory requirements for the newer TSM clients.
> The only response I received was that we just must not have enough
> memory on our system to do the backup despite being told that there
> was no issue with an older client.
>
> Pam
>
> Pam Pagnotta
> Sr. System Engineer
> Criterion Systems, Inc./ActioNet
> Contractor to U.S. Department of Energy
> Office of the CIO/IM-622
> Office: 301-903-5508
> Mobile: 301-335-8177
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> Behalf Of David Bronder
> Sent: Wednesday, April 27, 2016 12:56 PM
> To: ADSM-L AT VM.MARIST DOT EDU
> Subject: Re: [ADSM-L] Re: TSM Client upgrade on AIX
>
> This isn't really helpful for your specific situation, Pam (I don't think
> I've had the specific errors you've seen).  But I have noticed (with much
> dismay) that the 7.x clients for AIX have required significantly more memory
> than earlier versions.  I have clients with 1+ million files in a filesystem
> that had no problems with 6.x and earlier clients, but consistently required
> huge data ulimits after upgrading to 7.x (and would fail, often completely
> silently, if the ulimit wasn't high enough).
>
> I don't know what IBM did with the 7.x clients to make them so memory-greedy
> compared to earlier versions.  Maybe the client-side dedupe support or
> something, though I'm not using those newer features currently, so I would
> hope that wouldn't be a factor.  Then again, I would hope IBM would realize
> that setting the data ulimit to unlimited isn't really a best practice and
> that having successful backups shouldn't require risking breaking services on
> the systems those backups are protecting.  (</soapbox>)
>
> So far, I've gotten by with a non-unlimited ulimit, but it seems like I do
> have to keep raising it with each new 7.x client release...
>
> =Dave
>
>
> On 04/27/2016 09:09 AM, Pagnotta, Pamela (CONTR) wrote:
> > Hi Matthew,
> >
> > Yes, the root user ulimits are set to unlimited on all the AIX servers.
> >
> > Regards,
> >
> > Pam Pagnotta
> > Sr. System Engineer
> > Criterion Systems, Inc./ActioNet
> > Contractor to U.S. Department of Energy
> > Office of the CIO/IM-622
> > Office: 301-903-5508
> > Mobile: 301-335-8177
> >
> > From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On
> > Behalf Of Matthew McGeary
> > Sent: Wednesday, April 27, 2016 9:56 AM
> > To: ADSM-L AT VM.MARIST DOT EDU
> > Subject: Re: [ADSM-L] TSM Client upgrade on AIX
> >
> > Good morning Pam,
> >
> > We encountered errors backing up filesystems with large numbers of
> > files until we set the root user ulimits to unlimited.  That fixed
> > the problem but can have other consequences, obviously.  Do you know
> > if your AIX admin tried changing the ulimits?
> >
> > Regards,
> > __________________________
> >
> > Matthew McGeary
> > Senior Technical Specialist - Infrastructure
> > PotashCorp
> > T: (306) 933-8921
> > www.potashcorp.com
> >
> > From:        "Pagnotta, Pamela (CONTR)" <Pamela.Pagnotta AT HQ.DOE DOT GOV>
> > To:        ADSM-L AT VM.MARIST DOT EDU
> > Date:        04/27/2016 07:47 AM
> > Subject:        [ADSM-L] TSM Client upgrade on AIX
> > Sent by:        "ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU>
> >
> > ________________________________
> >
> >
> >
> > Hello,
> >
> > Recently one of our AIX administrators upgraded the TSM client to
> > 7.1.4.4 on her servers. Many of them started receiving errors like
> >
> > calloc() failed: Size 31496 File ../mem/mempool.cpp Line 1092
> >
> > I looked this up and the indication is that the AIX server could
> > not supply enough memory to TSM to complete the backup. We opened a
> > ticket and were told to try memoryefficientbackup with
> > diskcachemethod. This did not fix the issue.
> >
> > In frustration the administrator reinstalled a TSM client version
> > of 6.4.2.0 and is no longer experiencing the memory problems.
> >
> > Any thoughts?
> >
> > Thank you,
> >
> > Pam
> >
> > Pam Pagnotta
> > Sr. System Engineer
> > Criterion Systems, Inc./ActioNet
> > Contractor to U.S. Department of Energy
> > Office of the CIO/IM-622
> > Office: 301-903-5508
> > Mobile: 301-335-8177
> >
>
> --
> Hello World.                                David Bronder - Systems Architect
> Segmentation Fault                                      ITS-EI, Univ. of Iowa
> Core dumped, disk trashed, quota filled, soda warm.   david-bronder AT uiowa DOT edu
>
