ADSM-L

Re: Server crash (log full?)

2006-04-27 12:46:40
Subject: Re: Server crash (log full?)
From: Roger Deschner <rogerd AT UIC DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 27 Apr 2006 11:41:34 -0500
The log can be pinned by a slow client who is backing up a large file.

1. Be sure to set a maximum file size in the primary disk storage pool,
and also in all successor tape storage pools. The capacity of a
double-layer DVD is 8.5GB, so I have set my limits here to 10GB, which
still allows people doing video to edit and create the largest video DVD
available now. I suppose when Blu-ray comes out I'll have to increase
this accordingly.

2. You may want to look at setting throughput limits. Time marches on;
slow clients cannot be tolerated. Or if you have to back up some slow
clients, put them in a separate management class and separate storage
pool hierarchy with much smaller maximum file size limits so they can't
pin the log.

Those two measures will usually control it. I also did something more:

3. I have a daemon that checks periodically (every 15 minutes) to see if
the log is getting full. If it's more than 69% full, it does SHOW LOGPIN
to see who has the log pinned, and then does a CANCEL SESSION on it. I
find that if I cancel whoever has the log pinned by the time it is 69%
full, then it never fills up. It can take as long as an hour for CANCEL
SESSION to work on a client session that's got the log pinned, but it
will eventually work. You may need to adjust the 69% number up or down
depending on your local experience.
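
The check in point 3 could be sketched roughly as follows. This is only
an illustration of the decision logic, not the actual daemon: the three
callbacks are hypothetical placeholders that a real version would
implement by running the log query, SHOW LOGPIN and CANCEL SESSION
through the administrative client and parsing the output. The SHOW
LOGPIN output format is not shown in this thread, so no parser is
attempted here.

```python
from typing import Callable, Optional

THRESHOLD_PCT = 69.0  # figure from the post; tune to local experience


def check_log(get_pct_util: Callable[[], float],
              find_pinning_session: Callable[[], Optional[int]],
              cancel_session: Callable[[int], None],
              threshold: float = THRESHOLD_PCT) -> Optional[int]:
    """Run one polling pass; return the session cancelled, or None."""
    if get_pct_util() <= threshold:
        return None              # log is not more than `threshold` percent full
    session = find_pinning_session()
    if session is not None:
        cancel_session(session)  # the cancel may take a long time to take effect
    return session


if __name__ == "__main__":
    # Stand-in callbacks (assumptions) so the sketch runs without a server.
    cancelled = []
    check_log(lambda: 83.2, lambda: 3476, cancelled.append)
    print(cancelled)  # prints [3476]
```

A scheduler such as cron could call one pass every 15 minutes. Keeping
the server access behind callbacks means the threshold logic can be
tested without touching a live server.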

The usual culprit is somebody editing huge video files on a Mac (Macs
are slow!).
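
To see why both the file size limit (point 1) and client speed (point 2)
matter, here is a rough back-of-the-envelope sketch. It assumes, as a
simplification, that the log stays pinned for about as long as the
single large file takes to transfer. The throughput figures are made-up
illustrations, not measurements.

```python
def transfer_minutes(file_gb: float, client_mb_per_s: float) -> float:
    """Minutes needed to send file_gb gigabytes at client_mb_per_s MB/s."""
    return file_gb * 1024 / client_mb_per_s / 60


if __name__ == "__main__":
    # A 10GB file (the limit mentioned above) at a few illustrative rates.
    for rate in (1, 10, 50):
        print(f"{rate:>3} MB/s: {transfer_minutes(10, rate):6.1f} min")
```

Under this model, halving the maximum file size or doubling the client
throughput each halves the time one transfer can hold the log.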

Roger Deschner      University of Illinois at Chicago     rogerd AT uic DOT edu


On Wed, 26 Apr 2006, Vats.Ashok wrote:

>We have the following recovery log, and we see log-pin issues from various
>Windows clients. Is this a defect in the product's design? What measures do
>you take to avoid a 100% full log? At times we can't even cancel the sessions
>and have to halt the server if the log is below 100% full; otherwise we extend
>the log and then recover the server. Any ideas?
>q log
>
>Available   Assigned     Maximum     Maximum      Page       Total        Used     Pct    Max.
>    Space   Capacity   Extension   Reduction      Size      Usable       Pages    Util     Pct
>     (MB)       (MB)        (MB)        (MB)   (bytes)       Pages                        Util
>---------   --------   ---------   ---------   -------   ---------   ---------   -----   -----
>   10,240     10,140         100       9,476     4,096   2,595,328     169,205     6.5    83.2
>
>
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU]On Behalf Of
>Loon, E.J. van - SPLXM
>Sent: Tuesday, April 25, 2006 7:34 AM
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: Re: [ADSM-L] Server crash (log full?)
>
>
>Hi Ray!
>I think we all have seen this before.
>What's the size of your recovery log?
>Kindest regards,
>Eric van Loon
>KLM Royal Dutch Airlines
>
>
>-----Original Message-----
>From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
>-ray
>Sent: Tuesday, April 25, 2006 16:19
>To: ADSM-L AT VM.MARIST DOT EDU
>Subject: Server crash (log full?)
>
>All,
>
>I've seen a condition a few times where the TSM log will fill and the
>server will crash.  See the console log below.  Has anyone seen this
>before?  Any ideas?  TSM server 5.3.2.0 on AIX 5.3.
>
>Thanks,
>ray
>
>
>
>ANR2997W The server log is 81 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR0482W Session 3476 for node HD3 (Linux86) terminated - idle for more
>than 15 minutes.
>ANR2996I The server log is 74 percent full. The server is no longer
>delaying transactions.
>ANR0406I Session 3493 started for node HD3 (Linux86) (Tcp/Ip
>hd3.csd.selu.edu( 47219)).
>ANR0403I Session 3489 ended for node NORM (Linux86).
>ANR2507I Schedule LINUX-WEEKDAY for domain LINUX started at 04/24/06
>22:00:00 for node NORM completed successfully at 04/24/06 23:04:51.
>ANR0403I Session 3488 ended for node NORM (Linux86).
>ANR0406I Session 3494 started for node NORM (Linux86) (Tcp/Ip
>norm.selu.edu(53827)).
>ANR0403I Session 3494 ended for node NORM (Linux86).
>ANR2997W The server log is 80 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR0482W Session 3490 for node GANDALF (AIX) terminated - idle for more
>than 15 minutes.
>ANR2997W The server log is 81 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 83 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 85 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 86 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR0406I Session 3495 started for node GIMLI (Linux86) (Tcp/Ip
>gimli.csd.selu.edu(56781)).
>ANR2997W The server log is 88 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 88 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 90 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR0403I Session 3439 ended for node GIMLI (Linux86).
>ANR2507I Schedule LINUX-WEEKDAY for domain LINUX started at 04/24/06
>22:00:00 for node GIMLI completed successfully at 04/24/06 23:15:20.
>ANR0406I Session 3496 started for node GIMLI (Linux86) (Tcp/Ip
>gimli.csd.selu.edu(56785)).
>ANR0403I Session 3495 ended for node GIMLI (Linux86).
>ANR0403I Session 3496 ended for node GIMLI (Linux86).
>ANR2997W The server log is 90 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 91 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 91 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 92 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 92 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR0314W Recovery log usage exceeds 92 % of its assigned capacity.
>ANR2997W The server log is 93 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 91 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 89 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 89 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR0482W Session 3493 for node HD3 (Linux86) terminated - idle for more
>than 15 minutes.
>ANR2997W The server log is 90 percent full. The server will delay
>transactions by 30 milliseconds.
>ANR2997W The server log is 89 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 87 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR0482W Session 3486 for node VULCAN (Linux86) terminated - idle for more
>than 15 minutes.
>ANR2997W The server log is 89 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 87 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 86 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 87 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 88 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 87 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 88 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 86 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR2997W The server log is 85 percent full. The server will delay
>transactions by 3 milliseconds.
>ANR9999D logseg.c(579): ThreadId<93> Attempt made to locate segment for
>LSN 2684100.0.0 below truncation base LSN 2684102.0.0.
>ANR9999D ThreadId<93> issued message 9999 from:  <-0x000000010001c168 outDiagf
><-0x000000010005bfd8 LogFindSegment <-0x000000010005d340 ReadPage
><-0x000000010005f51c logRead <-0x00000001006ce318 logShowLsn
><-0x00000001006ca424 dbShowLogPinned <-0x000000010005c468 logCheckDelay
><-0x00000001002a1e28 dbLogReserve <-0x00000001006b4304 TbMergeTree
><-0x00000001000b3e64 TbDelete <-0x00000001000b7eec tbTableOp
><-0x000000010038ec2c bfDestroy <-0x000000010016d224 ImDeleteBitfile
><-0x0000000100175870 imDeleteObject <-0x000000010067a12c DeleteFilesThread
><-0x000000010000e9dc StartThread <-0x09000000003192dc _pthread_body
>ANR7838S Server operation terminated.
>ANR7837S Internal error LOGSEG415 detected.
>   0x00000001000104cc pkAbort
>   0x000000010005bfe4 LogFindSegment
>   0x000000010005d340 ReadPage
>   0x000000010005f51c logRead
>   0x00000001006ce318 logShowLsn
>   0x00000001006ca424 dbShowLogPinned
>   0x000000010005c468 logCheckDelay
>   0x00000001002a1e28 dbLogReserve
>   0x00000001006b4304 TbMergeTree
>   0x00000001000b3e64 TbDelete
>   0x00000001000b7eec tbTableOp
>   0x000000010038ec2c bfDestroy
>   0x000000010016d224 ImDeleteBitfile
>   0x0000000100175870 imDeleteObject
>   0x000000010067a12c DeleteFilesThread
>   0x000000010000e9dc StartThread
>   0x09000000003192dc _pthread_body
>ANR7833S Server thread 1 terminated in response to program abort.
>ANR7833S Server thread 2 terminated in response to program abort.
>ANR7833S Server thread 3 terminated in response to program abort.
>ANR7833S Server thread 4 terminated in response to program abort.
>ANR7833S Server thread 5 terminated in response to program abort.
>ANR7833S Server thread 6 terminated in response to program abort.
>ANR7833S Server thread 7 terminated in response to program abort.
>ANR7833S Server thread 8 terminated in response to program abort.
>ANR7833S Server thread 9 terminated in response to program abort.
>ANR7833S Server thread 10 terminated in response to program abort.
>ANR7833S Server thread 11 terminated in response to program abort.
>ANR7833S Server thread 12 terminated in response to program abort.
>ANR7833S Server thread 13 terminated in response to program abort.
>ANR7833S Server thread 14 terminated in response to program abort.
>ANR7833S Server thread 15 terminated in response to program abort.
>ANR7833S Server thread 16 terminated in response to program abort.
>ANR7833S Server thread 17 terminated in response to program abort.
>ANR7833S Server thread 18 terminated in response to program abort.
>ANR7833S Server thread 19 terminated in response to program abort.
>ANR7833S Server thread 20 terminated in response to program abort.
>ANR7833S Server thread 21 terminated in response to program abort.
>ANR7833S Server thread 22 terminated in response to program abort.
>ANR7833S Server thread 23 terminated in response to program abort.
>ANR7833S Server thread 24 terminated in response to program abort.
>ANR7833S Server thread 25 terminated in response to program abort.
>ANR7833S Server thread 26 terminated in response to program abort.
>ANR7833S Server thread 27 terminated in response to program abort.
>ANR7833S Server thread 28 terminated in response to program abort.
>ANR7833S Server thread 29 terminated in response to program abort.
>ANR7833S Server thread 30 terminated in response to program abort.
>ANR7833S Server thread 31 terminated in response to program abort.
>ANR7833S Server thread 32 terminated in response to program abort.
>ANR7833S Server thread 33 terminated in response to program abort.
>ANR7833S Server thread 34 terminated in response to program abort.
>ANR7833S Server thread 35 terminated in response to program abort.
>ANR7833S Server thread 36 terminated in response to program abort.
>ANR7833S Server thread 37 terminated in response to program abort.
>ANR7833S Server thread 38 terminated in response to program abort.
>ANR7833S Server thread 39 terminated in response to program abort.
>ANR7833S Server thread 40 terminated in response to program abort.
>ANR7833S Server thread 41 terminated in response to program abort.
>ANR7833S Server thread 42 terminated in response to program abort.
>ANR7833S Server thread 43 terminated in response to program abort.
>ANR7833S Server thread 44 terminated in response to program abort.
>ANR7833S Server thread 45 terminated in response to program abort.
>ANR7833S Server thread 46 terminated in response to program abort.
>ANR7833S Server thread 47 terminated in response to program abort.
>ANR7833S Server thread 48 terminated in response to program abort.
>ANR7833S Server thread 49 terminated in response to program abort.
>ANR7833S Server thread 50 terminated in response to program abort.
>ANR7833S Server thread 51 terminated in response to program abort.
>ANR7833S Server thread 52 terminated in response to program abort.
>ANR7833S Server thread 53 terminated in response to program abort.
>ANR7833S Server thread 54 terminated in response to program abort.
>ANR7833S Server thread 55 terminated in response to program abort.
>ANR7833S Server thread 56 terminated in response to program abort.
>ANR7833S Server thread 57 terminated in response to program abort.
>ANR7833S Server thread 58 terminated in response to program abort.
>ANR7833S Server thread 59 terminated in response to program abort.
>ANR7833S Server thread 60 terminated in response to program abort.
>ANR7833S Server thread 61 terminated in response to program abort.
>ANR7833S Server thread 62 terminated in response to program abort.
>ANR7833S Server thread 63 terminated in response to program abort.
>ANR7833S Server thread 64 terminated in response to program abort.
>ANR7833S Server thread 65 terminated in response to program abort.
>ANR7833S Server thread 66 terminated in response to program abort.
>ANR7833S Server thread 67 terminated in response to program abort.
>ANR7833S Server thread 68 terminated in response to program abort.
>ANR7833S Server thread 69 terminated in response to program abort.
>ANR7833S Server thread 70 terminated in response to program abort.
>ANR7833S Server thread 71 terminated in response to program abort.
>ANR7833S Server thread 72 terminated in response to program abort.
>ANR7833S Server thread 73 terminated in response to program abort.
>ANR7833S Server thread 74 terminated in response to program abort.
>ANR7833S Server thread 75 terminated in response to program abort.
>ANR7833S Server thread 76 terminated in response to program abort.
>ANR7833S Server thread 77 terminated in response to program abort.
>ANR7833S Server thread 79 terminated in response to program abort.
>ANR7833S Server thread 82 terminated in response to program abort.
>ANR7833S Server thread 83 terminated in response to program abort.
>ANR7833S Server thread 84 terminated in response to program abort.
>ANR7833S Server thread 85 terminated in response to program abort.
>ANR7833S Server thread 86 terminated in response to program abort.
>ANR7833S Server thread 90 terminated in response to program abort.
>ANR7833S Server thread 91 terminated in response to program abort.
>ANR7833S Server thread 92 terminated in response to program abort.
>ANR7833S Server thread 93 terminated in response to program abort.
>ANR7833S Server thread 95 terminated in response to program abort.
>ANR7833S Server thread 97 terminated in response to program abort.
>ANR7833S Server thread 99 terminated in response to program abort.
>ANR7833S Server thread 101 terminated in response to program abort.
>ANR7833S Server thread 104 terminated in response to program abort.
>ANR7833S Server thread 105 terminated in response to program abort.
>ANR7833S Server thread 106 terminated in response to program abort.
>ANR7833S Server thread 111 terminated in response to program abort.
>ANR7833S Server thread 116 terminated in response to program abort.
>/usr/tivoli/tsm/server/bin/rc.adsmserv[35]: 541352 IOT/Abort
>trap(coredump)
>
>--
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>Ray DeJean                                      http://www.r-a-y.org
>Systems Engineer                    Southeastern Louisiana University
>IBM Certified Specialist             AIX Administration, AIX Support
>=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
