  1. #1
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Recovery log getting pinned nightly, TSM 5.5.6

    Pretty much at a loss here. I think I've tried just about everything but obviously haven't hit the nail on the head yet.

    TSM 5.5.6 on AIX; an upgrade to 6.x is planned for this year.
    DB size is ~100 GB, with the log running in NORMAL mode (ROLLFORWARD mode was causing DB backups to run constantly).

    - SHOW LOGPINNED does report that I have dirty buffer pool pages
    - Kicking off occasional incremental DB backups has no effect; the system is so locked up that they can't even start
    - Running the flush command does seem to do its job, but obviously not well enough to keep the system from crashing
    - Our consultant says I may not have enough network bandwidth on the server itself (two bonded 1 Gb links); another dual-port card is on its way to make it four
    - I've split up the backup jobs so that not every server is backed up every night; this has had no lasting effect

    Any ideas would be greatly appreciated.

  2. #2
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    How many nodes are you backing up on this TSM server?

    One main cause of the log filling up is one or more nodes with so many files to back up that the primary pool (I hope you have disk pools) is full (for example, because caching is enabled and migration takes time) or is simply not big enough to hold the backup.
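One way to test the "pool is full or too small" theory is to compare each disk pool's free space with an estimate of one night's incoming data. The sketch below is a hypothetical helper, not part of TSM: `pools_too_small` reads comma-delimited `pool,capacity_mb,pct_utilized` rows on stdin and prints the pools whose remaining space is below the estimate you pass in. The select column names (`stgpool_name`, `est_capacity_mb`, `pct_utilized` in the STGPOOLS table) are an assumption for TSM 5.5; confirm them with `select * from stgpools` before relying on them.

```sh
# pools_too_small EXPECTED_MB  (hypothetical helper)
# stdin: one "pool,est_capacity_mb,pct_utilized" row per storage pool.
# Prints every pool whose free space is below EXPECTED_MB, your own estimate
# of how much data one night's backups send to that pool.
pools_too_small() {
    awk -F, -v need="$1" '
        NF >= 3 {
            free = $2 * (100 - $3) / 100    # MB not yet used in the pool
            if (free < need)
                printf "%s free=%.0fMB need=%sMB\n", $1, free, need
        }'
}

# Example (column names assumed; -commadelimited is a dsmadmc output option):
#   dsmadmc -id=admin -pa=secret -dataonly=yes -commadelimited \
#     "select stgpool_name, est_capacity_mb, pct_utilized from stgpools" \
#     | pools_too_small 200000
```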

    If you have disk pools and you are sure they are big enough to hold the backup data, check for performance issues. What does AIX 'errpt' show? If your disk pool is on the SAN, check out the LUNs.
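To see at a glance which disks are logging path failures, the default `errpt` summary can be tallied per resource. A minimal sketch, assuming the usual summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION) so that the resource name is the fifth field; `path_failures` is a hypothetical name, not an AIX command.

```sh
# path_failures  (hypothetical helper)
# stdin: default `errpt` summary output. Counts "PATH HAS FAILED" entries per
# resource, assuming RESOURCE_NAME is the fifth whitespace-separated field.
path_failures() {
    awk '
        NR == 1 && $1 == "IDENTIFIER" { next }    # skip the column header
        /PATH HAS FAILED/ { count[$5]++ }
        END { for (r in count) print r, count[r] }
    ' | sort
}

# Example, as root:
#   errpt | path_failures
# then use "errpt -a" to read the detailed entries for the worst disks.
```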

    If you have a lot of nodes to backup you may want to split your TSM server.
    Last edited by moon-buddy; 06-30-2012 at 10:50 PM.
    Ed

  3. #3
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    500+ nodes

    The primary pool for the great majority of these systems is disk, and it does have room for a night's worth of incrementals, at least when they're split up across different nights.

    The disk pool is on a SAN, and errpt occasionally reports "PATH HAS FAILED" for some of the LUNs.

  4. #4
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > 500+ nodes
    >
    > Primary pool for the great majority of these systems is disk and it does have room for a night's worth of incrementals at least when split up across different nights.
    500 nodes is what I would still consider a low-to-medium environment. I had an environment with 480 nodes and could complete a night's backup within a 10-hour window. I had only seven 3592 tape drives, 8 TB of disk pool, and 6 HBAs split between the SAN and the tape drives: 2 for SAN and 4 for tape. That HBA setup is far from ideal, since the requirement is 1 HBA per two drives. The server was a p520 (4 CPUs) with 4 GB of RAM, two fiber-attached 1 Gb TCP/IP ports, and a 180 GB internal disk for the TSM DB and OS, running AIX 6.1 and TSM 5.5.6.

    Quote (originally posted by dangel42):
    > Disk pool is on a SAN, errpt does mention occasionally "PATH HAS FAILED" for some of the LUNs.
    This seems to be your problem. Troubleshoot the path failed errors. I have run into this myself and had the log pinned for two days.
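Failed MPIO paths can also be listed directly with `lspath`, whose default output is one `status name parent` line per path. The sketch below only prints suggested commands rather than running them; using `chpath -l disk -p parent -s enable` to retry a failed path is an assumption about how you would bring it back, and a path that keeps failing points at the fabric or zoning rather than the host. `review_paths` is a hypothetical helper.

```sh
# review_paths  (hypothetical helper)
# stdin: `lspath` output ("status name parent" per line). Prints a suggested
# chpath command for each Failed path and a note for other non-Enabled states.
# Nothing is executed; review the output before running anything.
review_paths() {
    awk 'NF >= 3 && $1 != "Enabled" {
        if ($1 == "Failed")
            printf "chpath -l %s -p %s -s enable\n", $2, $3
        else
            printf "# %s via %s is %s: check zoning/cabling\n", $2, $3, $1
    }'
}

# Example, as root:
#   lspath | review_paths
```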

    Quote (originally posted by moon-buddy):
    > 6 HBAs split between the SAN and tape drives - 2 for SAN and 4 for tape drives.
    Correction: I had only 4 HBAs: 2 for the SAN and 2 for the tape drives.
    Last edited by moon-buddy; 07-02-2012 at 01:23 PM.
    Ed

  5. #5
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    I'll check out the path failed errors, thanks.

  6. #6
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Do you recall what the resolution was for your path failed errors?

  7. #7
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    That was a SAN fabric issue.

    See if you might have a disk issue on the SAN.
    Ed

  8. #8
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Ended up using the third post from this link...

    http://groups.google.com/group/comp....700914d47e34a5

    Hopefully this helps us.

  9. #9
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Fixing the failed paths has not fixed my problem, unfortunately.

  10. #10
    Senior Member (Join Date: Apr 2005, Location: Michigan, Posts: 1,359)

    Here is an idea, as we had much the same problem with our TSM DB running on local disk; we then moved it to the SAN and disabled mirroring.
    Since your DB is already on the SAN, have your SAN administrator create new LUNs on a different channel. Then, one DB volume at a time, mirror that volume to a new LUN.
    Once it is synchronized, disable the primary volume to force the data onto the new LUN, then run a DB backup. Repeat until you find the LUN (disk) with the dirty pages.
    Reformat that LUN and bring it back into the picture as your DB copy, or leave mirroring off, since the SAN will take over disk management from there (failed disks, hot spares, all that jazz).
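The volume-by-volume move could be written out as TSM 5.x admin commands before running any of them. This is a sketch only: `mirror_macro` is hypothetical, the volume paths and device class are placeholders, the copy volume must already be formatted for TSM, and using DELETE DBVOLUME for the "disable the primary" step is an assumption; check each command against the 5.5 Administrator's Reference first. It prints the commands instead of running them because step 2 needs a person to confirm the copy is synchronized.

```sh
# mirror_macro PRIMARY_VOL COPY_VOL DEVCLASS  (hypothetical helper)
# Prints, but does not run, one round of the move-one-DB-volume procedure.
mirror_macro() {
    cat <<EOF
/* 1. Mirror the primary DB volume onto the new LUN */
define dbcopy $1 $2
/* 2. Repeat until the copy shows as synchronized */
query dbvolume $1 format=detailed
/* 3. Drop the old primary (assumed: the copy on the new LUN keeps the data) */
delete dbvolume $1
/* 4. Full DB backup before moving the next volume */
backup db devclass=$3 type=full
EOF
}

# Example (placeholder paths and device class):
#   mirror_macro /tsm/db/db05.dsm /tsmsan/db/db05.dsm LTOCLASS
```

Paste the printed lines into a dsmadmc session one at a time rather than running them as a single macro.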

    Next, on your HBAs: if they are dual-port or quad-port, run one HBA for disk and one HBA for tape, leaving multipathing on the HBA/port switch. This also helps in troubleshooting failing switch ports. Quads are usually best, as two ports are readily available at all times for hot fixes/swaps.

    Hope this helps
    Steven Gabriel
    Principal -SGSolutions Inc.
    http://www.sgsolutionsinc.com

  11. #11
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Actually the DB is mirrored on local disk. Primary pool for the majority of the systems is SAN-based.

  12. #12
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > Fixing the failed paths has not fixed my problem, unfortunately.
    Have you rebooted the TSM server? Have you checked the local disk that houses the TSM DB and LOG volumes? Have you determined which node is sending large amounts of data, and isolated that node?
    Ed

  13. #13
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    The server has not been rebooted recently. The local disk appears to be fine, though my AIX knowledge is a bit limited. I do know which nodes send large amounts of data (mostly Exchange and SQL flat files, as we don't use TDP agents).

  14. #14
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > Server has not been rebooted recently. Local disk appears to be fine, however my AIX knowledge is a bit limited. I do know the nodes that send large data (mostly Exchange and SQL flat files as we don't use TDP agents).
    When these nodes send large amounts of data, what happens to the disk pool that receives it? Does it fill up and start migrating to the next (presumably tape) pool?

    Can you have an AIX admin check your system?
    Ed

  15. #15
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    The largest of the bunch (Exchange) goes straight to tape; the others go to a disk pool large enough to handle a night's worth of backups, and a scheduled migration to tape then runs later, outside the backup window.

    We don't have an AIX admin on staff, but I'll see what I can do. Our consultant's firm has AIX admins.

    One other thing to note: I currently have 14 DB volumes. Would it help to reduce that number and/or move those DB volumes off local disk to the SAN, if possible?

  16. #16
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > The largest of the bunch goes straight to tape (exchange), the others go to a diskpool large enough to handle a night's worth of backups...a scheduled migration to tape is then run later on outside of the backup window.
    This is where the log pin may be happening: it takes a long time to back up Exchange directly to tape. Can you not send it to a disk pool first?

    > We don't have an AIX admin on staff, but I'll see what I can do. Our consultant's firm has AIX admins.
    >
    > One other thing to note, I currently have 14 DB volumes...would it help to attempt to reduce that number and/or move those DB volumes off of local disk to the SAN, if possible?
    How big are the volumes? Generally, the number of volumes has no direct effect on log pinning unless one volume goes 'wild'.
    Last edited by moon-buddy; 07-05-2012 at 12:53 PM.
    Ed

  17. #17
    Senior Member (Join Date: Dec 2004, Location: NC, Posts: 200)

    If you are not sure it is the Exchange client, or just want proof, consider a shell script that runs "show logpinned" ("sh logpin") every 5 minutes or so. The script should also record log utilization: the log will be pinned often, but you only need to worry when %util starts to rise quickly.

    Doing this in my environment has, over the years, helped to quickly pinpoint problem nodes or processes. Here are the guts of my script, suitably adjusted. It queries every server in our environment, hence the outer loop over a server list.

    #!/usr/bin/ksh
    # Record what is pinning the recovery log on each TSM server, appending
    # tab-delimited lines to HISTFILE. Needs ksh: "... | read LINE2" relies on
    # ksh running the last stage of a pipeline in the current shell.
    USER="<admin account>"
    PASSWD="<Admin Password>"
    HISTFILE="<path and file name of output file>"
    TAB=$(printf '\t')

    # Run one admin command against server $INST. stdin comes from /dev/null
    # so dsmadmc cannot read the server list that feeds the outer loop.
    adm() {
        dsmadmc -id="${USER}" -pa="${PASSWD}" -dataonly=yes -se="${INST}" "$@" </dev/null
    }

    while read -r INST
    do
        DAY=$(date '+%Y%m%d'); TIME=$(date '+%H:%M')
        # Client (Type=Node) session pinning the log: pick out session and node
        LINE2=""
        adm "show logpin" | grep "Type=Node" | sed -e 's/Session //' \
            -e 's/Type=Node,//' -e 's/Id=//' | tr -d ":" | read -r LINE2
        SESS=$(echo "${LINE2}" | awk '{print $2}')
        NODE=$(echo "${LINE2}" | awk '{print $3}')
        # Log pool utilization, with spaces and tabs stripped
        UTIL=$(adm -tabdelim "select LOG_POOL_PCT_UTIL from log" | tr -d " ${TAB}")
        if [[ -n ${NODE} ]]
        then
            # Write a record only if a node was actually found
            printf '%s\t%s\t%s\t%s\t%s\t%s\n' "${INST}" "${DAY}" "${TIME}" \
                "${SESS}" "${NODE}" "${UTIL}" >> "${HISTFILE}"
        fi
        # Server processes pinning the log: description and status
        adm "show logpin" | grep -E "descr=|procNum" | paste - - | cut -d"=" -f2,5 \
            | cut -d"," -f1,2 | sed 's/status=//' | while read -r LINE3
        do
            if [[ -n ${LINE3} ]]
            then
                # Turn the first ", " into a TAB so every field is tab-delimited
                printf '%s\t%s\t%s\t%s\t%s\n' "${INST}" "${DAY}" "${TIME}" \
                    "${LINE3}" "${UTIL}" | sed "s/, /${TAB}/" >> "${HISTFILE}"
            fi
        done
    done < SERVERLIST.FILE
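The script only records samples; spotting when "%util starts to rise quickly" is left to whoever reads the history file. A minimal follow-up sketch, assuming the tab-delimited history written by the script above with %util as the last field on every line: `util_spikes` (a hypothetical name) prints each sample where a server's utilization rose by at least JUMP points (default 10, an arbitrary threshold) since that server's previous sample.

```sh
# util_spikes HISTFILE [JUMP]  (hypothetical helper)
# Flags samples where a server's %util (last tab-delimited field) rose by at
# least JUMP points since that server's previous sample.
util_spikes() {
    awk -v FS='\t' -v jump="${2:-10}" '
        $NF !~ /^[0-9.]+$/ { next }          # skip lines with no usable %util
        ($1 in last) && ($NF - last[$1] >= jump) {
            printf "%s %s %s util %s -> %s\n", $1, $2, $3, last[$1], $NF
        }
        { last[$1] = $NF }                   # remember the latest sample
    ' "$1"
}

# Example (hypothetical path):
#   util_spikes /var/adm/tsm/logpin.hist 15
```

To run the collector every 5 minutes from cron on AIX, list the minutes explicitly (0,5,10,...,55); as far as I know, AIX cron does not accept the */5 step syntax.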
    "If we knew what it was we were doing, it would not be called research, would it?" -- Albert Einstein

  18. #18
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Quote (originally posted by moon-buddy):
    > This is where the log pin may be happening - it takes a long time to backup Exchange directly to tape. Can you not move this to disk pool first?
    Unfortunately not, due to the sheer size. Also, it isn't run every night, and we're still seeing the pinning.

    Quote (originally posted by moon-buddy):
    > How big are the volumes? Generally, the number does not affect or have any direct effect with log pinning unless one volume goes 'wild'.
    All but two are 10 GB.

  19. #19
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Thanks rmazzon, I'll give it a shot.

  20. #20
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > Unfortunately no due to the sheer size. Also, it's not being run every night and we're still seeing the pinning.
    > All but two are 10 GB.
    OK, so the Exchange backups are not run daily.

    1) Have you checked the health of your network?
    2) Ask an AIX admin to check out the TSM host server. Check for problems with the local disks themselves and/or the SCSI interface; again, errpt will help here.
    3) Look for a node that stays in a wait state during backup, and find out what it is waiting on.
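For item 3, sessions stuck in a wait state can be pulled from the SESSIONS table. A sketch, assuming the TSM 5.x column names SESSION_ID, STATE, WAIT_SECONDS, and CLIENT_NAME and unquoted comma-delimited output (verify both with `select * from sessions`); `long_waiters` is hypothetical. An IdleW session on its own is often just a client with nothing to send, so long MediaW, RecvW, or SendW waits are usually the more telling ones.

```sh
# long_waiters [SECONDS]  (hypothetical helper)
# stdin: "session_id,state,wait_seconds,client_name" rows. Prints sessions in
# a wait state (IdleW, MediaW, RecvW, SendW all end in W) for at least
# SECONDS seconds, default 600.
long_waiters() {
    awk -F, -v min="${1:-600}" '
        NF >= 4 && $2 ~ /W$/ && $3 + 0 >= min {
            printf "session %s (%s) in %s for %ss\n", $1, $4, $2, $3
        }'
}

# Example (column names assumed):
#   dsmadmc -id=admin -pa=secret -dataonly=yes -commadelimited \
#     "select session_id, state, wait_seconds, client_name from sessions" \
#     | long_waiters 900
```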

    I presume you have checked out the SAN environment, based on your last post. Have you run benchmarks on the SAN?
    Last edited by moon-buddy; 07-05-2012 at 02:23 PM.
    Ed

  21. #21
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Quote (originally posted by moon-buddy):
    > Ok, so the Exchange backups are not run daily.
    >
    > 1) Have you checked the health of your network?
    > 2) Ask an AIX admin to checkout the TSM host server. Check for local disk issues with the disk itself and/or the SCSI interface. Again, errpt will assist on this
    > 3) Look for a node that stays on wait during backup and see what is causing this to wait
    >
    > I presume you had checked out the SAN environment based on your last post. Have you done benchmarks on the SAN?
    1) Adding more network bandwidth to the server today/tomorrow.
    2) Will do. I just noticed that the last entry in errpt is "ERROR LOGGING TURNED OFF"; not sure why that would be. Do you know how to re-enable it?
    3) Will do.

    Correct, the SAN is checking out fine at the moment. I'm having them check for errors again just to be safe.

  22. #22
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > 2) Will do. Just noticed that the last entry in the errpt is "ERROR LOGGING TURNED OFF"...not sure why that would be. Know how to re-enable that?
    How are you accessing errpt? Root's errpt messages are different from other users' messages. errpt is always on, so this must come from something else. "Error logging turned off" might be from an application or from TSM.
    Ed

  23. #23
    Member dangel42 (Join Date: May 2007, Location: Wisconsin, Posts: 69)

    Quote (originally posted by moon-buddy):
    > How are you accessing the ERRPT? Root ERRPT messages are different than other user's messages. ERRPT is always ON. This must be from something else. Error logging turned OFF might be from an application or from TSM.
    I'm running the 'errpt' command as root, and there have been no new entries since 6/13.

  24. #24
    Moderator moon-buddy (Join Date: Aug 2005, Location: Somewhere in the US, Posts: 5,296)

    Quote (originally posted by dangel42):
    > Running the command 'errpt' as root. And I have no new entries since 6/13.
    See this page for how to start error logging: http://publib.boulder.ibm.com/infoce...errlogtsks.htm
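On AIX the error log is maintained by the errdemon daemon (normally /usr/lib/errdemon), and `/usr/lib/errdemon -l` shows its current settings. A small sketch for checking whether the daemon is running before restarting it; `errdemon_running` is a hypothetical helper that reads `ps -ef` output on stdin, not an AIX command.

```sh
# errdemon_running  (hypothetical helper)
# stdin: a process listing such as `ps -ef`. Succeeds when an errdemon process
# appears, ignoring grep processes that the pipeline itself may show.
errdemon_running() {
    grep -v grep | grep -q errdemon
}

# Example, as root:
#   ps -ef | errdemon_running || /usr/lib/errdemon
#   /usr/lib/errdemon -l     # show the current error log settings
```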
    Ed
