-
06-30-2012, 09:34 PM #1
Recovery log getting pinned nightly, TSM 5.5.6
Pretty much at a loss here. I think I've tried just about everything but obviously haven't hit the nail on the head yet.
TSM 5.5.6 on AIX > upgrade to 6.x is planned for this year
DB size is ~100GB running in normal mode > rollforward was causing DB backups to run constantly
- show logpinned does state that I have dirty buffer pool pages
- trying to kick off occasional incremental DB backups has no effect; the system is so locked up that they can't even get started
- running the flush command does seem to do its job, but obviously not well enough to keep the system from crashing
- consultant says it's possible that I don't have enough network bandwidth available on the server itself (two bonded 1Gb links) > another dual-port card is on its way to make it 4
- I've split up the backup jobs so that not all of my servers are backed up every night...has had no lasting effect
Any ideas would be greatly appreciated.
-
06-30-2012, 10:31 PM #2Moderator
- Join Date
- Aug 2005
- Location
- Somewhere in the US
- Posts
- 5,296
- Thanks
- 2
- Thanked 137 Times in 135 Posts
How many nodes are you backing up on this TSM server?
One main cause of the log filling up is a node (or nodes) with so many files to back up that the primary pool (I hope you have disk pools) either fills up (for example, cache is enabled and migration is taking time) or is not big enough to hold the backup.
If you have disk pools, and you are sure they are big enough to hold the backup data, check for performance issues. What does AIX 'errpt' show? If your disk pool is on SAN, check out the LUNs.
If you have a lot of nodes to back up you may want to split your TSM server.
Last edited by moon-buddy; 06-30-2012 at 10:50 PM.
Ed
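A rough way to check the "disk pool is full" theory is to pull pool utilization out of the server and flag anything close to full. This is only a sketch: the function name `flag_full_pools` and the 90% default are made up for illustration, and it assumes comma-delimited rows of pool name, percent utilized and percent migratable, for example from a select against the STGPOOLS table (column names as I recall them, so check them on your server first).

```shell
# Flag storage pools whose utilization is at or above a limit.
# Input on stdin: comma-delimited rows "POOL,PCT_UTILIZED,PCT_MIGR".
# The limit is the first argument and defaults to 90 (an arbitrary choice).
flag_full_pools() {
    limit=${1:-90}
    awk -F',' -v lim="$limit" '$2 + 0 >= lim { print $1, $2, $3 }'
}

# Assumed usage on the TSM server (not run here):
#   dsmadmc -id=... -pa=... -dataonly=yes -commadelimited \
#     "select stgpool_name, pct_utilized, pct_migr from stgpools" | flag_full_pools 90
```

Running it a few times during the backup window would show whether the pool fills before migration catches up.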
-
07-01-2012, 07:26 AM #3
500+ nodes
Primary pool for the great majority of these systems is disk and it does have room for a night's worth of incrementals at least when split up across different nights.
Disk pool is on a SAN, errpt does mention occasionally "PATH HAS FAILED" for some of the LUNs.
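To see which LUNs are logging the path failures and how often, the errpt summary can be tallied per resource. A minimal sketch, assuming the usual errpt summary layout (identifier, timestamp, type, class, resource name, description); `count_path_failures` is just an illustrative name.

```shell
# Count "PATH HAS FAILED" entries per resource from errpt summary output.
# Assumes the standard summary columns:
#   IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
count_path_failures() {
    awk '/PATH HAS FAILED/ { n[$5]++ }
         END { for (r in n) print r, n[r] }' | sort
}

# Assumed usage on the AIX host (not run here):
#   errpt | count_path_failures
```

If one or two hdisks account for most of the entries, that narrows down which paths to raise with the SAN team.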
-
07-01-2012, 10:54 AM #4Moderator
500 is what I still consider a low-to-medium environment. I had an environment with 480 nodes and could complete a night's backup within the 10 hour window. I only had 7 3592 tape drives, 8 TB of disk pool and 6 HBA split between the SAN and tape drives - 2 for SAN and 4 for tape drives. The HBA setup is far from ideal, as 1 HBA per two drives is required. The environment is a p520 (4 CPU) with 4 GB of RAM, 2 fiber attached TCP/IP ports at 1 GB and an internal disk of 180 GB for the TSM DB and OS. AIX is 6.1 and TSM version is 5.5.6
Quote: Disk pool is on a SAN, errpt does mention occasionally "PATH HAS FAILED" for some of the LUNs.
This seems to be your problem. Troubleshoot the path failed errors. I have encountered this and had the log pinned for two days.
Quote: 6 HBA split between the SAN and tape drives - 2 for SAN and 4 for tape drives.
Correction: I had only 4 HBAs: 2 were for SAN and 2 for the tape drives.
Last edited by moon-buddy; 07-02-2012 at 01:23 PM.
Ed
-
07-01-2012, 06:21 PM #5
I'll check out the path failed errors, thanks.
-
07-02-2012, 12:33 PM #6
Do you recall what the resolution was for your path failed errors?
-
07-02-2012, 01:19 PM #7Moderator
That was a SAN fabric issue.
See if you might have a disk issue on the SAN.
Ed
-
07-02-2012, 04:10 PM #8
Ended up using the third post from this link...
http://groups.google.com/group/comp....700914d47e34a5
Hopefully this helps us.
-
07-04-2012, 02:17 PM #9
Fixing the failed paths has not fixed my problem, unfortunately.
-
07-04-2012, 06:10 PM #10Senior Member
- Join Date
- Apr 2005
- Location
- Michigan
- Posts
- 1,359
- Thanks
- 0
- Thanked 1 Time in 1 Post
Here is an idea as we have had a bit of the same problem with our TSM DB running on local disk and we then moved it to the SAN and disabled mirroring.
Since your DB is on the SAN already..... have your SAN Administrator create new LUNs on a different channel. From here - one DBvolume at a time - mirror that volume to the new LUN.
Once sync'd - disable the primary volume and force the movement to the new LUN - Run a DB backup. Repeat until you find your LUN(disk) with the dirty pages.
Reformat that LUN and bring that back into the picture as your DBcopy. Or leave mirroring off as the SAN will take over from there in regards to disk managment - failed disk hot spares all that jazz.
Next - on your HBAs - if they are dual ported - or quads - run one HBA for disk - One HBA for tape - leaving multi-pathing on the HBA/port switch. This also helps in troubleshooting failing switch ports. Quads are usually best as two ports are readily available at all times for hot fixes/swaps.
Hope this helps
-
07-05-2012, 10:18 AM #11
Actually the DB is mirrored on local disk. Primary pool for the majority of the systems is SAN-based.
-
07-05-2012, 10:29 AM #12
-
07-05-2012, 11:47 AM #13
Server has not been rebooted recently. Local disk appears to be fine, however my AIX knowledge is a bit limited. I do know the nodes that send large data(mostly Exchange and SQL flat files as we don't use TDP agents).
-
07-05-2012, 11:54 AM #14
-
07-05-2012, 12:23 PM #15
The largest of the bunch goes straight to tape(exchange), the others go to a diskpool large enough to handle a night's worth of backups...a scheduled migration to tape is then run later on outside of the backup window.
We don't have an AIX admin on staff, but I'll see what I can do. Our consultant's firm has AIX admins.
One other thing to note, I currently have 14 DB volumes...would it help to attempt to reduce that number and/or move those DB volumes off of local disk to the SAN, if possible?
-
07-05-2012, 12:33 PM #16Moderator
This is where the log pin may be happening - it takes a long time to back up Exchange directly to tape. Can you not move this to disk pool first?
Quote: We don't have an AIX admin on staff, but I'll see what I can do. Our consultant's firm has AIX admins. One other thing to note, I currently have 14 DB volumes...would it help to attempt to reduce that number and/or move those DB volumes off of local disk to the SAN, if possible?
How big are the volumes? Generally, the number of volumes has no direct effect on log pinning unless one volume goes 'wild'.
Last edited by moon-buddy; 07-05-2012 at 12:53 PM.
Ed
-
07-05-2012, 12:50 PM #17Senior Member
- Join Date
- Dec 2004
- Location
- NC
- Posts
- 200
- Thanks
- 0
- Thanked 11 Times in 11 Posts
If you are not sure it is the Exchange client, or just want proof, consider a shell script that will perform a "sh logpin" command every 5 minutes or so. That script should also show the log utilization since the log will be pinned often, but you only need to concern yourself when the %util starts to rise quickly.
Doing so in my environment has over the years helped to quickly pinpoint problem nodes or processes. Here is the guts of my script, suitably adjusted. It actually queries all servers in our environment, hence the outer do loop
#!/bin/ksh
# Written for ksh (the AIX default): "| read LINE2" relies on ksh running the
# last stage of a pipeline in the current shell, which bash does not do.
USER="<admin account>"
PASSWD="<Admin Password>"
HISTFILE="<path and file name of output file>"
TAB=$(printf '\t')
while read INST
do
    # Session and node currently pinning the log, if any
    dsmadmc -id=${USER} -pa=${PASSWD} -dataonly=yes -se=${INST} \
        "show logpin" | grep "Type=Node" | sed 's/Session //' | \
        sed 's/Type=Node,//' | sed 's/Id=//' | tr -d ":" | read LINE2
    SESS=$(echo $LINE2 | awk '{print $2}')
    NODE=$(echo $LINE2 | awk '{print $3}')
    # Log utilization, with spaces and tabs stripped
    UTIL=$(dsmadmc -id=${USER} -pa=${PASSWD} -dataonly=yes -tabdelim -se=${INST} \
        "select LOG_POOL_PCT_UTIL from log" | tr -d " \t")
    if [[ -n ${NODE} ]]
    then
        # tab delimited output, only if a node was actually discerned
        printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$INST" "$(date '+%Y%m%d')" \
            "$(date '+%H:%M')" "$SESS" "$NODE" "$UTIL" >> "$HISTFILE"
    fi
    # Server processes pinning the log, one output line per process
    dsmadmc -id=${USER} -pa=${PASSWD} -dataonly=yes -se=${INST} \
        "show logpin" | grep -E "descr=|procNum" | paste - - | cut -d"=" -f2,5 \
        | cut -d"," -f1,2 | sed 's/status=//' | while read LINE3
    do
        if [[ -n ${LINE3} ]]
        then
            ## Replace the first ", " with a tab so the whole line is tab delimited
            printf '%s\t%s\t%s\t%s\t%s\n' "$INST" "$(date '+%Y%m%d')" \
                "$(date '+%H:%M')" "$LINE3" "$UTIL" | sed "s/, /${TAB}/" >> "$HISTFILE"
        fi
    done
done < SERVERLIST.FILE
"If we knew what it was we were doing, it would not be called research, would it?" -- Albert Einstein
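Since the advice above is to worry only when the %util starts to rise quickly, the history file can be scanned for sudden jumps instead of being read by eye. A possible sketch, assuming the tab-delimited layout the script writes (server instance first, date and time next, utilization as the last field); `flag_util_jumps` and the 10-point default are illustrative choices, not part of the original script.

```shell
# Print samples where utilization rose by at least THRESHOLD points
# compared with the previous sample for the same server instance.
# Input on stdin: tab-delimited rows, instance first, utilization last.
flag_util_jumps() {
    threshold=${1:-10}
    awk -F'\t' -v t="$threshold" '
        {
            inst = $1
            util = $NF + 0
            # Compare against the last value seen for this instance
            if ((inst in prev) && util - prev[inst] >= t)
                printf "%s\t%s\t%s\t%s -> %s\n", inst, $2, $3, prev[inst], util
            prev[inst] = util
        }'
}

# Assumed usage, with HISTFILE being the script's output file:
#   flag_util_jumps 10 < "$HISTFILE"
```

The rows around a flagged timestamp then show which session or process was holding the log at that moment. To collect samples every 5 minutes, a crontab entry with the minute field written out as `0,5,10,15,20,25,30,35,40,45,50,55` should do; the script path is whatever you choose.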
-
07-05-2012, 01:05 PM #18
-
07-05-2012, 01:14 PM #19
Thanks rmazzon, I'll give it a shot.
-
07-05-2012, 01:16 PM #20Moderator
Ok, so the Exchange backups are not run daily.
1) Have you checked the health of your network?
2) Ask an AIX admin to check out the TSM host server. Check for local disk issues with the disk itself and/or the SCSI interface. Again, errpt will assist with this.
3) Look for a node that stays in a wait state during backup and see what is causing it to wait
I presume you had checked out the SAN environment based on your last post. Have you done benchmarks on the SAN?
Last edited by moon-buddy; 07-05-2012 at 02:23 PM.
Ed
-
07-05-2012, 02:22 PM #21
1 > Adding more network bandwidth to the server today/tomorrow
2 > Will do. Just noticed that the last entry in the errpt is "ERROR LOGGING TURNED OFF"...not sure why that would be. Know how to re-enable that?
3 > Will do.
Correct, SAN is checking out fine at the moment. Having them check for errors again just to be safe.
-
07-05-2012, 02:26 PM #22
-
07-05-2012, 02:34 PM #23
-
07-05-2012, 02:53 PM #24Moderator
Look here for starting ERROR Logging: http://publib.boulder.ibm.com/infoce...errlogtsks.htm
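For a quick check of whether logging is still off, the newest ON/OFF entry in the errpt summary tells you the current state, since errpt lists the most recent entries first. A small sketch under that assumption; `errlog_state` is an illustrative name.

```shell
# Report the most recent error-logging state found in errpt summary output.
# errpt lists the newest entries first, so the first match wins.
errlog_state() {
    awk '/ERROR LOGGING TURNED (ON|OFF)/ { print $NF; found = 1; exit }
         END { if (!found) print "UNKNOWN" }'
}

# Assumed usage on the AIX host (not run here):
#   errpt | errlog_state
```

If it reports OFF, the error daemon is normally restarted as root with `/usr/lib/errdemon`; confirm against the linked IBM page before running it.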
Ed