Veritas-bu

[Veritas-bu] Jobs stuck in the queue?

2005-09-30 14:30:09
Subject: [Veritas-bu] Jobs stuck in the queue?
From: jpiszcz AT servervault DOT com (Piszcz, Justin)
Date: Fri, 30 Sep 2005 14:30:09 -0400
My problem is that they do not cancel; they only error out with (50)
after all of the other jobs complete.


-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of
Mark.Donaldson AT cexp DOT com
Sent: Friday, September 30, 2005 2:04 PM
To: veritas-bu AT mailman.eng.auburn DOT edu
Cc: Greg.Hindle AT constellation DOT com
Subject: RE: [Veritas-bu] Jobs stuck in the queue?

I had a request to post my fix to auto-cancel these System_State jobs so
here it is below.  Note, this works off elapsed time of the job, which
starts counting when the job queues.  I'd rather work off the "attempt
elapsed" time but bpdbjobs doesn't seem to kick out that number in any
useful way.

As written, it cancels any active /System_State/ job with an elapsed time
over 24 hours.  Change the LOG file path to fit your environment.  I stuck
mine in cron on a six-hour cycle.

==== Script start ====

#!/bin/ksh

# Kills stuck NT System_State backups.
# Mark Donaldson - 09/30/2005

# Max elapsed time for System_State backup (seconds)
maxtime=86400

PATH=$PATH:/usr/openv/netbackup/bin/admincmd
PROG=`basename $0`
LOG=/usr/openv/netbackup/logs/scripts/$PROG.log
TMP=/tmp/$PROG.tmp

# Logfile Management
exec >>$LOG 2>&1
if [ `wc -l $LOG | awk '{print $1}'` -gt 2000 ]
then
  cp $LOG $TMP
  echo "Logfile Truncated: `date`" >$LOG
  tail -1000 $TMP >>$LOG
  rm -f $TMP
fi

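## Main
# Pick job IDs out of the comma-separated bpdbjobs -most_columns output.
# Fields as used here (verify against your NetBackup version):
#   $2==0   job type: backup        $3==1   state: active
#   $10     elapsed seconds         $17     path last written
#   $22==13 policy type: MS-Windows-NT
for jobid in `bpdbjobs -most_columns | \
    awk -F, '$2==0 && $3==1 && $10>'$maxtime' \
    && $17~/^\/System_State\// && $22==13 {print $1}'`
do
  echo "Cancelling $jobid :: `date`"
  bpdbjobs -cancel "$jobid"
done
exit 0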
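
Before putting something like this in cron, it may help to see which jobs
the filter would pick without cancelling anything.  The sketch below is
only an illustration: list_stuck_jobs is a made-up helper name, the field
positions are copied from the script above, and it reads the
bpdbjobs -most_columns output on stdin so it can be tried against a saved
file.  It passes the limit with awk -v, which the old /usr/bin/awk on
Solaris does not accept, so nawk or /usr/xpg4/bin/awk would be needed
there.

```shell
#!/bin/sh
# Dry run: print the job IDs the filter would select; nothing is cancelled.
# list_stuck_jobs is a hypothetical helper, not a NetBackup command.
# Field positions (2, 3, 10, 17, 22) are taken from the original script.

list_stuck_jobs() {
  # $1 = maximum elapsed time in seconds (default 86400 = 24 hours)
  awk -F, -v max="${1:-86400}" '
    $2 == 0 && $3 == 1 && $10 + 0 > max + 0 &&
    $17 ~ /^\/System_State\// && $22 == 13 { print $1 }'
}

# Example against a live master server (not run here):
#   bpdbjobs -most_columns | list_stuck_jobs 86400
```

If the list looks right, the same job IDs can be fed to bpdbjobs -cancel
as in the loop above.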





-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu]On Behalf Of
Mark.Donaldson AT cexp DOT com
Sent: Thursday, September 29, 2005 9:08 AM
To: jpiszcz AT servervault DOT com; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Jobs stuck in the queue?


I'm getting the \system state\ ones stuck frequently, but a "cancel" from
the GUI clears them.  If I check the details screen in the GUI, it shows
the tape moving from fragment to fragment but no data moving.

I'm getting ready to write something that looks for these and kills them
if they've been running more than 12 hours or so.

I'm waiting for another one so I can grab the bpdbjobs entry.

-M
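
Grabbing the bpdbjobs entry for one stuck job could look roughly like the
sketch below.  show_job is a hypothetical helper name, and the job ID and
output path in the usage comment are placeholders; the only assumption
about bpdbjobs is that -most_columns prints one comma-separated record per
job with the job ID in the first field.

```shell
#!/bin/sh
# Print the raw bpdbjobs -most_columns record for a single job ID.
# show_job is a hypothetical helper, not a NetBackup command.

show_job() {
  # $1 = job ID to look for; records are read from stdin
  awk -F, -v id="$1" '$1 == id'
}

# Example against a live master server (not run here):
#   bpdbjobs -most_columns | show_job 12345 > /tmp/stuck_job.txt
```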
-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu]On Behalf Of Piszcz,
Justin
Sent: Thursday, September 29, 2005 8:41 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Jobs stuck in the queue?


When using ALL_LOCAL_DRIVES (w/ NEW_STREAM) under it, sometimes I get some
jobs (mainly System_State:\ or Shadow_Copy_Components:\) that get stuck in
the queue.
 
There is no way to delete them except to stop and start the NB server
processes.
 
Anyone ever experience anything like that before?
 
Using 5.1mp3a on Solaris 8.
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

