[Veritas-bu] Monitoring for Locked jobs

2006-09-27 10:47:27
Subject: [Veritas-bu] Monitoring for Locked jobs
From: Jonathan.Dyck at cognos.com (Dyck, Jonathan)
Date: Wed, 27 Sep 2006 10:47:27 -0400
I hear ya Phil.  Can't count the number of times @ 2AM I've cursed doing
my "due diligence" on data collection in the event of a failure, but
sometimes it even helps :)
 
Here's the quote straight from my support (I asked them the same
question a while ago) on the bpdbjobs command. You can customize it to
your own needs with BPDBJOB_COLDEFS...
 
I run all my scripts on a Solaris box; maybe someone else on this DL
can fill you in on the Windows equivalent...
 
Cheers,
Jon
 
==================
Yes, that's correct. If you want to have the "Operation" column
displayed when executing "bpdbjobs" command then you can add the setting
"BPDBJOB_COLDEFS = Operation 12 true" in the bp.conf. However, you will
only get the output for Operation column when you execute the "bpdbjobs"
command. To have specific columns output for "bpdbjobs", you'd need to
add multiple BPDBJOB_COLDEFS entries in the bp.conf.

For detailed use of BPDBJOB_COLDEFS, please refer to the technote
http://support.veritas.com/docs/266314

If you don't want to use the BPDBJOB_COLDEFS settings in the bp.conf you
can reference the attached text file, which deciphers all the
fields/columns for the different outputs of the "bpdbjobs" command. You
can then use it in a script to spit out whatever fields/columns you want.

 ==================

________________________________

From: "Koster, Phil" [mailto:pkoster at ci.grand-rapids.mi.us] 
Sent: Wednesday, September 27, 2006 10:12 AM
To: Dyck, Jonathan; veritas-bu
Subject: RE: [Veritas-bu] Monitoring for Locked jobs


According to the report logs we got a "bad image header" and a "could
not build host list" error right around the same time it locked up.  Not
to say it wasn't a mounting problem but our primary interest was getting
things running again so we did not take enough care in documenting the
state we found the server in.  
 
What statuses will bpdbjobs show?  When I look in the Windows Command
reference I only see: 

field3 = state (0=queued, 1=active, 2=waiting for retry, 3=done)

The rest don't provide definitions with the potential responses (like a
-all_columns or a -most_columns).  I suppose we could do a bpdbjobs
-all_columns and look for a state (field 3) of 1 or 2 and then look at
the elapsed time (field 10).  But is there a better field than the
"state" field, or is state just as useful for this as any other field?

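The check described above (state in field 3, elapsed time in field 10) could be sketched roughly as follows. This is only a sketch under assumptions not confirmed in this thread: that "bpdbjobs -all_columns" prints one comma-separated line per job, that field 1 is the job ID, and that the elapsed time is given in seconds. The function name and the sample lines are hypothetical, and fields containing escaped commas are not handled.

```python
def find_stuck_jobs(lines, max_elapsed_seconds):
    """Return (job_id, state, elapsed) for active/retrying jobs over the threshold.

    Assumes comma-separated records where field 3 is the state
    (1=active, 2=waiting for retry) and field 10 is elapsed seconds.
    """
    stuck = []
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 10:
            continue  # not a job record (blank or truncated line)
        try:
            state = int(fields[2])     # field 3: job state
            elapsed = int(fields[9])   # field 10: elapsed time (assumed seconds)
        except ValueError:
            continue  # skip records whose fields are not numeric
        if state in (1, 2) and elapsed > max_elapsed_seconds:
            stuck.append((fields[0], state, elapsed))
    return stuck


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        "101,0,1,0,pol,sched,client,server,1159280000,90000,0",
        "102,0,3,0,pol,sched,client,server,1159280000,90000,1159370000",
    ]
    for job_id, state, elapsed in find_stuck_jobs(sample, 4 * 3600):
        print(f"job {job_id} in state {state} for {elapsed / 3600:.1f} hours")
```

In practice the lines would come from running the command itself (for example via subprocess) rather than from a hard-coded list; the field positions should be checked against the output of your own NBU version first.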
In 20/20 hindsight we would have been better off with a forensic-quality
investigation instead of a rapid-recovery-style response.  It was one of
those split-second decisions based on fear about the state of our
backups.  (Backups are high profile around here.  Last year we had a NAS
device double fault on a RAID 5 and the backups were bad; one week and
$36,000 later we had only recovered about 60% of the ~500 GB of data.
Ever since then....)

Thanks.

Phil Koster
Network Administrator
City of Grand Rapids
Direct: 616-456-3136
Helpdesk: 456-3999

________________________________

From: Dyck, Jonathan [mailto:Jonathan.Dyck at cognos.com] 
Sent: Wednesday, September 27, 2006 9:08 AM
To: Koster, Phil; veritas-bu
Subject: RE: [Veritas-bu] Monitoring for Locked jobs



Phil,
I run into this kind of thing once in a while, although I typically see
hung jobs in a "mounting" state if there's a problem, and for some
reason my media mount timeout doesn't kill the job.  Did you experience
something similar?
 
One way to go about scripting this would be to run "bpdbjobs -report" and
grep for "Mounting".  You can get an elapsed time from the report too, and
using that time, you should be able to page out based on your threshold.
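
A rough Python equivalent of the grep approach, as a sketch only: it filters report text for lines mentioning "Mounting". The function names are hypothetical, the layout of the report depends on your column settings, and pulling the elapsed time out of a matching line is left out because the column positions are not given in this thread.

```python
import subprocess


def mounting_lines(report_text):
    """Return the report lines that mention 'Mounting' (case-sensitive, like grep)."""
    return [line for line in report_text.splitlines() if "Mounting" in line]


def fetch_report(cmd=("bpdbjobs", "-report")):
    """Run the report command and return its text output.

    Assumes bpdbjobs is on the PATH of the account running the script.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for line in mounting_lines(fetch_report()):
        print(line)
```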
 
Just my thoughts...
 
Cheers,
Jon
 


________________________________

From: veritas-bu-bounces at mailman.eng.auburn.edu
[mailto:veritas-bu-bounces at mailman.eng.auburn.edu] On Behalf Of "Koster,
Phil"
Sent: Tuesday, September 26, 2006 7:55 AM
To: veritas-bu
Subject: [Veritas-bu] Monitoring for Locked jobs


NBU 6 MP2 on Win2K Srvr.

Over the weekend, all our jobs locked up on the back up server
essentially stopping all backups beginning around 9 AM Saturday. All NBU
processes (Windows) continued to run normally which did not allow our
Nagios Monitoring system to notify us (it all seemed just fine). So my
question is, is there anything we can monitor, even if by manual script,
to let us know when jobs get hung?


Can we do like a bpdbjobs and see how long the jobs have been running
and use a script to evaluate that information? Anyone doing something
like that already?


What I am thinking is at the very least just get something that can
monitor the jobs automatically (like a once per hour check) and if the
jobs get over x hours active then it sends some e-mails out to our cell
phones and inboxes. I tried to play around with bpdbjobs last night but
ran out of time before our backups finished. (Took too long getting the
kids to bed ;-)
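
The notification half of that idea might look something like the sketch below, using Python's standard smtplib and email modules. The addresses, the SMTP host, and the function names are placeholders, and the list of long-running jobs is assumed to come from whatever bpdbjobs check you settle on. The hourly scheduling itself would be left to the operating system's scheduler.

```python
import smtplib
from email.message import EmailMessage


def build_alert(stuck_jobs, sender, recipients, max_hours):
    """Build an alert e-mail from a list of (job_id, hours_active) tuples."""
    msg = EmailMessage()
    msg["Subject"] = f"NetBackup: {len(stuck_jobs)} job(s) active over {max_hours}h"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    body = "\n".join(f"job {job_id}: {hours:.1f} hours" for job_id, hours in stuck_jobs)
    msg.set_content(body)
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Send the message through an SMTP relay (host name is a placeholder)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # Placeholder data and addresses, for illustration only.
    alert = build_alert([("101", 25.0)], "nbu@example.com", ["oncall@example.com"], 4)
    print(alert)
```

Only calling send_alert() when the list of stuck jobs is non-empty keeps the check quiet on normal nights.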


Thanks.

Phil Koster
Network Administrator
City of Grand Rapids
Direct: 616-456-3136
Helpdesk: 456-3999

************************************************************************
***************************** This message has been prepared on
resources owned by the City of Grand Rapids, MI. It is subject to the
Acceptable Use Policy and Procedures of the City of Grand Rapids. The
information contained herein is confidential and is intended solely for
the addressee. Access by any other party is unauthorized without the
express written permission of the sender. If you are not the intended
recipient, please contact the sender and delete this message.
************************************************************************
***************************** 
 
     This message may contain privileged and/or confidential
information.  If you have received this e-mail in error or are not the
intended recipient, you may not use, copy, disseminate or distribute it;
do not open any attachments, delete it immediately from your system and
notify the sender promptly by e-mail that you have done so.  Thank you. 
