Re: [Veritas-bu] Serious bug w/ 6.5.2 (and beyond?): perpetually requeueing jobs, starting this weekend.

2008-11-03 15:11:06
From: "Rosenkoetter, Gabriel" <Gabriel.Rosenkoetter AT radian DOT biz>
To: "veritas-bu AT mailman.eng.auburn DOT edu" <veritas-bu AT mailman.eng.auburn DOT edu>
Date: Mon, 3 Nov 2008 14:39:21 -0500
I retract my hand-wavy stuff about locales.

I've been informed out-of-band (and others have posted here) that the issue stopped 
when the relevant backup window closed. I didn't see that because I simply had 
our NetBackup environments shut down. (Affected policies included a 
bpstart_notify that shuts down Oracle databases for a cold backup. Doing that 
repeatedly and indefinitely was clearly Not Okay.)


--
gabriel rosenkoetter
Radian Group Inc, Unix/Linux/VMware Sysadmin / Backup & Recovery
gabriel.rosenkoetter AT radian DOT biz, 215 231 1556


-----Original Message-----
From: Rosenkoetter, Gabriel
Sent: Monday, November 03, 2008 10:20 AM
To: Rosenkoetter, Gabriel; 'Bryan S. Leaman'
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] Serious bug w/ 6.5.2 (and beyond?): perpetually 
requeueing jobs, starting this weekend.

Just for the record, updating to 6.5.2A does not appear to be related (but it 
doesn't hurt).

This does appear to be fallout from the changes, described in technote 301752, 
made in 6.5.2 to avoid the problem that Bluejay Adametz described with 6.5.1 (a 
single repeated run with policies spanning a DST change). I wouldn't exactly 
call those changes a ringing success. I don't really have enough statistical 
data as yet, but it appears that those changes essentially make PEM try to 
intuit what you might have actually meant, rather than doing strictly what you 
said in your schedules, for the span of time when your locale *might* be in a 
DST change. My friends at a much larger organization tell me that the behavior 
for them stopped around 09:00 or 10:00 (am) Sunday (Eastern). I'm not 100% 
certain when the problems started for us on Saturday, but that looks 
suspiciously like the 24 hour period when those using a North America/US locale 
might be in a DST change.
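
To make the timing argument concrete, here is a minimal Python sketch that computes when the 2008 US fall-back occurred, in UTC, for four zones. It is not a model of what nbpem does and is not taken from technote 301752. It assumes the post-2007 US rule (DST ends on the first Sunday of November at 02:00 local daylight time); the function names and the offset table are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def us_dst_end_local(year):
    """Local wall-clock time at which US DST ends (assumed post-2007 rule):
    the first Sunday of November at 02:00 daylight time."""
    day = datetime(year, 11, 1, 2, 0)
    # weekday(): Monday=0 ... Sunday=6; step forward to the first Sunday.
    day += timedelta(days=(6 - day.weekday()) % 7)
    return day


# Daylight-time UTC offsets in hours (illustrative table, not exhaustive).
DAYLIGHT_OFFSETS = {"Eastern": -4, "Central": -5, "Mountain": -6, "Pacific": -7}


def dst_end_utc(year):
    """Map each zone name to the UTC instant of its fall-back transition."""
    local = us_dst_end_local(year)
    return {
        name: (local - timedelta(hours=offset)).replace(tzinfo=timezone.utc)
        for name, offset in DAYLIGHT_OFFSETS.items()
    }


if __name__ == "__main__":
    for name, instant in dst_end_utc(2008).items():
        print(f"{name:8s} falls back at {instant:%Y-%m-%d %H:%M} UTC")
```

For 2008 this puts the Eastern changeover at 06:00 UTC on Sunday, November 2, and the Pacific one three hours later. Whether the window nbpem treats as "possibly in a DST change" is wider than that is not something this sketch can show.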

I'm opening a followup case with Symantec shortly.


--
gabriel rosenkoetter
Radian Group Inc, Unix/Linux/VMware Sysadmin / Backup & Recovery
gabriel.rosenkoetter AT radian DOT biz, 215 231 1556


-----Original Message-----
From: Rosenkoetter, Gabriel [mailto:Gabriel.Rosenkoetter AT radian DOT biz]
Sent: Sunday, November 02, 2008 7:42 PM
To: 'Bryan S. Leaman'
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Serious bug w/ 6.5.2 (and beyond?): perpetually 
requeueing jobs, starting this weekend.

Yeah, that's what I've gathered from our systems and those of several friends 
elsewhere.

The restarting repeatedly smells like an nbpem bug. Quite plausibly a bug in 
the code added along these lines: 
http://seer.entsupport.symantec.com/docs/301752.htm , especially given the 
proximity to a DST change (although this occurred for us well prior to the US 
Eastern changeover).

A symptom I maybe hadn't mentioned is that one of our environments requeued 
jobs even after the policy had been marked inactive and NetBackup restarted. A 
former co-worker suggested that I inspect the output of nbpemreq -tables screen 
(on the premise that the requeueing was stuck in that DB), which would be a 
great suggestion if that flag hadn't been removed between 6.5 GA and 6.5.2. I 
want to think that nbpemreq -persisted is somewhat related, but its output is 
significantly less verbose and clear.

I've updated our environments to 6.5.2A and left a non-prod one running for a 
while now, and things appear to be okay for now. But, well, "more later". (No, 
I haven't opened a case yet, because I don't have enough information yet to be 
assertive about the issue. If anyone who's experienced this has, please drop me 
a line privately so that we can help Symantec correlate information.)

--
gabriel rosenkoetter
Radian Group Inc, Unix/Linux/VMware Sysadmin / Backup & Recovery
gabriel.rosenkoetter AT radian DOT biz, 215 231 1556


-----Original Message-----
From: Bryan S. Leaman [mailto:leaman AT bitbytes DOT com]
Sent: Sunday, November 02, 2008 6:59 PM
To: Rosenkoetter, Gabriel
Cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] Serious bug w/ 6.5.2 (and beyond?): perpetually 
requeueing jobs, starting this weekend.

I saw the same behavior with 6.5.2 on Solaris (with the rev4 nbpem EEB
installed).  After the DST changeover, several of our schedules also
requeued immediately after completing and ran several times.  Luckily they
stopped requeueing later in the morning, possibly because the schedules do
not have a 24hr window.  I didn't have to restart NBU, and the future job
forecast looks normal.

I'm not sure about the "fulls instead of incrementals" issue being
related.  As I understand that one, it only affects a policy the first
time it runs after 6.5.2 is applied.

Bryan

> We encountered a relatively serious problem with our (HP-UX, NBU 6.5.2)
> environments this weekend.
>
> Scheduled backups ran, as expected, Friday and Saturday evenings. Then,
> after finishing successfully, new, identical jobs requeued and ran again.
> And again. So forth until we shut NBU down completely.
>
> I'm now in the process of installing 6.5.2A (we hadn't bothered because we
> hadn't encountered any known 6.5.2 bugs, except for the initial "full
> instead of incremental" one, and we were past that), starting things back
> up, and hoping that the fix for the "fulls instead of incrementals" bug
> will fix this as well. I'll report that when I know.
>
> We have one site that's still 6.5 GA and it was NOT affected. If you're
> running anything more recent, you SHOULD check your systems right now.
> They're probably still running backups over and over. I say this because I
> just checked with my friends at a former employer, a major financial
> institution whose name I won't mention because I know their policies, and
> they're also affected in environments with Solaris, AIX (maybe?), and
> Windows masters. You are probably also affected. Stop reading this and go
> check. Now.
>
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>







