Veritas-bu

[Veritas-bu] How to prevent NBU from immediately using a media that failed before

2005-11-02 00:03:10
Subject: [Veritas-bu] How to prevent NBU from immediately using a media that failed before
From: Ray.Hill AT ny.frb DOT org (Ray.Hill AT ny.frb DOT org)
Date: Wed, 2 Nov 2005 00:03:10 -0500
Thank you very much.
The script works great.




Ray H.
ext 8527
*****************************************************************
Communication Is the Gateway between Ideas and Results
*****************************************************************



Mark.Donaldson AT cexp DOT com
Sent by: veritas-bu-admin AT mailman.eng.auburn DOT edu
10/31/2005 11:44 AM

To: netbacker AT gmail DOT com
cc: veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] How to prevent NBU from immediately using a media that failed before






It's small, so I'll just attach it for the group.  Change the email address
and the "THOLD" variable at the top to suit your environment.
-M



-----Original Message-----
From: Sto Rage© [mailto:netbacker AT gmail DOT com]
Sent: Friday, October 28, 2005 6:47 PM
To: Mark.Donaldson AT cexp DOT com
Cc: ida3248b AT post.cybercity DOT dk; veritas-bu AT mailman.eng.auburn DOT edu
Subject: Re: [Veritas-bu] How to prevent NBU from immediately using a
media that failed before


Thanks to all who replied. Looking at the issues we have been having,
I think setting MEDIA_ERROR_THRESHOLD to 0 is the best option for us,
i.e. freezing the tape immediately. We can then investigate the frozen
tapes later to see what the issue actually was, and unfreeze and reuse
the media if appropriate. (Mark, would you mind sending us the script
you mentioned?) We would like to freeze the tape on the first error so
that NBU doesn't waste time using the same tape for the next 4 or 5 jobs
in the queue. Last time this happened, we lost more than 8 hours of
backup time. The fault on that tape was somewhere at the end, where it
failed to seek, so each job that failed wrote anywhere from 85 GB to
100 GB to that tape before it failed (LTO-1 media).


-G
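
A minimal sketch of the threshold setup discussed in this thread, in POSIX shell. The directory is passed in as a parameter because the thread only says the files live under INSTALLPATH/netbackup; the function name set_media_error_policy and the 12-hour window are illustrative assumptions, not NetBackup defaults. The demo writes to a scratch directory, so nothing real is touched.

```shell
#!/bin/sh
# Sketch: write the two threshold files named in the thread.
# The target directory is an assumption supplied by the caller; check the
# documentation for your NetBackup version before pointing this at a real
# installation.

set_media_error_policy() {
    config_dir=$1    # directory that holds the two files (assumed)
    threshold=$2     # allowed errors; the thread suggests 0 freezes on the first error
    window_hours=$3  # window, in hours, in which the errors are counted

    [ -d "$config_dir" ] || { echo "no such directory: $config_dir" >&2; return 1; }
    printf '%s\n' "$threshold"    > "$config_dir/MEDIA_ERROR_THRESHOLD" || return 1
    printf '%s\n' "$window_hours" > "$config_dir/TIME_WINDOW" || return 1
}

# Demo against a scratch directory (12 hours is an arbitrary example value).
demo_dir=$(mktemp -d)
set_media_error_policy "$demo_dir" 0 12
cat "$demo_dir/MEDIA_ERROR_THRESHOLD" "$demo_dir/TIME_WINDOW"
```

Passing the directory explicitly keeps the sketch honest about what the thread does and does not specify, and makes it easy to dry-run before touching a master or media server.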

On 10/28/05, Mark.Donaldson AT cexp DOT com <Mark.Donaldson AT cexp DOT com> wrote:
> Frozen, though, doesn't necessarily mean broken.  A media fault is possible,
> but then there are drive faults too, loader errors, sunspots, plague.
>
> I've got a script that sweeps the frozen tapes, keeps a count, and unfreezes
> them if there haven't been enough failures.  Any tape that freezes over 3
> times stays frozen.  It may be a method you could adapt.
>
> -M
>
> -----Original Message-----
> From: veritas-bu-admin AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu]On Behalf Of
> ida3248b AT post.cybercity DOT dk
> Sent: Friday, October 28, 2005 2:28 AM
> To: Sto Rage(c); Veritas NBU Mailing List (E-mail)
> Subject: Re: [Veritas-bu] How to prevent NBU from immediately using a
> media that failed before
>
>
> Hi G
>
> Under INSTALLPATH/netbackup you can create the files
>
> MEDIA_ERROR_THRESHOLD: number of allowed errors
>
> TIME_WINDOW: window in which that number of errors occurs (in hours)
>
> If you put 0 in the first file, the tape should get frozen at the first error
>
> Regards
> Michael
>
> On Thu, 27 Oct 2005 11:11:11 -0700, Sto Rage(c) wrote
> > Hi,
> >   Here's my problem: a backup job writes to a media and then fails
> > with a write error, position error, etc. The job then gets re-queued and
> > runs again; NBU uses this very same tape, writes, and fails
> > again. This happens until the job's max retries are exceeded and
> > then the job fails.
> > Why does it reuse the same tape again and again for the same
> > job/policy? Is there a counter that we can set to prevent NBU from
> > retrying a media that errors out the first time?
> > The logs below from bptm show the media ID 001956 being repeatedly used.
> >
> > 02:01:58.703 [5842] <2> log_media_error: successfully wrote to error
> > file - 10/27/05 02:01:58 001956 13 POSITION_ERROR
> > 02:29:33.454 [21029] <2> log_media_error: successfully wrote to error
> > file - 10/27/05 02:29:33 001956 13 POSITION_ERROR
> > 03:19:20.128 [22766] <2> log_media_error: successfully wrote to error
> > file - 10/27/05 03:19:20 001956 13 POSITION_ERROR
> > 04:30:34.394 [25958] <2> log_media_error: successfully wrote to error
> > file - 10/27/05 04:30:34 001956 13 POSITION_ERROR
> >
> >   Ironically, the 5th time it successfully wrote to this tape and
> > continued with the job.
> > We run huge NDMP jobs (average size of each is 2 TB), so when this
> > happens, say, 70% into a job, NBU has to start from the beginning.
> > Sadly, checkpoint restart is not an option for NDMP backups. Is this
> > available in 6.0?
> >
> > -G
> >
> > _______________________________________________
> > Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>
>
> --
> Cybercity Webhosting (http://www.cybercity.dk)
>
>
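
The bptm excerpt quoted in the thread can be summarized per tape with a few lines of shell. This is a rough sketch only: it assumes each log_media_error entry is a single line in the real log (the mail client wrapped the ones above), that the last field is the error type, and that the third field from the end is the media ID, as in the quoted excerpt. The function name count_media_errors is made up for illustration.

```shell
#!/bin/sh
# Sketch: count media errors per tape and error type from bptm log lines.
# Field positions are inferred from the excerpt in the thread, not from
# any documented log format.

count_media_errors() {
    awk '/log_media_error:/ { n[$(NF-2) " " $NF]++ }
         END { for (k in n) print k, n[k] }' "$@" | sort
}

# Demo with the four entries from the thread, rejoined onto single lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
02:01:58.703 [5842] <2> log_media_error: successfully wrote to error file - 10/27/05 02:01:58 001956 13 POSITION_ERROR
02:29:33.454 [21029] <2> log_media_error: successfully wrote to error file - 10/27/05 02:29:33 001956 13 POSITION_ERROR
03:19:20.128 [22766] <2> log_media_error: successfully wrote to error file - 10/27/05 03:19:20 001956 13 POSITION_ERROR
04:30:34.394 [25958] <2> log_media_error: successfully wrote to error file - 10/27/05 04:30:34 001956 13 POSITION_ERROR
EOF
count_media_errors "$sample"   # prints: 001956 POSITION_ERROR 4
```

A tape that shows up several times within one backup window, as 001956 does here, is the kind of candidate the thread proposes freezing on the first error.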



Attachment: autofroz

#!/bin/ksh

#Threshhold above which a tape remains frozen
THOLD=1

#Address for reports
MAILADDR=YOU@YOURDOMAIN.COM

#Tracking file - where "second chance" tapes are tracked
TRK=/usr/openv/var/`basename $0`.trkfile

#Logfile
LOG=/usr/openv/netbackup/logs/scripts/`basename $0`.log

PATH=$PATH:/usr/openv/netbackup/bin/admincmd:/usr/openv/volmgr/bin:/usr/openv/local/bin
export PATH

[ ! -f $TRK ] && echo "#This is a tracking file for script \"$0\"." >$TRK

echo "# Script \"`basename $0`\" start: `date`" >$LOG
exec 1>>$LOG 2>&1

#For tape in list of frozen tapes
for mediasvr in `ident_media_servers`
do
  echo "# Searching media server for frozen tapes: $mediasvr"
  for tape in `bpmedialist -mlist -l -h $mediasvr|awk '{if($15%2){print $1}}'`
  do
    tpc=`awk 'BEGIN{sum=0} {if($1=="'$tape'"){sum++}} END{print sum}' $TRK`
    if [ $tpc -ge $THOLD ]
    then
     if [ "`vmquery -w -m $tape|awk 'NR>3 && $11!="Frozen" {print $9}'`" = "-" ]
     then
       #If out of library and not already in the "Frozen" vol group
       vmchange -new_v Frozen -m $tape
       echo "Failure threshold exceeded for tape \"$tape\". Changed to \"Frozen\" VG."
     else
       #Log it for now but remove this later to prevent junkie report
       echo "Failure threshold exceeded for tape \"$tape\"."
     fi
    else
      echo "$tape `date '+%m/%d/%Y'`" >>$TRK
      bpmedia -unfreeze -ev $tape -h $mediasvr
      echo "Frozen tape \"$tape\" given another chance."
    fi
  done
done
echo "# Script \"`basename $0`\" finished: `date`"
if [ `grep -cv "^ *#" $LOG` -gt 0 ]
then
  cat $LOG | mailx -s "NB Rpt: tapes managed by `basename $0`" $MAILADDR
fi
#[ -f $LOG ] && rm $LOG
exit
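
Mark's description of the sweep (keep a count per frozen tape, unfreeze it until a threshold is reached, then leave it frozen) comes down to a small piece of bookkeeping. The sketch below isolates that logic in POSIX shell so it can be tried without a NetBackup installation. The tracking-file layout (one "media-id date" line per second chance) and the names freeze_count and second_chance are assumptions for illustration; no NetBackup command is run.

```shell
#!/bin/sh
# Sketch of the "second chance" bookkeeping only.
# A real sweep would list frozen tapes and run the actual unfreeze command
# where this sketch prints "unfreeze".

# Print how many times a tape already appears in the tracking file.
freeze_count() {
    awk -v tape="$2" '$1 == tape { n++ } END { print n + 0 }' "$1"
}

# Record another chance and print "unfreeze", or print "keep-frozen"
# once the tape has reached the threshold.
second_chance() {
    trk_file=$1
    tape_id=$2
    thold=$3
    if [ "$(freeze_count "$trk_file" "$tape_id")" -ge "$thold" ]; then
        echo keep-frozen
    else
        echo "$tape_id $(date '+%m/%d/%Y')" >> "$trk_file"
        echo unfreeze
    fi
}

# Demo: with a threshold of 3, the fourth freeze of the same tape sticks.
demo_trk=$(mktemp)
for attempt in 1 2 3 4; do
    second_chance "$demo_trk" 001956 3
done
```

Keeping the count in a flat file means the history survives between runs and can be inspected or pruned by hand when a tape is retired.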
