Veritas-bu

[Veritas-bu] How to prevent NBU from immediately using a media that failed before

2005-10-28 12:16:48
Subject: [Veritas-bu] How to prevent NBU from immediately using a media that failed before
From: Mark.Donaldson AT cexp DOT com (Mark.Donaldson AT cexp DOT com)
Date: Fri, 28 Oct 2005 10:16:48 -0600
Frozen, though, doesn't necessarily mean broken.  A media fault is possible,
but then there are drive faults too, loader errors, sunspots, plague.

I've got a script that sweeps the frozen tapes, keeps a count, and unfreezes
them if there haven't been enough failures.  Any tape that freezes over 3
times stays frozen.  It may be a method you could adapt.
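
The script itself was not posted, so the following is only a minimal sketch
of the sweep logic described above, not the actual script.  The function
name sweep_frozen, the counts dictionary and the unfreeze callback are all
hypothetical.  It assumes each sweep sees a tape only when it has been
frozen again since the last unfreeze, and it leaves the NetBackup side
(listing frozen media, and unfreezing, presumably with something like
"bpmedia -unfreeze -m <media_id>") to the caller; check those commands
against your installed version.

```python
from typing import Callable, Dict, Iterable, List

# From the post: a tape that freezes more than 3 times stays frozen.
MAX_FREEZES = 3


def sweep_frozen(frozen_ids: Iterable[str],
                 freeze_counts: Dict[str, int],
                 unfreeze: Callable[[str], None],
                 max_freezes: int = MAX_FREEZES) -> List[str]:
    """Record one more freeze per frozen tape and unfreeze those under the limit.

    frozen_ids    -- media IDs currently frozen (how they are listed is up to the caller)
    freeze_counts -- running count per media ID, updated in place (persist it between runs)
    unfreeze      -- hypothetical callback that actually unfreezes one media ID
    Returns the media IDs that were left frozen.
    """
    left_frozen = []
    for media_id in frozen_ids:
        freeze_counts[media_id] = freeze_counts.get(media_id, 0) + 1
        if freeze_counts[media_id] > max_freezes:
            # Too many freezes: leave it frozen for a human to look at.
            left_frozen.append(media_id)
        else:
            unfreeze(media_id)
    return left_frozen


if __name__ == "__main__":
    # Stand-in for the real unfreeze command: just record what would be unfrozen.
    unfrozen: List[str] = []
    counts = {"001956": 3}
    kept = sweep_frozen(["001956", "000123"], counts, unfrozen.append)
    print("left frozen:", kept)
    print("unfrozen:", unfrozen)
```

Passing the unfreeze step in as a callback keeps the counting logic testable
without a NetBackup server at hand.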

-M

-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu]On Behalf Of
ida3248b AT post.cybercity DOT dk
Sent: Friday, October 28, 2005 2:28 AM
To: Sto Rage©; Veritas NBU Mailing List (E-mail)
Subject: Re: [Veritas-bu] How to prevent NBU from immediately using a
media that failed before


Hi G

Under INSTALLPATH/netbackup you can create the files

MEDIA_ERROR_THRESHOLD number of allowed errors

TIME_WINDOW in which that number of errors occurs (number of hours)

If you put 0 in the first file, the tape should get frozen at the first error
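
To make the interaction of the two values concrete, here is a rough sketch
of how a threshold and a time window could combine.  It is not NetBackup's
actual code.  It assumes a tape is frozen once its error count inside the
window exceeds the threshold (chosen because that makes a threshold of 0
freeze on the first error), and it assumes the error-file line layout seen
in the bptm excerpts quoted below: date, time, media ID, a number, error
type.  The names parse_error_time and should_freeze are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def parse_error_time(line: str) -> datetime:
    """Parse the timestamp of a line like '10/27/05 02:01:58 001956 13 POSITION_ERROR'."""
    date_s, time_s = line.split()[:2]
    return datetime.strptime(f"{date_s} {time_s}", "%m/%d/%y %H:%M:%S")


def should_freeze(error_lines: Iterable[str], media_id: str, now: datetime,
                  threshold: int, window_hours: int) -> bool:
    """Return True if media_id has more than `threshold` errors in the last `window_hours`.

    Assumption: 'exceeds the threshold' is the freeze condition; the real
    product may count differently (for example per drive).
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [
        line for line in error_lines
        if line.split()[2] == media_id and cutoff <= parse_error_time(line) <= now
    ]
    return len(recent) > threshold
```

Under these assumptions, a single logged error with threshold 0 already
returns True, while the same error with threshold 1 does not.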

Regards
Michael

On Thu, 27 Oct 2005 11:11:11 -0700, Sto Rage© wrote
> Hi,
>   Here's my problem, a backup job writes to a media and then fails
> with write error/position error etc. The job then gets re-queued and
> runs again, then NBU uses this very same tape and writes and fails
> again, this happens till the max retries of the job is exceeded and
> then the job fails.
> Why does it reuse the same tape again and again for the same
> job/policy? Is there a counter that we can set to prevent NBU from
> retrying a media that errors out the first time?
> The logs below from bptm show the media ID 001956 being repeatedly used.
> 
> 02:01:58.703 [5842] <2> log_media_error: successfully wrote to error
> file - 10/27/05 02:01:58 001956 13 POSITION_ERROR
> 02:29:33.454 [21029] <2> log_media_error: successfully wrote to error
> file - 10/27/05 02:29:33 001956 13 POSITION_ERROR
> 03:19:20.128 [22766] <2> log_media_error: successfully wrote to error
> file - 10/27/05 03:19:20 001956 13 POSITION_ERROR
> 04:30:34.394 [25958] <2> log_media_error: successfully wrote to error
> file - 10/27/05 04:30:34 001956 13 POSITION_ERROR
> 
>   Ironically, the 5th time it successfully wrote to this tape and
> continued with the job.
> We run huge NDMP jobs (average size of each is 2 TB) so when this
> happens say 70% into a job, NBU has to start from the beginning, 
> sadly checkpoint restart is not an option for NDMP backups. Is this 
> available in 6.0?
> 
> -G
> 
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


--
Cybercity Webhosting (http://www.cybercity.dk)


