Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

2011-07-10 22:16:41
Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"
From: Dan Langille <dan AT langille DOT org>
To: stevecs AT chaven DOT com
Date: Sun, 10 Jul 2011 22:13:32 -0400

On Jul 10, 2011, at 3:18 PM, Steve Costaras wrote:

 
-----Original Message-----
From: Dan Langille [mailto:dan AT langille DOT org]
Sent: Sunday, July 10, 2011 12:58 PM
To: stevecs AT chaven DOT com
Cc: bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"

>>
>> 2) since everything is spooled first, there should be NO error that should cancel a job. A tape drive could fail, a tape could burst into flame, all that would be needed was bacula to know that there was an issue and give the admin a simple statement do you want to fix the issue or cancel?, the admin to fix the problem, and then bacula told to restart from the last block that was stored successfully OR if need be from the beginning of the spooled data file.

>This I do know. Although, at first glance it seems easy to do this, it is not. If it was trivial to do, I assure you, it would already be in place.

>> Canceling jobs that run for days for TB's of data is just screwed up.

>I suggest running smaller jobs. I don't mean to sound trite, but that really is the solution. Given that the alternative is non-trivial, the sensible choice is, I'm afraid, cancel the job.

I'm already kicking off 20+ jobs for a single system. This does not work when we're talking over the 100TB/nearly 200TB mark. And when these errors happen it does not matter how many jobs you have, as /all/ outstanding jobs fail when you have concurrency (in this case all jobs that were queued and were not even writing to the same tape were canceled).

This sounds like a configuration issue.  Queued jobs should not be cancelled when a previous job cancels.

This does not happen with any other enterprise backup software, not that they should be 100% mimicked.
With the data sizes we have today I don't see why there are not better error handling checks/routines.

This is open source software.  Stuff gets written because someone wants it.  Clearly, nobody who wants it has written it. That is why it does not exist.

-- 
Dan Langille - http://langille.org
