On Jul 10, 2011, at 3:18 PM, Steve Costaras wrote:
-----Original Message-----
From: Dan Langille [mailto:dan AT langille DOT org]
Sent: Sunday, July 10, 2011 12:58 PM
To: stevecs AT chaven DOT com
Cc: bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block to device "LTO4"
>> 2) Since everything is spooled first, there should be NO error that cancels a job. A tape drive could fail, a tape could burst into flame; all that would be needed is for Bacula to know that there was an issue and give the admin a simple prompt ("Do you want to fix the issue or cancel?"), let the admin fix the problem, and then have Bacula restart from the last block that was stored successfully or, if need be, from the beginning of the spooled data file.
> This I do know. Although at first glance it seems easy to do this, it is not. If it were trivial to do, I assure you it would already be in place.
>> Canceling jobs that run for days on TBs of data is just screwed up.
> I suggest running smaller jobs. I don't mean to sound trite, but that really is the solution. Given that the alternative is non-trivial, the sensible choice is, I'm afraid, to cancel the job.
I'm already kicking off 20+ jobs for a single system. This does not work once we're past the 100TB mark (nearly 200TB here). And when these errors happen, it does not matter how many jobs you have, because /all/ outstanding jobs fail when you run with concurrency (in this case, all jobs that were queued were canceled, even those that were not writing to the same tape).
This sounds like a configuration issue. Queued jobs should not be cancelled when a previous job cancels. This does not happen with any other enterprise backup software, not that it should be 100% mimicked.
With the data sizes we have today, I don't see why there aren't better error-handling checks and routines.
This is open source software. Stuff gets written because someone wants it. Clearly, nobody who wants it has written it. That is why it does not exist.
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users