Re: [Bacula-users] Configuration reload for bacula-sd

2014-10-28 11:52:31
From: Josh Fisher <jfisher AT pvct DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 28 Oct 2014 11:48:19 -0400
On 10/28/2014 9:24 AM, Alan Brown wrote:
> On 28/10/14 12:46, Ana Emília M. Arruda wrote:
>
>
>> , maybe
>> a second device definition for a job or pool could be more helpful than
>> a bacula-sd.conf reload on-the-fly or the enable/disable commands.
> This does not work. I've tried it.
>
> If an autochanger tape drive fails, jobs pile up behind it.
>
> What's far worse than a drive failing is one getting dirty - they spend
> forever doing rewrites and throughput drops from hundreds of Mb per
> second to 10-20, WITHOUT raising errors in the backup system - and
> running a cleaning tape doesn't work in a lot of cases (LTO drives are
> self-cleaning; if you need a cleaning tape then you're already in trouble)
>
> This is far from ideal behaviour, especially when there are petabytes of
> science data involved.

Granted. But that sort of hardware failure must be handled at the device 
driver level. Bacula cannot be blamed for the device driver not raising 
an error, and the same behavior would be observed regardless of which 
user-mode software is used.

> The only way out of either situation at the moment involves restarting
> the storage daemon, which kills ALL jobs running on ALL drives.

Have you tried the umount command in bconsole? umount will close the 
device and allow using mt or whatever tools to fix the problem. A 
subsequent mount command will re-open the device. If a new or different 
tape has been inserted, then the mount command will cause the volume 
label to be read. The current job will likely have to be canceled, 
unfortunately. Nevertheless, it is often possible to fix a drive issue 
without restarting bacula-sd. Ideally, there would be some way to 
declare all data written to the failing tape invalid and cause Bacula to 
restart the job from the point where data was first written to the 
failed tape, though I don't know if that is currently possible. And what 
about other jobs that have already successfully written to the now 
failed tape?
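
As a rough illustration of the umount/mount sequence described above, here
is a minimal Python sketch that only builds the two console command lines
for one drive of an autochanger. The storage name "LTO-Changer" and the
helper name are made up for the example, and the storage= and drive=
keywords reflect the console syntax as I understand it, so check them
against your Bacula version before relying on this.

```python
def drive_maintenance_commands(storage, drive):
    """Return the bconsole commands that release and later re-open one drive.

    Hypothetical helper: it only formats command strings, it does not
    talk to the Director.
    """
    target = f"storage={storage} drive={drive}"
    return [
        f"umount {target}",  # close the device so mt or other tools can be used
        f"mount {target}",   # re-open the device; the volume label is read again
    ]


if __name__ == "__main__":
    # "LTO-Changer" is an assumed storage resource name, not from any real config.
    for command in drive_maintenance_commands("LTO-Changer", 1):
        print(command)
```

The first line would be entered in bconsole before working on the drive,
and the second once the drive is usable again.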

> Comment: Please don't presume to lecture me about what I should or
> should not be doing in my enterprise environment, or indeed about the
> way systems are set up (it's all fabric path for starters and bacula does
> not do d2d unless you count disk spooling - which we use intensively),
> you have no idea of the operational constraints on my site and you're
> making a bunch of fairly arrogant assumptions about the way things are
> run which impinge on the way you think Bacula should operate.
>
> It's this kind of attitude which results in inflexible software that
> gets sworn at, rather than sworn by.
>
>
> Thankfully Kern and his team are well aware that needs vary depending on
> setups and that multiple-tape drive setups need improvement.

Absolutely. It is still evolving.

> Tape and disk are different animals and need to be approached differently.
>
> Virtual autochangers are a kludge to allow for removable disks but in
> most configured installations they do _not_ treat those disks in the
> same way as real tape drives.
>

I don't entirely agree. For the most part, Bacula sticks to the Unix 
principle of "everything is a file". Standard C library file I/O is 
used. Once the file is opened, whether device file or filesystem file, 
it is treated in exactly the same way by Bacula. Any difference is due 
to the device and/or filesystem drivers and is beyond Bacula's control, 
as it should be. If there are problems with the device driver and/or 
device firmware not detecting error conditions, then a bug report is in 
order.
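
To make the "everything is a file" point concrete, here is a small Python
sketch, not Bacula's actual code, in which one write routine is used
unchanged whether the path names a filesystem file or a device node. The
function name and the demo file name are assumptions for illustration;
Python's os.open and os.write are thin wrappers over the open and write
system calls.

```python
import os
import tempfile


def write_blocks(path, blocks):
    """Write byte blocks to path and return the number of bytes written.

    The same calls are made for a regular file and for a device file;
    any difference in behavior comes from the driver underneath.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        total = 0
        for block in blocks:
            view = memoryview(block)
            while len(view) > 0:
                written = os.write(fd, view)  # may be a partial write
                view = view[written:]
                total += written
        return total
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo against a temporary filesystem file; a tape device path could be
    # passed instead without changing write_blocks at all.
    with tempfile.TemporaryDirectory() as tmpdir:
        demo_path = os.path.join(tmpdir, "demo-volume")
        print(write_blocks(demo_path, [b"header", b"payload"]))
```

Nothing in write_blocks checks what kind of file it was given, which is the
behavior described above.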

That said, there is room for improvement in how media errors, once 
detected, are handled. It would be nice to be able to restart jobs from 
the point at which data was first written to a particular tape, since by 
Murphy's Law, the failing tape tends to be the last tape needed.

