Bacula-users

Re: [Bacula-users] The number of files mismatch! Marking volume in Error in Catalog

2009-06-05 03:03:25
Subject: Re: [Bacula-users] The number of files mismatch! Marking volume in Error in Catalog
From: user100 <user100 AT lisec-sw DOT com>
To: Uwe Schuerkamp <hoover AT nionex DOT net>, Christian Gaul <christian.gaul AT otop DOT de>, Bob Hetzel <beh AT case DOT edu>, bacula-users AT lists.sourceforge DOT net
Date: Fri, 05 Jun 2009 09:00:46 +0200
  On 04.06.2009 10:16, Uwe Schuerkamp wrote:
> On Thu, Jun 04, 2009 at 09:20:34AM +0200, Christian Gaul wrote:
>> Bob Hetzel schrieb:
>>> Greetings,
>>>
>>> I've been seeing an issue whereby a volume gets marked in error
>>> periodically.  The last items logged about that volume are typically like 
>>> this:
>>>
>>> 02-Jun 11:53 gyrus-sd JobId 83311: Volume "LTO224L2" previously written,
>>> moving to end of data.
>>> 02-Jun 11:53 gyrus-sd JobId 83311: Error: Bacula cannot write on tape
>>> Volume "LTO224L2" because:
>>> The number of files mismatch! Volume=46 Catalog=45
>>> 02-Jun 11:53 gyrus-sd JobId 83311: Marking Volume "LTO224L2" in Error in
>>> Catalog.
>>>
>>> I don't think I have any SCSI errors, but instead the problem seems to be
>>> related to bacula not properly keeping track of the volume files in some
>>> rare case.
>>>
>>> This time the problem happened not too long after the volume got recycled
>>> and so I noted one thing about how the tape was used... a backup started on
>>> another volume and then spanned onto it.  Could that be a source of these
>>> problems?
>>>
>>> Here's the pertinent part of the bacula log file--debugging not turned on
>>> right now but I'm hoping enough got logged to help.  If not I'll have to
>>> turn debugging back on but what level would be good for determining the
>>> source of that error?
>>>
>>> http://casemed.case.edu/admin_computing/bacula/bacula-2009-06-01.log.txt
>>>
>>>      Bob
>>>
>> To me this looks like an issue reported a couple of times on this list,
>> once by me and once by another user, whereby Bacula isn't updating the
>> Volume Files when doing concurrent jobs.
>>
>> So far nobody has seemed interested in it. For me and another user it
>> has "worked" to set the maximum concurrent jobs to 1 on the device.
>> Yes, you will have jobs piling up for hours until they get worked off.
>>
>> I witnessed this first after upgrading from 2.4.4 to 3.0.0 but have not
>> been able to track it down myself, or I would have made a proper
>> bug report for it.
>>
>> Hope that helps a little
> Hi,
>
> we're running bacula 2.2.8, using concurrent jobs = 2 on a disk based
> set of volumes. I've done several restores from those volumes without
> any errors, and haven't seen the error you mention in a good 3 months
> or so since having switched from concurrent jobs = 1 to " = 2", so I'd
> consider this a "positive" report that the feature actually does
> work. The bug may have been introduced in a later version of
> bacula.
>
> All the best,
>
> Uwe
>
>
Concurrent jobs also worked well for years on our backup machine with
2.2.8 and earlier versions. Since the upgrade to 3.0.1 we get the files
mismatch error.
I have tried different storage-daemon settings on CentOS, set up a new
backup server on Debian for testing, upgraded the autoloader firmware,
changed the database, changed the tapes, and run the btape test (again).
None of this has helped so far. With max concurrent jobs=1 it works.
So for me the common denominator for this failure currently seems to be
(the upgrade to) Bacula 3.0.1.
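
To see how often the mismatch occurs and on which volumes, the log can
be scanned with something like the following. This is only a sketch:
find_mismatches is a made-up helper name, and the patterns assume the
message wording shown in the quoted log above (other Bacula versions
may word it differently).

```python
import re

# Patterns taken from the storage daemon messages quoted in this thread, e.g.
#   ... Volume "LTO224L2" because:
#   The number of files mismatch! Volume=46 Catalog=45
VOLUME_RE = re.compile(r'Volume "([^"]+)"')
MISMATCH_RE = re.compile(
    r"The number of files mismatch! Volume=(\d+) Catalog=(\d+)"
)


def find_mismatches(log_text):
    """Return (volume_name, files_on_volume, files_in_catalog) tuples.

    The volume name is the most recent quoted name seen at or before the
    mismatch line; it is None if no name has appeared yet.
    """
    results = []
    last_volume = None
    for line in log_text.splitlines():
        name = VOLUME_RE.search(line)
        if name:
            last_volume = name.group(1)
        mismatch = MISMATCH_RE.search(line)
        if mismatch:
            on_volume = int(mismatch.group(1))
            in_catalog = int(mismatch.group(2))
            results.append((last_volume, on_volume, in_catalog))
    return results
```

Calling find_mismatches(open(path).read()) on a daemon log gives one
tuple per error, which makes it easy to check whether the volume is
always exactly one file ahead of the catalog, as in the case above.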

Additionally, I had a failure with concurrent jobs and file-based
backups, but that seems to be solved now. It was the same story as on
tape: whenever two jobs were started, the second one failed after a few
seconds (with a slightly different error message than on tape). After
recompiling on a newly set up Debian server I no longer got this
failure. So I recompiled with the same configure settings on CentOS
too, and file-based concurrent backups now seem to work there as well.
I also tried again with the old build directory (/usr/src/bacula-...)
of 3.0.1, whose installation had not worked on CentOS. However, I did
not get the failure on concurrent file-based backups with that build
either! I don't know exactly what changed in the meantime, except the
default paths used by the new build (with fewer configure options).

I found out that "make uninstall" did not remove all files from the
system, so my guess is that an old file from some other Bacula version
(maybe 1.3.8, 2.0.x or 2.2.8) was lying around in a directory earlier
in the PATH and caused the trouble, and that it was overwritten by the
later compilation and installation (with other pathnames). Before that,
the failure was perfectly reproducible with file-based backups.

However, tape-based backup still doesn't work with 3.0.1 and concurrent
jobs (even on a newly set up server with no previously installed
Bacula).
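
A quick way to check for leftover copies from older installations is to
list every match for a program name along the PATH, not only the first
one. The following is a rough sketch; find_on_path is a hypothetical
helper, and it only finds executables, so a stale library or script
elsewhere would not show up.

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` on the PATH, in search order.

    `path` defaults to the PATH environment variable. Entries that resolve
    to the same real file (for example via symlinked directories) are
    reported only once.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            real = os.path.realpath(candidate)
            if real not in seen:
                seen.add(real)
                hits.append(candidate)
    return hits
```

If find_on_path("bacula-sd") (or the same call for bacula-dir or
bacula-fd) returns more than one entry, the first one in the list is the
one a plain command name picks up, and the others are candidates for
leftovers. The daemons are often installed in an sbin directory, so the
check should be run with the same PATH that the init scripts use.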


Greetings,
user100
