Bacula-users

Re: [Bacula-users] Storage is stuck at "Device is BLOCKED waiting to create a volume"

2017-04-04 10:10:28
From: Kern Sibbald <kern AT sibbald DOT com>
To: Zdeněk Bělehrádek <zdenek.belehradek AT economia DOT cz>, bacula-users AT lists.sourceforge DOT net
Date: Tue, 4 Apr 2017 16:09:20 +0200
Hello,

Well, I am out of ideas.

Yes, Bacula has a bugs database, and you can report it there, but at this
point it appears unlikely to be a bug; otherwise someone else would
have the same problem.  I will need a way to reproduce the
problem. You can try turning on level 200 debug in the Director, and
when the problem arises, do an llist on all volumes (note that is llist
with a double l).  Also provide your bacula-dir.conf and bacula-sd.conf.
That may show some problem. The main point is for you to prove that
there are other suitable Volumes available.  If doing those
things does not uncover a problem, and I cannot reproduce it (currently
the case), there will not be much more that I can do.
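
The two diagnostic steps above could be scripted roughly as follows. This is only a sketch: it assumes bconsole is installed and configured to reach the Director, and the function names diag_commands and run_step are hypothetical helpers, not part of Bacula. The only Bacula commands it sends are setdebug and llist volumes.

```shell
#!/bin/sh
# Hypothetical helper; assumes bconsole is installed and can reach the Director.

# Print the bconsole command for a given diagnostic step.
diag_commands() {
    case "$1" in
        debug)   echo "setdebug level=200 dir" ;;  # run before reproducing the problem
        volumes) echo "llist volumes" ;;           # run once the device reports BLOCKED
        *)       echo "usage: diag_commands debug|volumes" >&2; return 1 ;;
    esac
}

# Send one step to bconsole, or just print the command if bconsole is missing.
run_step() {
    cmds=$(diag_commands "$1") || return 1
    if command -v bconsole >/dev/null 2>&1; then
        printf '%s\n' "$cmds" | bconsole
    else
        printf '%s\n' "$cmds"
    fi
}
```

Calling run_step debug before starting the jobs and run_step volumes once the device is blocked (redirecting the latter to a file) would give output that can be attached to a bug report together with bacula-dir.conf and bacula-sd.conf.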

Best regards,

Kern


On 04/04/2017 01:48 PM, Zdeněk Bělehrádek wrote:
> Hi, thanks for your reply.
>
> Ad 1: they are the same, specifically 7.4.3+dfsg-1+sid1~bpo8+1 from
> jessie-backports (I just verified it). For this test, even the FDs were
> this version.
>
> Ad 2: I worked with a clean catalog:
>   - stop director and storages
>   - psql: drop database bacula
>   - psql: create database bacula owner bacula
>   - PGPASSWORD=XXXXX db_name=bacula /usr/share/bacula-director/make_postgresql_tables -U bacula -h bacdb1.cent -d bacula
>   - start director and storages, enable trace
>
> To be sure, I checked the PostgreSQL logs, and there is only one error,
> repeated every time Bacula runs a job:
> Apr  3 20:00:01 bacdb1 postgres[10867]: [24-1] 2017-04-03 20:00:01 CEST [10867-43] bacula@bacula ERROR:  table "delcandidates" does not exist
> Apr  3 20:00:01 bacdb1 postgres[10867]: [24-2] 2017-04-03 20:00:01 CEST [10867-44] bacula@bacula STATEMENT:  DROP TABLE DelCandidates
>
> I don't know why Bacula tries to drop nonexistent tables, but looking
> at the source code, this query is used only when pruning jobs, to clean
> up temporary tables. I think it is harmless.
>
> I ran dbcheck against my catalog, and it found 2 orphaned clients (one
> is not accessible in the testing env and not needed, one has its job
> stuck) and 2 orphaned filesets (both have jobs that didn't run yet). So
> no errors there either.
>
> The server is an OpenStack virtual server running on our infrastructure;
> there were no crashes or other problems that I know of.
>
> Is there any other way to check for catalog damage?
>
> Ad 3: I ran the jobs manually after setting up the new catalog; it takes
> only a few minutes. My retention periods are 7 days minimum.
>
> Ad 4: I do not edit the catalog manually. I was using bacula-web to
> display the contents of the catalog, so to be sure I just re-ran the test
> with a clean catalog and bacula-web disabled, and the bug is still there.
>
> Ad 5: I created it fresh by running make_postgresql_tables (from the
> bacula package) in an empty database.
>
> root@bacdir1:~# dpkg -S /usr/share/bacula-director/make_postgresql_tables
> bacula-director-pgsql: /usr/share/bacula-director/make_postgresql_tables
> root@bacdir1:~# dpkg -l bacula-director-pgsql | grep "^ii"
> ii  bacula-director-pgsql  7.4.3+dfsg-1+sid1~bpo8+1  amd64  network backup service - PostgreSQL storage for Director
> [PREP]root@bacdir1:~# grep /usr/share/bacula-director/make_postgresql_tables -e Version
> INSERT INTO Version (VersionId) VALUES (15);
>
> Ad 6: there are 3 programs that could do it automatically: the Bacula
> Director, bacula-web (I disabled it) and a Nagios check (we don't run
> Nagios in the test environment). I am quite sure nothing except Bacula
> can do it. And yes, I am sure none of my co-workers could have touched
> the catalog either; I did ask.
>
>
> Looking at the above, I am starting to think it may be a bug in Bacula.
> Should I report it? Where?
>
> With regards,
> Zdeněk Bělehrádek
>
> On 3.4.2017 at 18:51, Kern Sibbald wrote:
>> Hello,
>>
>> The error you are getting should never happen, which means that
>> something is seriously wrong with your Bacula installation.  A few of
>> the multiple possibilities are:
>>
>> 1. Your DIR and SDs are not on the same version.  They *must* all be the
>> same. With the little information you provided, for the moment this
>> seems to be the most likely problem.
>>
>> 2. Your catalog is damaged.
>>
>> 3. Your Retention periods are too short and records are being removed
>> from the catalog.
>>
>> 4. You have manually modified your catalog, so that now the records are
>> not consistent.
>>
>> 5. Your catalog does not correspond to the Bacula Director version you
>> are running.  This should be detected, but perhaps the catalog was later
>> manually modified.
>>
>> 6. Someone manually, or some program, is removing Volume records from
>> the catalog or changing them (this point probably duplicates point 4).
>>
>> Best regards,
>>
>> Kern
>>

