[Bacula-users] bacula 5.2.12 randomly fails to recycle disk based volume

Hello all,

even with bacula version 5.2.12 we're still seeing the errors below
where bacula sometimes fails to properly recycle a disk based volume. 

There are lots of error messages like these: 

######################################################################
10-Jan 21:16 serverl186-dir JobId 51100: Start Backup JobId 51100, 
Job=serverw2107.2013-01-10_16.39.52_51
10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume 
"incremental-0509"
10-Jan 21:16 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
purged from catalog.
10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume 
"incremental-0509"
10-Jan 21:16 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
purged from catalog.
10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume 
"incremental-0509"
 ....
<SNIP SNIP>
....

10-Jan 22:19 serverw2107-fd JobId 51100: DIR and FD clocks differ by -30 
seconds, FD automatically compensating.
10-Jan 22:19 serverl186-dir JobId 51100: Purging oldest volume 
"incremental-0509"
10-Jan 22:19 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
purged from catalog.
10-Jan 22:19 serverl186-dir JobId 51100: Purging oldest volume 
"incremental-0509"
10-Jan 22:19 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
purged from catalog.
10-Jan 22:19 serverl186-sd JobId 51100: Job serverw2107.2013-01-10_16.39.52_51 
is waiting. Cannot find any appendable volumes.
Please use the "label" command to create a new Volume for:
    Storage:      "FileStorage_incremental" (/mnt/msa/online_backup)
    Pool:         Online_incremental
    Media type:   File
10-Jan 22:24 serverl186-sd JobId 51100: Volume "incremental-0508" previously 
written, moving to end of data.
10-Jan 22:24 serverl186-sd JobId 51100: Ready to append to end of Volume 
"incremental-0508" size=1202
10-Jan 22:24 serverw2107-fd JobId 51100: Generate VSS snapshots. Driver="VSS 
Vista", Drive(s)="C"
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Task 
Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "VSS 
Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): 
"Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "System 
Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "ASR 
Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Shadow 
Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "COM+ 
REGDB Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Registry 
Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "IIS 
Config Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "IIS 
Metabase Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "BITS 
Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "WMI 
Writer", State: 0x1 (VSS_WS_STABLE)
10-Jan 22:25 serverl186-sd JobId 51100: Elapsed time=00:00:24, Transfer rate=21 
 Bytes/second
10-Jan 22:25 serverl186-dir JobId 51100: Fatal error: Catalog error creating 
JobMedia record. sql_create.c:155 Update Media record UPDATE Media SET 
EndFile=50, EndBlock=1638829081 WHERE MediaId=58 failed: ERR=
10-Jan 22:25 serverl186-sd JobId 51100: Fatal error: Error creating JobMedia 
record: 1992 Create JobMedia error

10-Jan 22:25 serverl186-sd JobId 51100: Fatal error: acquire.c:516 Could not 
create JobMedia record for Volume="incremental-0508" 
Job=serverw2107.2013-01-10_16.39.52_51
10-Jan 22:25 serverl186-dir JobId 51100: Error: Unable to get Media record for 
Volume incremental-0058: ERR=sql_get.c:1094 Media record for Volume 
"incremental-0058" not found.
######################################################################

At this time in mysql, I can still see a long running query initiated
by the volume recycle process: 

######################################################################
# mystat
Id      User    Host    db      Command Time    State   Info
Progress
16715   bacula  localhost       bacula  Query   18534   updating
DELETE FROM File WHERE JobId IN
(48428,48430,48432,48433,48434,48566,48568,48572,48574,48576,48578,4
0.000
16774   root    localhost       NULL    Query   0       NULL    show
processlist     0.000

######################################################################

As you can see above, the query to delete the files for the job
records found on the volume has been running for over 18,000 seconds. 

We don't generally seem to have a problem with long recycle times, but
every once in a while bacula hangs in the circumstances described
above. 

The error seems to occur roughly one hour after the recycling starts,
so could this be some sort of maximum wait interval expiring or
something? 

Any ideas how to fix this situation would be greatly appreciated. 

We're running on Centos 6 with bacula community 5.2.12 compiled from
source. We've also been seeing this behaviour with version 5.2.6. 

All the best, Uwe 


-- 
NIONEX --- Ein Unternehmen der Bertelsmann SE & Co. KGaA



------------------------------------------------------------------------------
Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and
much more. Get web development skills now with LearnDevNow -
350+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122812
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] bacula 5.2.12 randomly fails to recycle disk based volumes