Bacula-users

Re: [Bacula-users] bacula 5.2.12 randomly fails to recycle disk based volumes

2013-01-11 09:55:17
Subject: Re: [Bacula-users] bacula 5.2.12 randomly fails to recycle disk based volumes
From: Martin Simmons <martin AT lispworks DOT com>
To: Uwe Schuerkamp <uwe.schuerkamp AT nionex DOT net>
Date: Fri, 11 Jan 2013 14:52:05 GMT
>>>>> On Fri, 11 Jan 2013 10:00:16 +0100, Uwe Schuerkamp said:
> 
> Hello all,
> 
> even with bacula version 5.2.12 we're still seeing the errors below
> where bacula sometimes fails to properly recycle a disk based volume. 
> 
> There are lots of error messages like these: 
> 
> ######################################################################
> 10-Jan 21:16 serverl186-dir JobId 51100: Start Backup JobId 51100, 
> Job=serverw2107.2013-01-10_16.39.52_51
> 10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume 
> "incremental-0509"
> 10-Jan 21:16 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
> purged from catalog.
> 10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume 
> "incremental-0509"
> 10-Jan 21:16 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
> purged from catalog.
> 10-Jan 21:16 serverl186-dir JobId 51100: Purging oldest volume 
> "incremental-0509"
>  ....
> <SNIP SNIP>
> ....
> 
> 10-Jan 22:19 serverw2107-fd JobId 51100: DIR and FD clocks differ by -30 
> seconds, FD automatically compensating.
> 10-Jan 22:19 serverl186-dir JobId 51100: Purging oldest volume 
> "incremental-0509"
> 10-Jan 22:19 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
> purged from catalog.
> 10-Jan 22:19 serverl186-dir JobId 51100: Purging oldest volume 
> "incremental-0509"
> 10-Jan 22:19 serverl186-dir JobId 51100: 0 File on Volume "incremental-0509" 
> purged from catalog.
> 10-Jan 22:19 serverl186-sd JobId 51100: Job 
> serverw2107.2013-01-10_16.39.52_51 is waiting. Cannot find any appendable 
> volumes.
> Please use the "label" command to create a new Volume for:
>     Storage:      "FileStorage_incremental" (/mnt/msa/online_backup)
>     Pool:         Online_incremental
>     Media type:   File
> 10-Jan 22:24 serverl186-sd JobId 51100: Volume "incremental-0508" previously 
> written, moving to end of data.
> 10-Jan 22:24 serverl186-sd JobId 51100: Ready to append to end of Volume 
> "incremental-0508" size=1202
> 10-Jan 22:24 serverw2107-fd JobId 51100: Generate VSS snapshots. Driver="VSS 
> Vista", Drive(s)="C"
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Task 
> Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "VSS 
> Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): 
> "Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "ASR 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "Shadow 
> Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "COM+ 
> REGDB Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): 
> "Registry Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "IIS 
> Config Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "IIS 
> Metabase Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "BITS 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:24 serverw2107-fd JobId 51100: VSS Writer (BackupComplete): "WMI 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 10-Jan 22:25 serverl186-sd JobId 51100: Elapsed time=00:00:24, Transfer 
> rate=21  Bytes/second
> 10-Jan 22:25 serverl186-dir JobId 51100: Fatal error: Catalog error creating 
> JobMedia record. sql_create.c:155 Update Media record UPDATE Media SET 
> EndFile=50, EndBlock=1638829081 WHERE MediaId=58 failed: ERR=

"ERR=" looks like it failed to report the real error.


> 10-Jan 22:25 serverl186-sd JobId 51100: Fatal error: Error creating JobMedia 
> record: 1992 Create JobMedia error
> 
> 10-Jan 22:25 serverl186-sd JobId 51100: Fatal error: acquire.c:516 Could not 
> create JobMedia record for Volume="incremental-0508" 
> Job=serverw2107.2013-01-10_16.39.52_51
> 10-Jan 22:25 serverl186-dir JobId 51100: Error: Unable to get Media record 
> for Volume incremental-0058: ERR=sql_get.c:1094 Media record for Volume 
> "incremental-0058" not found.

Do you know which volume is used?  There seems to be some confusion between
incremental-0058 and incremental-0508.

Is that a common feature of other failures like this?



> ######################################################################
> 
> At this time in mysql, I can still see a long running query initiated
> by the volume recycle process: 
> 
> ######################################################################
> # mystat
> Id    User    Host    db      Command Time    State   Info
> Progress
> 16715 bacula  localhost       bacula  Query   18534   updating
> DELETE FROM File WHERE JobId IN
> (48428,48430,48432,48433,48434,48566,48568,48572,48574,48576,48578,4
> 0.000
> 16774 root    localhost       NULL    Query   0       NULL    show
> processlist   0.000
> 
> ######################################################################
> 
> As you can see above, the query to delete the files for the job
> records found on the volume has been running for over 18,000 seconds. 
> 
> We don't generally seem to have a problem with long recycle times, but
> every once in a while bacula hangs in the circumstances described
> above. 
> 
> The error seems to occur roughly one hour after the recycling starts,
> so could this be some sort of maximum wait interval expiring or
> something? 
> 
> Any ideas how to fix this situation would be greatly appreciated. 

Can you log all queries in the mysql server to see if that records anthing
interesting?

__Martin

------------------------------------------------------------------------------
Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and
much more. Get web development skills now with LearnDevNow -
350+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122812
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>