Bacula-users

Re: [Bacula-users] [Bacula-devel] Despooling attrs does not finish

2014-10-20 06:59:36
Subject: Re: [Bacula-users] [Bacula-devel] Despooling attrs does not finish
From: Martin Simmons <martin AT lispworks DOT com>
To: Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at>
Date: Mon, 20 Oct 2014 11:57:14 +0100
>>>>> On Sun, 19 Oct 2014 19:02:57 +0200, Ulrich Leodolter said:
> 
> Hello Dan,
> 
> On Sat, 2014-10-18 at 13:32 -0400, Dan Langille wrote:
> > On Oct 18, 2014, 
> > at 4:03 AM, Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at> wrote:
> > 
> > > Hello,
> > > 
> > > we have Win7 backup which does not come to an end within MaxRunTime=12h.
> > > server runs 7.0.5 (28 July 2014),  the client has installed the
> > > bacula-enterprise-win64-7.0.5.exe.  but the problem started about 2
> > > months ago,  at that time windows client 5.2.10 was installed on the
> > > machine.
> > > 
> > > the backup itself is about 100GB compressed and seems to finish
> > > on the client after about 6 hours, below are the last messages of
> > > the job before it gets stuck.
> > > 
> > > 2014-10-18 03:18:09 troll-sd JobId 635821: Committing spooled data to
> > > Volume "Backup-0779". Despooling 1,692,736,419 bytes ...
> > > 2014-10-18 03:18:18 troll-sd JobId 635821: Despooling elapsed time = 
> > > 00:00:09, Transfer rate = 188.0 M Bytes/second
> > > 2014-10-18 03:18:19 troll-sd JobId 635821: Elapsed time=06:11:45, 
> > > Transfer rate=4.691 M Bytes/second
> > > 2014-10-18 03:18:22 troll-sd JobId 635821: Sending spooled attrs to the 
> > > Director. Despooling 603,449,667 bytes .
> > > 
> > > mysql status at the same time:
> > > 
> > > # echo "show full processlist" | mysql
> > > Id        User    Host    db      Command Time    State   Info
> > > 6854      bacula  localhost       bacula  Sleep   522             NULL
> > > 6873      bacula  localhost       bacula  Query   21143   Sending data    
> > > INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, 
> > > DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, 
> > > Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch 
> > > JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = 
> > > Filename.Name)
> > > 6899      root    localhost       NULL    Query   0       NULL    show 
> > > full processlist
> > > 
> > > 
> > > we have a bunch of other clients (about 30), a mixture of linux, win7 and 
> > > mac powerpc.
> > > all other backups run without problems for years now.  there are even 
> > > larger backups,
> > > in size and in number of files.
> > > 
> > > 
> > > does anyone have an idea why this single batch insert does not complete?
> > > 
> > > do i need to analyze the attrs spool file itself ?
> > > 
> > > yesterday i optimized the bacula database, but that doesn't help.
> > > there must be something special in the attrs spool file which the
> > > mysql server can't handle. the server runs on standard CentOS 6.5 x86_64.
> > 
> > This is something which should be asked in the user mailing list, not the 
> > devel mailing list.  I am replying to that list instead.
> > 
> 
> Ok
> 
> > Is this a large number of files?
> > 
> 
> about 750000,  not very large.
> 
> > I had something which took a while.  I sped it up by giving PostgreSQL more 
> > memory.  Perhaps MySQL can do the same.
> > 
> > Here’s what I did: https://plus.google.com/+DanLangille/posts/AKXoRido3U1
> > 
> 
> the mysql database is already optimized and has enough memory.
> backups up 4000000 files and 700GB run without problems.
> 
> below are last job messages of the last failed jobs, it was canceled
> after 12 hours max run time.
> 
> 2014-10-19 02:49:15 troll-sd JobId 635915: Elapsed time=05:43:32, Transfer 
> rate=5.095 M Bytes/second
> 2014-10-19 02:49:16 troll-sd JobId 635915: Sending spooled attrs to the 
> Director. Despooling 603,418,522 bytes ...
> 2014-10-19 09:05:43 troll-dir JobId 635915: Fatal error: Max run time 
> exceeded. Job canceled.
> 2014-10-19 09:41:17 troll-dir JobId 635915: Error: Bacula troll-dir 7.0.5 
> (28Jul14):
> 
> the database shows already about 600k files for this jobs.
> 
> mysql> select count(*) from File where JobId = '635915' and LStat is not
> null;
> +----------+
> | count(*) |
> +----------+
> |   616848 |
> +----------+
> 1 row in set (0.00 sec)
> 
> 
> is it possible that some special file attr brings the mysql datebase
> into troubles ?  i really cant imagine.  the mysql database is almost
> idle after first 600k have been inserted.  there is no io traffic and
> cpu usage is also low.
> 
> it seems i have to dump spooled attrs files tomorrow and compare to
> database to see at which attrs have been inserted and at which one it
> stops.

I think that will be difficult, because the insert is a join of various other
temporary tables so the order may be random.


> has anyone a better idea how to debug this problem?
> i am little bit lost :) because in the last 6 years since
> i am using bacula i never run into a problem like this.

Maybe you can attach strace or gdb to the mysql process running this insert
statement to see what it is doing?  It doesn't look like a Bacula problem, but
it might be a bug in mysql.

__Martin

------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users