Bacula-users

Re: [Bacula-users] [Bacula-devel] Despooling attrs does not finish

2014-10-20 09:24:29
Subject: Re: [Bacula-users] [Bacula-devel] Despooling attrs does not finish
From: alejandro alfonso fernandez <alejandro.alfonso.fernandez AT gmail DOT com>
To: Martin Simmons <martin AT lispworks DOT com>
Date: Mon, 20 Oct 2014 15:21:32 +0200
Hi!

I agree with Martin. It's no a Bacula error, it's a MySQL problem

I'm pretty sure that your /tmp partition becomes full, specially if you share both Bacula Spool Directory (are you using "SpoolData = yes"?) and MySQL temporary filesystem (both of them in /tmp by default)

Try changing the "tmpdir" param of your MySQL server (my.cnf) to a bigger partition (don't forget restart the service to commit the change)

Example:
# point the following paths to different dedicated disks
# tmpdir                                                = /tmp/
tmpdir                                          = /var/tmp/mysql

Doing a "mysqlrepair" to test database integrity will be a good idea

Best regards!

On Mon, Oct 20, 2014 at 12:57 PM, Martin Simmons <martin AT lispworks DOT com> wrote:
>>>>> On Sun, 19 Oct 2014 19:02:57 +0200, Ulrich Leodolter said:
>
> Hello Dan,
>
> On Sat, 2014-10-18 at 13:32 -0400, Dan Langille wrote:
> > On Oct 18, 2014,
> > at 4:03 AM, Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at> wrote:
> >
> > > Hello,
> > >
> > > we have Win7 backup which does not come to an end within MaxRunTime=12h.
> > > server runs 7.0.5 (28 July 2014),  the client has installed the
> > > bacula-enterprise-win64-7.0.5.exe.  but the problem started about 2
> > > months ago,  at that time windows client 5.2.10 was installed on the
> > > machine.
> > >
> > > the backup itself is about 100GB compressed and seems to finish
> > > on the client after about 6 hours, below are the last messages of
> > > the job before it gets stuck.
> > >
> > > 2014-10-18 03:18:09 troll-sd JobId 635821: Committing spooled data to
> > > Volume "Backup-0779". Despooling 1,692,736,419 bytes ...
> > > 2014-10-18 03:18:18 troll-sd JobId 635821: Despooling elapsed time = 00:00:09, Transfer rate = 188.0 M Bytes/second
> > > 2014-10-18 03:18:19 troll-sd JobId 635821: Elapsed time=06:11:45, Transfer rate=4.691 M Bytes/second
> > > 2014-10-18 03:18:22 troll-sd JobId 635821: Sending spooled attrs to the Director. Despooling 603,449,667 bytes .
> > >
> > > mysql status at the same time:
> > >
> > > # echo "show full processlist" | mysql
> > > Id        User    Host    db      Command Time    State   Info
> > > 6854      bacula  localhost       bacula  Sleep   522             NULL
> > > 6873      bacula  localhost       bacula  Query   21143   Sending data    INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name)
> > > 6899      root    localhost       NULL    Query   0       NULL    show full processlist
> > >
> > >
> > > we have a bunch of other clients (about 30), a mixture of linux, win7 and mac powerpc.
> > > all other backups run without problems for years now.  there are even larger backups,
> > > in size and in number of files.
> > >
> > >
> > > does anyone have an idea why this single batch insert does not complete?
> > >
> > > do i need to analyze the attrs spool file itself ?
> > >
> > > yesterday i optimized the bacula database, but that doesn't help.
> > > there must be something special in the attrs spool file which the
> > > mysql server can't handle. the server runs on standard CentOS 6.5 x86_64.
> >
> > This is something which should be asked in the user mailing list, not the devel mailing list.  I am replying to that list instead.
> >
>
> Ok
>
> > Is this a large number of files?
> >
>
> about 750000,  not very large.
>
> > I had something which took a while.  I sped it up by giving PostgreSQL more memory.  Perhaps MySQL can do the same.
> >
> > Here’s what I did: https://plus.google.com/+DanLangille/posts/AKXoRido3U1
> >
>
> the mysql database is already optimized and has enough memory.
> backups up 4000000 files and 700GB run without problems.
>
> below are last job messages of the last failed jobs, it was canceled
> after 12 hours max run time.
>
> 2014-10-19 02:49:15 troll-sd JobId 635915: Elapsed time=05:43:32, Transfer rate=5.095 M Bytes/second
> 2014-10-19 02:49:16 troll-sd JobId 635915: Sending spooled attrs to the Director. Despooling 603,418,522 bytes ...
> 2014-10-19 09:05:43 troll-dir JobId 635915: Fatal error: Max run time exceeded. Job canceled.
> 2014-10-19 09:41:17 troll-dir JobId 635915: Error: Bacula troll-dir 7.0.5 (28Jul14):
>
> the database shows already about 600k files for this jobs.
>
> mysql> select count(*) from File where JobId = '635915' and LStat is not
> null;
> +----------+
> | count(*) |
> +----------+
> |   616848 |
> +----------+
> 1 row in set (0.00 sec)
>
>
> is it possible that some special file attr brings the mysql datebase
> into troubles ?  i really cant imagine.  the mysql database is almost
> idle after first 600k have been inserted.  there is no io traffic and
> cpu usage is also low.
>
> it seems i have to dump spooled attrs files tomorrow and compare to
> database to see at which attrs have been inserted and at which one it
> stops.

I think that will be difficult, because the insert is a join of various other
temporary tables so the order may be random.


> has anyone a better idea how to debug this problem?
> i am little bit lost :) because in the last 6 years since
> i am using bacula i never run into a problem like this.

Maybe you can attach strace or gdb to the mysql process running this insert
statement to see what it is doing?  It doesn't look like a Bacula problem, but
it might be a bug in mysql.

__Martin

------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://p.sf.net/sfu/Zoho
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users