Bacula-users

Re: [Bacula-users] [Bacula-devel] Despooling attrs does not finish

2014-10-21 13:18:04
Subject: Re: [Bacula-users] [Bacula-devel] Despooling attrs does not finish
From: Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at>
To: Blake Dunlap <ikiris AT gmail DOT com>
Date: Tue, 21 Oct 2014 19:15:23 +0200
Hi Blake,

On Tue, 2014-10-21 at 10:10 -0500, Blake Dunlap wrote:
> This sounds like a bug in bacula actually. It shouldn't follow
> recursion into the same structure, simply store the link and move on.
> 

i don't think it's a bacula bug, because the directory hierarchy really
existed on disk. i also checked it on the client itself.

i think the hierarchy was created by some unzip program. imagine a
zip archive created on unix containing a symlink pointing to "..".
if the unzip follows the symlink (by intention or by a bug) then it
may run into a recursion and create a deeper and deeper directory
hierarchy. that's the one way i can think of that such a deep recursive
path may be created.
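
a minimal sketch (python, not part of bacula) of how such a tree could be
spotted on a client before the backup runs. the function name and both
thresholds are made up for illustration; it only assumes that the problem
tree shows up as very long paths or as one directory name repeated many
times along a path:

```python
import os
from collections import Counter


def find_suspicious_dirs(root, max_len=1000, max_repeats=3):
    """Return directories whose path is very long or repeats one name often.

    max_len and max_repeats are arbitrary illustration thresholds.
    """
    hits = []
    # followlinks=False: never walk through symlinks, so the scan itself
    # cannot fall into the kind of loop described above.
    for dirpath, dirnames, _filenames in os.walk(root, followlinks=False):
        rel = os.path.relpath(dirpath, root)
        counts = Counter(part for part in rel.split(os.sep) if part != ".")
        most_repeated = max(counts.values(), default=0)
        if len(dirpath) > max_len or most_repeated > max_repeats:
            hits.append(dirpath)
            dirnames[:] = []  # prune: report only the top of the bad subtree
    return hits
```

calling find_suspicious_dirs("C:/Users/name/Desktop") would then list the
first directory of each suspicious subtree. on windows, very long paths may
need extra handling that this sketch does not attempt.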

actually it was not easy to remove the tree manually on the client :)

Best regards
Ulrich


> -Blake
> 
> On Tue, Oct 21, 2014 at 4:17 AM, Ulrich Leodolter
> <ulrich.leodolter AT obvsg DOT at> wrote:
> > Hi all,
> >
> > i found the root cause of the problem, it was simply a mysql performance
> > problem because of a special filesystem hierarchy on the user's desktop.
> >
> > there was one directory which was recursively repeated inside itself.
> >
> > C:/Users/name/Desktop/Exercise Files/CSS Core Concepts
> > C:/Users/name/Desktop/Exercise Files/CSS Core Concepts/Exercise Files/CSS Core Concepts/
> > ...
> >
> > there was only one file inside "CSS Core Concepts" and six empty
> > subdirectories, Chapter_01 to Chapter_06.  this hierarchy was repeated
> > up to a path length of 4834.
> > very strange, maybe a zip file containing symlinks pointing to "." was
> > unzipped on the desktop.
> >
> > the join on path in the batch insert seems to perform very badly when
> > comparing about 27000 long path names like that.
> >
> > INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
> >   SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,
> >          batch.LStat, batch.MD5, batch.DeltaSeq
> >   FROM batch
> >   JOIN Path ON (batch.Path = Path.Path)
> >   JOIN Filename ON (batch.Name = Filename.Name)
> >
> >
> > now we removed the almost empty tree "C:/Users/name/Desktop/Exercise Files"
> > and i am sure tomorrow the backup will finish in time without problems.
> >
> > Best regards
> > Ulrich
> >
> >
> >
> >
> > On Mon, 2014-10-20 at 15:21 +0200, alejandro alfonso fernandez wrote:
> >> Hi!
> >>
> >> I agree with Martin. It's not a Bacula error, it's a MySQL problem.
> >>
> >> I'm pretty sure that your /tmp partition becomes full, especially if you
> >> share both the Bacula Spool Directory (are you using "SpoolData = yes"?) and
> >> the MySQL temporary filesystem (both of them in /tmp by default)
> >>
> >> Try changing the "tmpdir" param of your MySQL server (my.cnf) to a bigger
> >> partition (don't forget to restart the service to apply the change)
> >>
> >> Example:
> >> # point the following paths to different dedicated disks
> >> # tmpdir                                                = /tmp/
> >> tmpdir                                          = /var/tmp/mysql
> >>
> >> Running "mysqlrepair" to test database integrity would be a good idea
> >>
> >> Best regards!
> >>
> >> On Mon, Oct 20, 2014 at 12:57 PM, Martin Simmons <martin AT lispworks DOT com>
> >> wrote:
> >>
> >> > >>>>> On Sun, 19 Oct 2014 19:02:57 +0200, Ulrich Leodolter said:
> >> > >
> >> > > Hello Dan,
> >> > >
> >> > > On Sat, 2014-10-18 at 13:32 -0400, Dan Langille wrote:
> >> > > > On Oct 18, 2014, at 4:03 AM, Ulrich Leodolter <ulrich.leodolter AT obvsg DOT at> wrote:
> >> > > >
> >> > > > > Hello,
> >> > > > >
> >> > > > > we have a Win7 backup which does not come to an end within MaxRunTime=12h.
> >> > > > > the server runs 7.0.5 (28 July 2014),  the client has installed the
> >> > > > > bacula-enterprise-win64-7.0.5.exe.  but the problem started about 2
> >> > > > > months ago,  at that time windows client 5.2.10 was installed on the
> >> > > > > machine.
> >> > > > >
> >> > > > > the backup itself is about 100GB compressed and seems to finish
> >> > > > > on the client after about 6 hours, below are the last messages of
> >> > > > > the job before it gets stuck.
> >> > > > >
> >> > > > > 2014-10-18 03:18:09 troll-sd JobId 635821: Committing spooled data to Volume "Backup-0779". Despooling 1,692,736,419 bytes ...
> >> > > > > 2014-10-18 03:18:18 troll-sd JobId 635821: Despooling elapsed time = 00:00:09, Transfer rate = 188.0 M Bytes/second
> >> > > > > 2014-10-18 03:18:19 troll-sd JobId 635821: Elapsed time=06:11:45, Transfer rate=4.691 M Bytes/second
> >> > > > > 2014-10-18 03:18:22 troll-sd JobId 635821: Sending spooled attrs to the Director. Despooling 603,449,667 bytes .
> >> > > > >
> >> > > > > mysql status at the same time:
> >> > > > >
> >> > > > > # echo "show full processlist" | mysql
> >> > > > > Id        User    Host    db      Command Time    State   Info
> >> > > > > 6854      bacula  localhost       bacula  Sleep   522             NULL
> >> > > > > 6873      bacula  localhost       bacula  Query   21143   Sending data    INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name)
> >> > > > > 6899      root    localhost       NULL    Query   0       NULL    show full processlist
> >> > > > >
> >> > > > >
> >> > > > > we have a bunch of other clients (about 30), a mixture of linux, win7 and mac powerpc.
> >> > > > > all other backups have run without problems for years now.  there are even larger backups,
> >> > > > > in size and in number of files.
> >> > > > >
> >> > > > >
> >> > > > > does anyone have an idea why this single batch insert does not complete?
> >> > > > >
> >> > > > > do i need to analyze the attrs spool file itself ?
> >> > > > >
> >> > > > > yesterday i optimized the bacula database, but that didn't help.
> >> > > > > there must be something special in the attrs spool file which the
> >> > > > > mysql server can't handle. the server runs on standard CentOS 6.5 x86_64.
> >> > > >
> >> > > > This is something which should be asked in the user mailing list, not the devel mailing list.  I am replying to that list instead.
> >> > > >
> >> > >
> >> > > Ok
> >> > >
> >> > > > Is this a large number of files?
> >> > > >
> >> > >
> >> > > about 750000,  not very large.
> >> > >
> >> > > > I had something which took a while.  I sped it up by giving PostgreSQL more memory.  Perhaps MySQL can do the same.
> >> > > >
> >> > > > Here’s what I did: https://plus.google.com/+DanLangille/posts/AKXoRido3U1
> >> > > >
> >> > >
> >> > > the mysql database is already optimized and has enough memory.
> >> > > backups of up to 4000000 files and 700GB run without problems.
> >> > >
> >> > > below are the last messages of the last failed job; it was canceled
> >> > > after the 12 hour max run time.
> >> > >
> >> > > 2014-10-19 02:49:15 troll-sd JobId 635915: Elapsed time=05:43:32, Transfer rate=5.095 M Bytes/second
> >> > > 2014-10-19 02:49:16 troll-sd JobId 635915: Sending spooled attrs to the Director. Despooling 603,418,522 bytes ...
> >> > > 2014-10-19 09:05:43 troll-dir JobId 635915: Fatal error: Max run time exceeded. Job canceled.
> >> > > 2014-10-19 09:41:17 troll-dir JobId 635915: Error: Bacula troll-dir 7.0.5 (28Jul14):
> >> > >
> >> > > the database already shows about 600k files for this job.
> >> > >
> >> > > mysql> select count(*) from File where JobId = '635915' and LStat is not null;
> >> > > +----------+
> >> > > | count(*) |
> >> > > +----------+
> >> > > |   616848 |
> >> > > +----------+
> >> > > 1 row in set (0.00 sec)
> >> > >
> >> > >
> >> > > is it possible that some special file attr gets the mysql database
> >> > > into trouble ?  i really can't imagine.  the mysql database is almost
> >> > > idle after the first 600k have been inserted.  there is no io traffic and
> >> > > cpu usage is also low.
> >> > >
> >> > > it seems i have to dump the spooled attrs files tomorrow and compare them to
> >> > > the database to see which attrs have been inserted and at which one it
> >> > > stops.
> >> >
> >> > I think that will be difficult, because the insert is a join of various other
> >> > temporary tables so the order may be random.
> >> >
> >> >
> >> > > does anyone have a better idea how to debug this problem?
> >> > > i am a little bit lost :) because in the last 6 years that
> >> > > i have been using bacula i never ran into a problem like this.
> >> >
> >> > Maybe you can attach strace or gdb to the mysql process running this insert
> >> > statement to see what it is doing?  It doesn't look like a Bacula problem,
> >> > but it might be a bug in mysql.
> >> >
> >> > __Martin
> >> >
> >> >
> >> > ------------------------------------------------------------------------------
> >> > Comprehensive Server Monitoring with Site24x7.
> >> > Monitor 10 servers for $9/Month.
> >> > Get alerted through email, SMS, voice calls or mobile push notifications.
> >> > Take corrective actions from your mobile device.
> >> > http://p.sf.net/sfu/Zoho
> >> > _______________________________________________
> >> > Bacula-users mailing list
> >> > Bacula-users AT lists.sourceforge DOT net
> >> > https://lists.sourceforge.net/lists/listinfo/bacula-users
> >> >
> >
> >
> >
> 


