Bacula-users

Re: [Bacula-users] Bacula catalog size limits

2009-03-02 15:22:59
Subject: Re: [Bacula-users] Bacula catalog size limits
From: "T. Horsnell" <tsh AT mrc-lmb.cam.ac DOT uk>
To: Tim Bell <tim.bell AT cern DOT ch>, bacula-users AT lists.sourceforge DOT net
Date: Mon, 02 Mar 2009 18:13:19 +0000
Tim Bell wrote:
What are people's experiences with Bacula's scalability limits as the number of files per server increases? We are looking at backing up 1000+ clients with millions of files in total. I would like to understand whether this is feasible and how many servers we would need:

Specifically,

- What is the recommended largest number of files in the catalog for each Bacula instance?
- Which database choice is best for large numbers of files in the catalog?
- Do multiple instances of Bacula on a single server make sense to improve scalability?

Tim Bell
CERN


------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Attached is an accumulated conversation I had on this subject some months ago.
We're currently backing up about 10 TB, containing about 39 million files.
One SD, FD, and pool.

Catalog:
[root@ls3 ~]# du -sh /var/lib/mysql/bacula/
23G     /var/lib/mysql/bacula/
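
As a rough back-of-the-envelope check on those numbers, the sketch below divides the catalog size by the file count and scales it linearly. It assumes `du -sh` is reporting GiB, that the whole database directory is attributable to file records, and that growth is linear; the function names are made up for illustration. The catalog keeps a record per file per job, so retention settings will move the real figure.

```python
GIB = 2**30

# Figures from this message: a 23G catalog for about 39 million files.
CATALOG_BYTES = 23 * GIB
FILE_COUNT = 39_000_000


def bytes_per_file(catalog_bytes: int, file_count: int) -> float:
    """Average catalog footprint per backed-up file (assumes all space is file records)."""
    return catalog_bytes / file_count


def projected_catalog_gib(file_count: int, per_file_bytes: float) -> float:
    """Linear extrapolation of catalog size for another file count (an assumption)."""
    return file_count * per_file_bytes / GIB


per_file = bytes_per_file(CATALOG_BYTES, FILE_COUNT)
print(f"about {per_file:.0f} bytes of catalog per file")

# 70 million is the file count mentioned in the enclosed conversation.
print(f"about {projected_catalog_gib(70_000_000, per_file):.1f} GiB for 70 million files")
```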


Cheers,
Terry.
--- Begin Message ---
Subject: Re: [Bacula-users] Bacula limits
From: Eric Bollengier <eric AT eb.homelinux DOT org>
To: Dan Langille <dan AT langille DOT org>
Date: Fri, 29 Aug 2008 16:48:55 +0200
On Friday 29 August 2008 15:50:49, Dan Langille wrote:
> Some additional thoughts on the patch:
>
> Dan Langille wrote:
> > ebollengier wrote:
> >> T. Horsnell wrote:
> >>> I will be backing up maybe 70million files in one job.
> >>> Am I approaching any Bacula/catalog limit on the number of files?
> >>> What is the maximum number of files which Bacula can handle,
> >>> and are there any other limits which we should know about?
> >>>
> >  > Hello,
> >  >
> >  > By default, the database is configured to handle 4 billion entries in
> >  > the file table, and
> >  > with 70million files per job, this limit will come quite fast.
> >
> > Do you have a reference for this 4 billion limit?
> >
> > I suspect you are referring to the data types used for the file.fileid
> > column.  In PostgreSQL, this is an integer value.
> >
> > 4 bytes
> > -2147483648 to +2147483647
>
> I think it's more like 2 billion files.

AUTO_INCREMENT and serial fields can't wrap around, so after backing up 2 billion
files we can assume that it will fail somewhere.
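
To put numbers on how fast that happens, here is a small sketch of the arithmetic. It assumes every backed-up file consumes exactly one new FileId, that ids are never reused, and that each job is a full backup of 70 million files; `jobs_until_overflow` is a name invented for this example, not anything in Bacula.

```python
# Upper bounds of signed 4-byte and 8-byte integers, as quoted in this thread.
INT32_MAX = 2**31 - 1  # 2147483647
INT64_MAX = 2**63 - 1  # 9223372036854775807

FILES_PER_JOB = 70_000_000  # figure from the original question


def jobs_until_overflow(max_id: int, files_per_job: int) -> int:
    """Number of complete jobs that fit before the id counter is exhausted."""
    return max_id // files_per_job


print(jobs_until_overflow(INT32_MAX, FILES_PER_JOB))  # 30
print(jobs_until_overflow(INT64_MAX, FILES_PER_JOB))
```

Under those assumptions a 4-byte FileId is exhausted after about 30 such jobs, while an 8-byte FileId is not a practical limit.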

> > With 8 bytes we can get (commas added by me)
> > -9223372036854775808 to 9,223,372,036,854,775,807
> >
> > See
> > http://www.postgresql.org/docs/8.3/interactive/datatype-numeric.html#DATATYPE-INT
> >
> > In MySQL, the limits are the same.
> >
> >  > I suggest you apply the trunk/patches/testing/fileid64.patch patch
> >  > to your installation, and to upgrade the FileId field of your catalog
> >  > (see update_mysql_catalog shell).
> >
> > Looking at that patch, I think other tables need to change as well:
> >
> >    basefiles.fileid
>
> This assumes we have more than 2 billion unique file names (including
> path).

I'm not sure what you mean. My patch allows backing up more than 2 billion 
files per catalog (over thousands or millions of jobs), with a limit of 2 billion 
different paths and 2 billion different filenames.

> It may be harder to reach 2 billion unique file names (without path).
> If we want to cater for that, we also have to change:
>
> filename.filenameid
> path.filenameid
> unsavedfiles.filenameid

Maybe we can change from signed to unsigned, and then we would have 4 billion 
different filenames and paths per catalog...

It will work on PostgreSQL, but I have to look it up for MySQL. (Sequences are 
8 bytes.)
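
The effect of going from signed to unsigned on a 4-byte field can be checked with a short sketch. It uses Python's `struct` module purely to illustrate the ranges and says nothing about how either database stores the column; the `fits` helper is hypothetical.

```python
import struct


def fits(fmt: str, value: int) -> bool:
    """Return True if value can be packed into the given fixed-width integer format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


SIGNED_32 = "<i"    # 4-byte signed integer
UNSIGNED_32 = "<I"  # 4-byte unsigned integer

print(fits(SIGNED_32, 2**31 - 1))    # True: largest signed 4-byte value
print(fits(SIGNED_32, 2**31))        # False: one past the signed limit
print(fits(UNSIGNED_32, 2**32 - 1))  # True: largest unsigned 4-byte value
print(fits(UNSIGNED_32, 2**32))      # False: one past the unsigned limit
```

The unsigned maximum is 4,294,967,295, which is where the "4 billion" figure comes from.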

> And possibly some data types within Bacula.
>
>
> Re: http://www.bacula.org/en/developers/Catalog_Services.html

Yes, I've talked about that with Kern, and we agree that it's very important 
to upgrade FileId as soon as possible. FilenameId and PathId are much harder to 
overflow and can wait a bit.

Bye


--- End Message ---