Re: [Bacula-users] Bacula catalog size limits

2009-03-02 17:16:42
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 02 Mar 2009 23:06:45 +0100
Hi,

02.03.2009 13:17, Tim Bell wrote:
> What are the experiences of Bacula's scalability limits as the number of 
> files per server increases? We are looking at backing up 1000+ clients 
> with millions of files in total.

Definitely a very interesting project :-)

>  I would like to understand if this is 
> feasible and how many servers we would need:

I'm pretty sure it's possible, as I know there are installations of 
that size. As a very rough estimate, I'd suggest planning for the 
following servers:
- one really high-end database server cluster of at least two machines
- one moderately equipped DIR server.
- A number of SD machines handling actual data storage. If you back up 
to tape, don't try to connect more tape drives to one machine than it 
can keep saturated... LTO-4 is so fast you can expect to run into 
bottlenecks at the SCSI/SAS/FC bus, internal buses, CPU throughput, 
and disk system even with a small number of drives. Disk storage 
systems are not that critical here (as they don't suffer from 
shoeshining), but they are obviously limited by the same factors.

With multi-linked 4G FC interconnects you can do a lot :-)
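
To make the tape-drive point concrete, here is a rough back-of-the-envelope
sketch. The figures are assumptions and not measurements from any real
installation: roughly 120 MB/s native streaming rate for an LTO-4 drive
and roughly 400 MB/s usable payload per 4G FC link. The function names
are made up for illustration.

```python
import math

# Assumed figures (not from the original mail, adjust to your hardware):
LTO4_NATIVE_MB_S = 120.0    # approx. native (uncompressed) LTO-4 streaming rate
FC_4G_USABLE_MB_S = 400.0   # approx. usable payload of one 4G FC link


def drives_per_link(link_mb_s: float, drive_mb_s: float, links: int = 1) -> int:
    """How many drives `links` parallel links can keep streaming at full speed."""
    return math.floor(links * link_mb_s / drive_mb_s)


def required_source_rate(drives: int, drive_mb_s: float) -> float:
    """Sustained MB/s the SD (network, disk, CPU) must deliver to avoid shoeshining."""
    return drives * drive_mb_s


if __name__ == "__main__":
    print(drives_per_link(FC_4G_USABLE_MB_S, LTO4_NATIVE_MB_S))           # one link
    print(drives_per_link(FC_4G_USABLE_MB_S, LTO4_NATIVE_MB_S, links=2))  # two links
    print(required_source_rate(4, LTO4_NATIVE_MB_S))
```

Under these assumptions a single link feeds about three drives at native
speed, and compressible data pushes the per-drive rate higher still, so the
real number per SD is more likely lower than higher.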

> 
> Specifically,
> 
> - What is the recommended largest number of files in the catalog for 
> each bacula instance ?

With version 3 (which will be released in March or April, hopefully), 
the catalog will get bigger fields for the IDs of some critical data. 
The number of files you can keep in one catalog instance will probably 
be sufficient then.
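
To illustrate why the ID width matters, here is a small sketch. It assumes
(this is not stated above) that the critical IDs are currently 32-bit
counters, that they grow to 64 bits, and that IDs are never reused after
pruning. The daily record rate is a purely hypothetical workload, not a
figure for your site.

```python
# Assumed ID widths (the mail only says "bigger fields"):
UNSIGNED_32_MAX = 2**32 - 1   # largest value of an unsigned 32-bit ID
SIGNED_64_MAX = 2**63 - 1     # largest value of a signed 64-bit ID


def days_until_exhausted(max_id: int, records_per_day: int) -> int:
    """Whole days until a never-reused, monotonically growing ID hits max_id."""
    if records_per_day <= 0:
        raise ValueError("records_per_day must be positive")
    return max_id // records_per_day


if __name__ == "__main__":
    # Hypothetical load: 50 million file records inserted per day.
    hypothetical_rate = 50_000_000
    print(days_until_exhausted(UNSIGNED_32_MAX, hypothetical_rate))
    print(days_until_exhausted(SIGNED_64_MAX, hypothetical_rate))
```

With that hypothetical rate a 32-bit counter lasts less than three months,
while a 64-bit one is no practical limit at all.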

> - What database choice is the best for large numbers of files in the 
> catalog ?

MySQL or PostgreSQL - I'd choose whatever you're more comfortable 
with. I believe PostgreSQL performs a bit better, but with a catalog 
of the size you can expect to end up with, you definitely want someone 
able to handle a big database. So, if you've got good MySQL DBAs, 
choose that, even if performance alone would favor PostgreSQL.

You'll definitely need a good database server... either integrated into 
the Bacula main server, or a separate machine. For a project of your 
size, I would suggest evaluating the relative speeds of a database on 
the Bacula server and a dedicated database server connected by 10GE or 
some other high-speed, low-latency interconnect.

> - Do multiple instances of bacula on a single server make sense to 
> improve scalability ?

No... scalability is better achieved by having several separate SD 
machines and a separate database machine. You should be fine with one 
DIR. Several SDs, preferably one per network segment, allow you to run 
faster data transfers to the final storage. Separate DIR, SD and 
catalog machines, furthermore, improve the reliability a bit because 
if one machine fails you won't have to go through the complete 
procedure of a disaster recovery... running the catalog database on a 
cluster of at least two machines should make it highly unlikely you 
ever have to recover Bacula's catalog from tape or disk volumes.

> Tim Bell
> CERN

I guess I want to visit you next time I'm in Switzerland... Bacula 
Systems' office is in Yverdon, not very far from CERN :-)

Arno

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de
