Hi,
02.03.2009 13:17, Tim Bell wrote:
> What are the experiences of Bacula's scalability limits as the number of
> files per server increase ? We are looking at backing up 1000+ clients
> with millions of files in total.
Definitely a very interesting project :-)
> I would like to understand if this is
> feasible and how many servers we would need:
I'm pretty sure it's possible as I know there are installations of
that size. As a very rough estimate, I'd suggest planning for the
following servers:
- One really high-end database server cluster of at least two machines.
- One moderately equipped DIR server.
- A number of SD machines handling actual data storage. If you back up
to tape, don't try to connect more tape drives to one machine than that
machine can keep saturated... LTO-4 is so fast you can expect to run
into bottlenecks at the SCSI/SAS/FC bus, internal buses, CPU
throughput, and disk system even with a small number of drives. Disk
storage systems are not that critical here (as they don't suffer from
shoeshining), but they are obviously limited by the same factors.
With multi-linked 4G FC interconnects you can do a lot :-)
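
To make the tape drive point a bit more concrete, here is a rough
back-of-the-envelope sketch. It assumes the LTO-4 native rate of
120 MB/s and roughly 400 MB/s of usable payload on a single 4G FC
link; the link figure and the function name are my assumptions for
illustration, and real numbers depend on your HBAs, buses and
compression ratio.

```python
import math

LTO4_NATIVE_MB_S = 120.0  # LTO-4 native (uncompressed) streaming rate
FC_4G_MB_S = 400.0        # assumed usable payload rate of one 4G FC link


def max_streaming_drives(link_mb_s: float, drive_mb_s: float) -> int:
    """Return how many drives one link can feed at full streaming speed.

    Hypothetical helper: it only divides bandwidths and ignores CPU,
    internal buses and the disk system, which may bottleneck earlier.
    """
    return math.floor(link_mb_s / drive_mb_s)


# Uncompressible data: each drive wants the native rate.
print(max_streaming_drives(FC_4G_MB_S, LTO4_NATIVE_MB_S))

# Data compressing 2:1 in the drive: each drive wants twice as much input.
print(max_streaming_drives(FC_4G_MB_S, 2 * LTO4_NATIVE_MB_S))
```

Even under these optimistic assumptions a single link feeds only a
handful of drives, and fewer still once the data compresses well.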
>
> Specifically,
>
> - What are the recommended largest number of files in the catalog for
> each bacula instance ?
With version 3 (which will be released in March or April, hopefully),
the catalog will get bigger fields for the IDs of some critical data.
The number of files you can keep in one catalog instance will probably
be sufficient then.
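
As a rough illustration of why the width of those ID fields matters,
the sketch below compares the number of file records written against
the range of an ID field. The 32-bit and 64-bit widths, the helper
name and all the workload numbers are assumptions made up for the
example, not figures from Bacula or from your environment.

```python
ID_LIMIT_32 = 2**32  # assumed range of a 32-bit unsigned ID field
ID_LIMIT_64 = 2**64  # assumed range of a 64-bit unsigned ID field


def file_records_written(clients: int, files_per_client: int,
                         full_backups: int) -> int:
    """Estimate file records written to the catalog by full backups.

    Hypothetical model: every full backup writes one record per file,
    and IDs are never reused, so pruned records still consume IDs.
    """
    return clients * files_per_client * full_backups


# Hypothetical workload: 1000 clients, 1 million files each, 6 fulls.
records = file_records_written(1000, 1_000_000, 6)

print(records)                # total IDs consumed
print(records < ID_LIMIT_32)  # does it fit a 32-bit field?
print(records < ID_LIMIT_64)  # does it fit a 64-bit field?
```

With much smaller file counts per client the narrower field would of
course last far longer, so plug in your own numbers.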
> - What database choice is the best for large numbers of files in the
> catalog ?
MySQL or PostgreSQL - I'd choose whatever you're more comfortable
with. I believe PostgreSQL performs a bit better, but with a catalog
of the size you can expect to end up with, you definitely want someone
able to handle a big database. So, if you've got good MySQL DBAs,
choose that, even if performance alone would favor PostgreSQL.
You'll definitely need a good database server... either integrated into
the Bacula main server, or a separate machine. For a project of your
size, I would suggest evaluating the relative speeds of a database on
the Bacula server and a dedicated database server connected by 10GE or
some other high-speed, low-latency interconnect.
> - Do multiple instances of bacula on a single server make sense to
> improve scalability ?
No... scalability is better achieved by having several separate SD
machines and a separate database machine. You should be fine with one
DIR. Several SDs, preferably one per network segment, allow you to run
faster data transfers to the final storage. Separate DIR, SD and
catalog machines, furthermore, improve reliability a bit because
if one machine fails you won't have to go through a complete
disaster recovery procedure... running the catalog database on a
cluster of at least two machines should make it highly unlikely that
you ever have to recover Bacula's catalog from tape or disk volumes.
> Tim Bell
> CERN
I guess I want to visit you next time I'm in Switzerland... Bacula
Systems' office is in Yverdon, not very far from CERN :-)
Arno
--
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users