Following up on my own post, I had a little free time the other
day and decided to investigate whether this was feasible. Setting up the
necessary services on Amazon was trivial, including access control and block
storage. I tried s3fs first, and it worked, but there seemed to be far too much
I/O going on for that kind of data (which is pretty much what I
expected). Then I tried putting my bacula-sd on an EC2 node, writing to
files on EBS, and it worked great (spooling first to the “local”
drive on EC2). Throughput, however, was somewhat less than I was hoping
for: approximately 25% of what I get locally spooling and then writing to tape.
However, I found that there was NO performance penalty for running two jobs
concurrently. I didn’t try larger numbers, but my guess is you can
run a large number of concurrent jobs to get a pretty good effective
throughput, assuming you have lots of clients with similar data sizes.
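For reference, the SD side of the test was nothing exotic: just a file-type Device whose Archive Device sits on the EBS mount, with the spool directory on the instance’s local disk. Roughly like the sketch below (device names and paths are illustrative, not my exact config):

    # bacula-sd.conf on the EC2 node (illustrative sketch)
    Device {
      Name = EBSFileStorage
      Media Type = File
      Archive Device = /ebs/bacula        # EBS volume mounted here (assumed path)
      LabelMedia = yes
      Random Access = yes
      AutomaticMount = yes
      RemovableMedia = no
      AlwaysOpen = no
      Spool Directory = /mnt/spool        # instance-local disk used for spooling
      Maximum Spool Size = 100 GB         # size to suit the instance storage
    }

With "Spool Data = yes" in the job definitions, data lands on the instance disk first and is then despooled to the EBS-backed file volumes.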
Our problem is that 80% of our data is on one client; at that rate a full
backup of it would take about 130 hours, and our backup window simply isn’t
that long. Then I thought I could break the FileSets into smaller pieces
and run multiple backup jobs in parallel (and I’m assuming that my client
is not the bottleneck). However, Bacula wouldn’t run more than one job
against that client concurrently. Since I can run multiple clients
concurrently, I’m pretty sure my bacula-dir.conf and bacula-sd.conf
settings are correct, and my bacula-fd.conf specifies “Maximum Concurrent
Jobs = 20”… Is there any other reason why I couldn’t run, say, 5
parallel jobs with different FileSets off the same client?
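For reference (in case I’ve overlooked one), these are the director-side resources where Maximum Concurrent Jobs can appear; the names and values below are illustrative, not my exact config, and as far as I know the Client and Job limits default to 1 when not set:

    # bacula-dir.conf (illustrative sketch; passwords, addresses, and
    # other required directives omitted for brevity)
    Director {
      Name = backup-dir
      Maximum Concurrent Jobs = 20
    }

    Client {
      Name = bigclient-fd                  # hypothetical client name
      Address = bigclient.example.com
      Maximum Concurrent Jobs = 20         # defaults to 1 if omitted
    }

    Storage {
      Name = EC2-File
      Maximum Concurrent Jobs = 20
    }

    Job {
      Name = BigClient-Part1               # one job per FileSet slice
      Client = bigclient-fd
      FileSet = "BigClient Set 1"
      Maximum Concurrent Jobs = 5          # per-job limit, also defaults to 1
    }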
From: Peter Zenge [mailto:pzenge AT ilinc DOT com]
Sent: Tuesday, March 02, 2010 2:57 PM
To: bacula-users AT lists.sourceforge DOT net
Subject: [Bacula-users] Bacula to the Cloud
Hello,
2-year Bacula user but first-time poster. I’m currently dumping
about 1.6 TB to LTO2 tapes every week, and I’m looking to migrate to a new
storage medium.
The obvious answer, I think, is a direct-attached disk array (which I would be able
to put in a remote gigabit-attached datacenter before too long). However,
I’m wondering if anyone is currently doing large (or what seem to me to
be large) backups to the cloud in some way? Assuming I have a gigabit
connection to the Internet from my datacenter, I’m wondering how feasible
it would be to either use something like Amazon S3 with s3fs (I’m
guessing there’s way too much overhead for that to be efficient), or run a
bacula-sd instance on an EC2 node, using Elastic Block Store (EBS) as “local” disk,
with a VPN (Amazon VPC) between my datacenter and the SD.
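Concretely, what I have in mind is pointing a director-side Storage resource at an SD running in EC2, reached over the VPN; something roughly like this (names, address, and device are placeholders, not a working config):

    # bacula-dir.conf on my side (illustrative sketch)
    Storage {
      Name = EC2-File
      Address = 10.0.0.10            # private VPC address of the EC2 node, via the VPN
      SDPort = 9103
      Password = "..."               # must match the Director entry in bacula-sd.conf
      Device = EBSFileStorage        # file Device backed by an EBS volume
      Media Type = File
    }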
Substitute your favorite cloud provider for Amazon above; I don’t use any right now,
so I’m not tied to any particular provider. It just seems like Amazon has all
the necessary pieces today.
To do this, and to keep customers comfortable with the idea of their data being in the cloud, we
would need to encrypt, so I’m also wondering whether it would be possible for
the SD to encrypt the backup volume, rather than having the FD encrypt the data before
sending it to the SD (which is what we do now). It would be easier to manage if we
handled encryption in one place for all clients.
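For context, what we do now is the standard FD-side PKI data encryption, configured per client in bacula-fd.conf, roughly like this (key paths and names are placeholders):

    # bacula-fd.conf on each client (illustrative sketch)
    FileDaemon {
      Name = client1-fd
      PKI Signatures = Yes
      PKI Encryption = Yes
      PKI Keypair = "/etc/bacula/client1-fd.pem"     # this client's certificate + private key
      PKI Master Key = "/etc/bacula/master.cert"     # master public key, kept for recovery
    }

The question is whether the volume-level equivalent could live on the SD instead, so there would be only one place to manage keys.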
I would love to hear what other people are doing with Bacula and the
cloud, or why you have decided not to.
Pzenge .at. ilinc .dot. com