Bacula-users

Re: [Bacula-users] Speed and integration issues

From: Bob Hetzel <beh AT case DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 05 Dec 2008 10:55:07 -0500

> Date: Fri, 5 Dec 2008 04:45:56 -0500
> From: David Lee Lambert <davidl AT lmert DOT com>

> I'm trying to use Bacula to do daily backups of data stored in iSCSI LUNs on 
> a  
> NetApp filer, using NetApp snapshots to ensure consistency.  The hosts to be 
> backed up have dual Gigabit Ethernet connections to the NetApp.  The backup 
> host consists of:

One thing you should make sure of is that your snapshot is mounted 
read-only and that you're not updating the access times of the files 
when you back them up.
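For example, if you mount the snapshot yourself before the backup runs, 
something along these lines keeps access times untouched (the device 
path and mount point are placeholders, not anything from your setup):

```
# Mount the snapshot LUN read-only, with noatime as a belt-and-braces
# measure, so the backup never writes access times back to the filer.
mount -o ro,noatime /dev/mapper/snap-lun /mnt/snap
```

The `ro` alone should be enough on a true read-only snapshot; `noatime` 
also saves the client the work of even attempting the updates.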

> 
> - a desktop-class (32-bit, 2.4GHz) machine with a single local SATA drive

That has enough CPU power for Bacula at LTO-2 speeds, but the single 
(slow) SATA drive may be making your Bacula database crawl.

> - an Overland Storage autochanger with room for 12 LTO-4 tapes

LTO-4 needs data streamed very fast.  While a backup is running, look 
at the output of the "top" command: what is using most of the CPU, 
Bacula or the Postgres database?  Keep in mind that moving large 
amounts of data rapidly across the bus of your backup server takes CPU 
power, and since your backup server has to do more than just move the 
data being backed up, you may need a faster machine.

> - a built-in Fast Ethernet adapter (3com 3c509) and an add-in Gigabit 
> Ethernet 
> adapter (Linksys rev 10)
> - running Ubuntu G server and kernel 2.6.22; Bacula is storing its catalog in 
> a local Postgres database
> 
> One issue we've struggled with is speed.  With the GB adapter, reading files 
> from a snapshot via iSCSI, we were consistently getting less than 2MByte/sec, 
> sometimes as low as 300kbyte/sec.  Yesterday we switched to the 100Mbit 
> adapter,  and were sometimes able to almost max it out during a full backup 
> (network usage of 10 to 11 MByte/sec on the Fast Ethernet adapter),  but it 
> also slowed down sometimes: it took 25 minutes to back up a 22GB LUN with 7GB 
> of files,  and it took 25 minutes to back up a 6GB LUN with 1.1GB of files 
> (yes, almost exactly the same amount of total time).

Backup speed depends heavily on file size, so you will likely see 
widely varying speeds across a collection of different servers.  Web 
directories and mail trees (at least the ones where every message is a 
single file) will probably be the slowest, because of the overhead of 
starting and stopping a read on the NetApp for every file, plus the 
catalog and other per-file operations on your backup server.  This is 
true of any backup system that does file-level backups rather than 
image backups (i.e. where you can easily restore one file as opposed 
to only a whole volume of files).

The 100 Mbit adapter performing better suggests you have a bad gigabit 
card somewhere, a problem on your network, or, if you're lucky, just a 
NIC driver or firmware that needs to be updated.

> I recently did dd to a raw tape and got a speed of at least 17MByte/sec.  The 
> local drive seems to have a write speed of about 7Mbyte/sec,  so spooling to 
> local disk is not an option.  On our faster servers with dual server-class 
> Gigabit Ethernet adapters,  I can get burst read speeds of 40 to 70 
> Mbyte/sec.

We had a problem whereby some low-end desktop-class gigabit switches 
(8 or 10 ports, I can't remember which) would perform really badly if 
you plugged a single 100 Mbit device into them, even on the other 
ports.  Your mileage may vary on that, though.  If you can go 
all-gigabit, you probably won't need anything as complicated as a 
parallel backup network.

> We'd also like our tape-rotation policy, for at least some of our tapes, to 
> mirror as closely as possible what we do for our existing servers with local 
> tape drives:  daily tape rotation in a two-week cycle,  with tapes written at 
> night and taken off-site for one week starting the day after they're written. 
>  
> That gives us an 18-hour window in which to write the tapes, and we should be 
> able to fill an 800-GB tape in 17 hours 46 minutes ( 800e8 / 1.25e7 / 3600 = 
> 17.77 ) at Fast Ethernet speed.  We probably have less data than that to back 
> up;  in fact, if we keep our other current tape drives and don't back 
> up /usr/portage or similar directories anywhere, we probably have less than 
> 400GB.  Therefore,  I think we should do a full backup each day; perhaps even 
> a full backup of the first snapshot and incremental backups for later 
> snapshots that same day.  Is that reasonable?  
> 
> Is it possible to initiate an incremental backup that would store all changes 
> against the contents of a certain medium?  (Say tape 5 is in the drive today 
> and has a 380GB full backup and 6 20-GB incremental backups going back 3 
> months.  File /foo/bar/xxx changed monday and tuesday, so the newest copy is 
> on the tuesday tape;  but write a copy to the friday tape as well.)

This was too confusing for me to follow.  If you want more than one 
copy of a file that changes often, you can use Differential backups in 
addition to just Fulls and Incrementals.
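A rotation like that is set up in a Schedule resource in 
bacula-dir.conf.  A minimal sketch along the lines of the example in 
the manual (the name and times here are placeholders, not a 
recommendation):

```
# bacula-dir.conf fragment -- schedule name and run times are illustrative
Schedule {
  Name = "WeeklyCycle"
  Run = Level=Full 1st sun at 23:05
  Run = Level=Differential 2nd-5th sun at 23:05
  Run = Level=Incremental mon-sat at 23:05
}
```

With that shape, each Differential captures everything changed since 
the last Full, so a frequently changing file ends up on multiple tapes 
in the cycle.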

> Has anyone seen similar speed problems with a NetApp filer, or another device 
> that serves up snapshots of iSCSI or FCP LUNs,  and solved them?
> 
> Supposing that round-trip-time over the network or disk seek latency on the 
> NetApp is the problem,  could we solve it by running multiple parallel backup 
> jobs to the same tape (without spooling)?
> 

You can easily try that by adjusting the concurrency values and 
reloading Bacula.  If the same backups are still slow, then the backup 
server is likely the bottleneck, imho.
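For reference, concurrency is controlled by the "Maximum Concurrent 
Jobs" directive, which appears in several resources (Director, 
Storage, Client, and the SD's Device), and the effective limit is the 
smallest of them.  A sketch, with hypothetical resource names and only 
the relevant directives shown:

```
# bacula-dir.conf fragments -- other required directives omitted
Director {
  Name = backup-dir
  Maximum Concurrent Jobs = 4
}
Storage {
  Name = lto4-autochanger
  Maximum Concurrent Jobs = 4
}
```

Note that interleaving several jobs onto one tape without spooling can 
make restores slower, since each job's blocks end up scattered.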

> How can we initiate an external script from Bacula that would do all the 
> snapshots and mount them before any backup job runs; or would we have to do 
> that kind of thing from cron? 

Check the Bacula manual for the "Run Before Job" and "Client Run 
Before Job" directives.
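That would look roughly like the Job fragment below; the script paths 
are hypothetical stand-ins for whatever does your snapshot and mount 
work.  "Run Before Job" runs the script on the Director host, while 
"Client Run Before Job" runs it on the client being backed up, which 
is usually what you want for mounting a snapshot:

```
# bacula-dir.conf Job fragment -- other required directives omitted,
# script paths are placeholders
Job {
  Name = "netapp-lun-backup"
  Client Run Before Job = "/usr/local/sbin/snap-and-mount.sh"
  Client Run After Job = "/usr/local/sbin/unmount-and-release.sh"
}
```

If the before-job script exits non-zero, Bacula will not run the 
backup, which is safer than an independent cron job that the backup 
can't see fail.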

> It took about 5 minutes to enter the "select files" phase when doing a 
> restore 
> of a backup with 7 GB of data and 128000 files.  Does that mean that if we 
> made one big backup job over all hosts with 700 GB of data, it would take 8 
> hours to enter the "select files" phase?
> 

This suggests your catalog is either missing an index or simply too 
slow for your environment.  Check that the right indexes are present 
on your tables.  If they are, then make sure you have 1) enough RAM in 
the server to hold the database's working set and 2) disk access fast 
enough that the database is not the bottleneck.
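As a sketch of what to look for in Postgres (the index name below 
matches what the stock Bacula schema scripts create, but verify 
against the make_postgresql_tables script shipped with your version):

```
-- in psql, connected to the bacula catalog
\d file                  -- lists the File table's columns and indexes

-- the File table is the big one; if the JobId index is missing,
-- recreating it looks something like:
CREATE INDEX file_jobid_idx ON file (jobid);
```

The File table typically dwarfs everything else in the catalog, so a 
missing index there is the usual cause of multi-minute "select files" 
waits.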

You've spent a substantial amount of money on the tape drive and the 
NetApp... make sure everybody with budgetary authority knows you can't 
skimp on the backup server.  You can get a really speedy Dell server 
with two dual-core (or even quad-core now) CPUs, some fast disks, and 
a lot of RAM for around $5k.

For backing up your slower servers, you're going to need to use 
spooling to keep that LTO-4 drive from shoe-shining.  I don't think 
any single SATA drive can keep up with the speed an LTO-4 drive 
demands, so that means you need a decently performing RAID setup. 
You'll probably want the catalog database on its own RAID set too, so 
that means a minimum of 4-5 drives.  My setup is two 500GB SATAs 
(RAID-1) plus four 146GB 10,000 RPM SAS drives (RAID-0).  The RAID-0 
set does nothing but store the Bacula spool files.  Even this setup 
may not be fast enough to keep more than one LTO-3 or LTO-4 drive 
happy, but eventually I'm going to find that out.  I peak out around 
30 MB/sec, which may be the absolute minimum speed you can run an 
LTO-4 drive at; check the specs for your specific drive to find out 
what your target needs to be.  Adding more drives to the RAID-0 set is 
a simple way to improve its performance, but the rest of the setup has 
to be up to the task too.
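Spooling is switched on per Job and pointed at a directory in the SD's 
Device resource.  A hedged sketch, with placeholder names, paths, and 
sizes:

```
# bacula-sd.conf Device fragment -- other required directives omitted
Device {
  Name = lto4-drive
  Spool Directory = /raid0/bacula-spool
  Maximum Spool Size = 200G
}

# bacula-dir.conf Job fragment
Job {
  Name = "netapp-lun-backup"
  Spool Data = yes
}
```

With spooling on, the job writes to the RAID-0 set at whatever speed 
the client can deliver, then despools to tape at full drive speed, 
which is exactly what prevents the shoe-shining.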

