Subject: Re: [Bacula-users] Idea/suggestion for dedicated disk-based sd
From: Henrik Johansen <henrik@scannet.dk>
To: <bacula-users@lists.sourceforge.net>
Date: Tue, 6 Apr 2010 20:18:09 +0200
On 04/ 6/10 06:28 PM, Phil Stracchino wrote:
> On 04/06/10 12:06, Josh Fisher wrote:
>> On 4/6/2010 8:42 AM, Phil Stracchino wrote:
>>> On 04/06/10 02:37, Craig Ringer wrote:
>>> Well, just off the top of my head, the first thing that comes to mind is
>>> that the only ways such a scheme is not going to result in massive disk
>>> fragmentation are:
>>>
>>>    (a) it's built on top of a custom filesystem with custom device drivers
>>> to allow pre-positioning of volumes spaced across the disk surface, in
>>> which case it's going to be horribly slow because it's going to spend
>>> almost all its time seeking track-to-track; or
>>
>> I disagree. A filesystem making use of extents and multi-block
>> allocation, such as ext4, is designed for large file efficiency by
>> keeping files mostly contiguous on disk. Also, filesystems with delayed
>> allocation, such as ext4/XFS/ZFS, are much better at concurrent i/o than
>> non-delayed allocation filesystems like ext2/3, reiser3, etc. The
>> thrashing you mentioned is substantially reduced on writes, and for
>> restores, the files (volumes) remain mostly contiguous. So with a modern
>> filesystem, concurrent jobs writing to separate volume files will be
>> pretty much as efficient as concurrent jobs writing to the same volume
>> file, and restores will be much faster with no job interleaving.
>
>
> I think you're missing the point, though perhaps that's because I didn't
> make it clear enough.
>
> Let me try restating it this way:
>
> When you are writing large volumes of data from multiple sources onto
> the same set of disks, you have two choices.  Either you accept
> fragmentation, or you use a space allocation algorithm that keeps the
> distinct file targets self-contiguous, in which case you must accept
> hammering the disks as you constantly seek back and forth between the
> different areas you're writing your data streams to.
>
> Yes, aggressive write caching can help a bit with this.  But when we're
> getting into data sizes where this realistically matters on modern
> hardware, the data amounts have long since passed the range it's
> reasonable to cache in memory before writing.  Delayed allocation can
> only help just so much when you're talking multiple half-terabyte backup
> data streams.

No - aggressive write caching is key to solving a large part of this 
problem. Write caching in DRAM in particular is a very efficient way of 
doing this, since DRAM is relatively cheap and most modern servers have 
plenty of banks to fill.

It also leaves room for flexibility, since you can easily tune your 
cache size to your workload.
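
On Solaris, for example, the flush interval itself is a tunable. A 
sketch of what I mean (treat the exact tunable name and value as 
illustrative - they vary between releases):

  # /etc/system - let ZFS accumulate up to ~30 seconds worth of
  # writes per transaction group before flushing them to disk
  set zfs:zfs_txg_timeout = 30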

I have no problem saturating a 4 Gbit LAG group (~400 MB/s) when running 
backups via Bacula and data *only* touches the disks every 15 to 20 
seconds when ZFS flushes its transaction groups to spinning rust.
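
To put numbers on that (simple arithmetic, not a measurement):

  400 MB/s * 15 s ~= 6 GB
  400 MB/s * 20 s ~= 8 GB

so each transaction group only has to stage some 6-8 GB in DRAM before 
it is written out sequentially - no problem on a box with tens of 
gigabytes of RAM.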

Adding more DRAM would probably push this all the way to 30 seconds, 
perhaps less once I convert this box to 10 Gbit Ethernet.

These 15-20 seconds are more than enough for ZFS's block allocator to do 
its magic.
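
If you want to watch this behaviour on your own pool, something like

  zpool iostat tank 5

('tank' being your pool name, 5 the sampling interval in seconds) makes 
the bursty pattern quite obvious: long stretches of near-idle disks 
punctuated by large write bursts every time a transaction group is 
synced.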



-- 
Med venlig hilsen / Best Regards

Henrik Johansen
henrik@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet
