[Bacula-users] Large numbers of files (Was: backing up automounted FS)
From: Marcel De Boer <marcel.de_boer AT nokia DOT com>
To: EXT Dimitri Maziuk <dmaziuk AT bmrb.wisc DOT edu>
Date: Thu, 31 Mar 2016 23:00:41 +0200
Hi!

>> ...  Unfortunately, I'm
>> not exaggerating to say that we have users with file profiles that
>> consist of 35million files (or more) with average file sizes less than
>> 10kb.  The user might be long-gone, and eligible for deletion, but it
>> takes forever to spider that filesystem, and occasionally causes
>> problems like directory cache thrashing, etc.
>
> "I don't have a solution but I admire the problem". How long would
> bacula take to stat 35M files?

For us: the incremental currently takes 19 hours for 47M files on an 
otherwise idle 44-disk hardware RAID10. Yes, this is slowly starting to 
become a problem.

The main issues I see are:
  - That fileserver doesn't really have enough memory to keep all metadata
    cached (which is our problem, I'm working on migrating to newer
    hardware).
  - The FD seems to run in a single thread, so it only does one stat at a
    time. This means that effectively only one or two disks from the array
    are used at any given time. Is there a way to parallelize the FD?
    (something simpler than one job per user homedir, that would be a
    management nightmare)
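To illustrate what a parallel FD-side scan could buy: this is a minimal sketch (not Bacula code, just an assumption about how the stat phase might be overlapped) that issues stat() calls from a thread pool instead of one at a time. Since os.stat releases the GIL while the syscall waits on disk, a handful of threads can keep several spindles of the array busy at once.

```python
# Sketch: stat an entire tree with N concurrent threads instead of serially.
# Hypothetical illustration only -- Bacula's FD does not work this way today.
import os
from concurrent.futures import ThreadPoolExecutor

def stat_tree(root, workers=20):
    """Walk `root` and stat every regular file using `workers` threads.

    Returns the number of files statted. Each os.stat() blocks in the
    kernel, so the threads overlap their disk waits rather than queueing
    behind a single outstanding request.
    """
    paths = (os.path.join(dirpath, name)
             for dirpath, _, names in os.walk(root)
             for name in names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(1 for _ in pool.map(os.stat, paths))
```

On an idle array this mainly helps when metadata is not already cached: the win comes from having many seeks in flight, which is exactly what a single-threaded scan cannot do.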

We had the same issue with an rsync of the same fileset (those files are 
rsynced to a standby fileserver; Bacula then takes the backup from that 
server to avoid burdening the active fileserver a second time). 
Parallelizing that over 20 rsync threads reduced the sync time from 12 
hours to 2, which is why I'm wondering whether Bacula could do the same.
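The 20-way parallel rsync above can be sketched roughly like this: one rsync process per user home directory, with at most 20 running at any time. The host and path names are hypothetical placeholders, not our actual layout.

```python
# Sketch of a 20-way parallel rsync over per-user home directories.
# "standby" and /export/home are hypothetical placeholders.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def sync_homedir(user, src="/export/home", dest="standby:/export/home"):
    """Run one rsync for a single user's home directory; returns its exit code."""
    return subprocess.call(
        ["rsync", "-a", "--delete",
         os.path.join(src, user) + "/",
         dest + "/" + user + "/"])

def parallel_sync(users, threads=20, sync=sync_homedir):
    # Each worker thread just blocks on its rsync child process, so plain
    # threads are enough; no multiprocessing needed.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(sync, users))
```

The per-homedir split works for rsync because each directory is an independent unit of work; the open question in the mail is whether the FD could do a similar split internally, rather than forcing one Bacula job per homedir.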

Gtnx
        Marcel

-- 
Marcel de Boer
Test engineer, Service Routing R&D, IP/Optical Networks
Nokia, Antwerp, Belgium

On Thu, 31 Mar 2016, EXT Dimitri Maziuk wrote:

> On 03/31/2016 12:35 PM, Lloyd Brown wrote:
> [...]

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users