Re: [Bacula-users] restores not working

On 25/03/2010 11:36 a.m., James Harper wrote:
>>
>> We've used Legato NetWorker before (we still do, as we haven't yet
>> completed the migration, which looks like a grimmer prospect by the
>> day), and on approximately the same dataset (about 500 million
>> records spread over 100 servers) and somewhat weaker hardware, it
>> would let a user start selecting files to restore in a matter of
>> *seconds* (and it was using its simple db6 files, no server/database
>> tuning required at all).
>>
>> Now with Bacula 5.0.1, we have to wait several *hours* before we can
>> start selecting files to restore, and that is considered "normal"?!
>>
>
> I've always thought that Bacula could do this a bit better. If you are
> selecting files rather than restoring everything then the chances are
> you only want a small subset of all files (always exceptions of course),
> so why read the whole tree in at once? Why not read it in as required,
> or read it in 'layer by layer' in the background so the user can start
> selecting files immediately?
>
> Complexity is probably the reason why it hasn't been done, but it would
> be an interesting project.

Actually it's not that hard (I've done it), but some of the queries can
be quite slow, particularly the one that finds all the subdirectories of
a chosen directory.  I'm working on our own internal web GUI for doing
restores (ExtJS, with a tree-based view of the filesystem for selecting
files), and I've found that you can either:
a) Keep memory usage low (don't build a tree; run ad-hoc queries as you
go, recursing through the selected directories), but it'll be quite
slow, or
b) Spend memory and up-front build time to build the directory tree in
memory, then select relatively quickly.

(a) is better for small restores, (b) is better for restores of more
than a few hundred files.  If your database is grinding to a halt
building the tree, it's going to truly suck at the many small queries
against large datasets that (a) requires.  In the end I'm going to give
the users a choice of method, so a human makes the decision.  It's hard
to make that choice in code, because when the user has just selected a
top-level directory, the code doesn't know how deep the tree below it
is.  Could be 10 files, could be 10 million :)
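
To make the trade-off concrete, here is a rough Python sketch of the two
strategies.  It assumes a made-up, simplified table (dirs, with id,
parent and name columns) in an in-memory SQLite database; that is not
Bacula's real catalog schema, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical, simplified catalog: one row per directory plus its parent.
# This is NOT Bacula's real schema, just enough to show both strategies.
def make_demo_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE dirs (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)"
    )
    db.execute("CREATE INDEX dirs_parent ON dirs (parent)")
    rows = [
        (1, None, "/"),
        (2, 1, "etc"),
        (3, 1, "home"),
        (4, 3, "alice"),
        (5, 3, "bob"),
        (6, 4, "docs"),
    ]
    db.executemany("INSERT INTO dirs VALUES (?, ?, ?)", rows)
    return db

# (a) Low memory: one query per directory, recursing as we go.
def select_lazy(db, dir_id):
    selected = [dir_id]
    children = db.execute(
        "SELECT id FROM dirs WHERE parent = ? ORDER BY id", (dir_id,)
    ).fetchall()
    for (child_id,) in children:
        selected.extend(select_lazy(db, child_id))
    return selected

# (b) Pre-build: one full scan into memory, then selection needs no queries.
def build_tree(db):
    tree = {}
    for dir_id, parent in db.execute("SELECT id, parent FROM dirs ORDER BY id"):
        tree.setdefault(dir_id, [])
        if parent is not None:
            tree.setdefault(parent, []).append(dir_id)
    return tree

def select_prebuilt(tree, dir_id):
    selected, stack = [], [dir_id]
    while stack:
        current = stack.pop()
        selected.append(current)
        # Push children in reverse so they are visited in ascending id order.
        stack.extend(reversed(tree[current]))
    return selected
```

Both functions return the same set of directory ids for a given
starting point.  The difference is where the cost lands: select_lazy
pays one round trip to the database per directory visited, while
build_tree pays for a full scan (and the memory to hold it) before the
first selection.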

And for anyone interested: doing (b) in PHP is not a good idea.
There's a 160-byte overhead just to create an empty object, and another
60+ bytes per stored integer.  Blech.  Perl is not amazingly better, but
can be wrangled down to more tolerable memory sizes with some trickery.
The tree data storage really needs to be done in a language/mechanism
that actually uses only 4 bytes to store a 32-bit integer :)
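
As one illustration of that kind of compact storage, here is a small
Python sketch using the standard array module, which packs C ints
contiguously (typically 4 bytes each) instead of keeping one object per
value.  The parent-pointer layout, the node numbering and the function
names are assumptions made up for this example; it also assumes every
node is stored after its parent.

```python
from array import array

# Parent-pointer tree: parents[i] is the index of node i's parent, or -1
# for the root.  Assumption: a parent always has a lower index than its
# children, e.g. because directories were appended top-down.
parents = array("i", [-1, 0, 0, 2, 2, 3])

def payload_bytes(parents):
    """Bytes used by the packed integers themselves (itemsize is typically 4)."""
    return parents.itemsize * len(parents)

def descendants(parents, root):
    """Return root and every node below it, in index order, in one pass."""
    inside = bytearray(len(parents))  # one flag byte per node
    inside[root] = 1
    for node in range(root + 1, len(parents)):
        parent = parents[node]
        # Relies on the parent-before-child ordering assumed above.
        if parent >= 0 and inside[parent]:
            inside[node] = 1
    return [node for node, flag in enumerate(inside) if flag]
```

With this layout the per-node cost is one packed integer plus one
temporary flag byte during selection, at the price of a linear scan
instead of a direct child lookup.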

Craig Miskell
