On 25/03/2010 11:36 a.m., James Harper wrote:
>>
>> We've used Legato Networker before (we still do, as we have not yet
>> successfully completed the migration, and that is looking like a
>> more and more grim prospect by the day), and on approximately the
>> same dataset (about 500 million records spread over 100 servers) and
>> somewhat weaker hardware, it would allow the user to start selecting
>> files to restore in a matter of *seconds* (and it was using its
>> simple db6 files, no server/database tuning required at all).
>>
>> Now with Bacula 5.0.1, we have to wait several *hours* before we can
>> start selecting files to restore, and it is considered "normal" ?!
>>
>
> I've always thought that Bacula could do this a bit better. If you are
> selecting files rather than restoring everything then the chances are
> you only want a small subset of all files (always exceptions of course),
> so why read the whole tree in at once? Why not read it in as required,
> or read it in 'layer by layer' in the background so the user can start
> selecting files immediately?
>
> Complexity is probably the reason why it hasn't been done, but it would
> be an interesting project.
Actually it's not that hard (I've done it), but some of the queries can
be quite slow, particularly the one to find all the subdirectories of a
given directory. I'm working on our own internal Web GUI for doing
restores (ExtJS with a tree-based view of the filesystem for selecting
files); I've found that you can either:
a) Do it with low memory usage (don't build a tree; do ad-hoc queries
as you go, recursing through the selected directories), but it'll be
quite slow, or
b) Spend memory and up-front build time to build the directory tree in
memory, after which selection is relatively quick.
(a) is better for small restores, (b) is better for restores of more
than a few hundred files; if your database is grinding to a halt
building the tree, it's gonna truly suck doing the many small queries
against large datasets required for (a). In the end I'm going to give
the users a choice of method, so they can make a human decision. It's hard
to make that choice in code, because when the user has just selected a
top level directory, the code doesn't know how deep the tree below that
is. Could be 10 files, could be 10 million :)
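
To make the trade-off between (a) and (b) concrete, here is a rough
sketch in Python. It is not the code from my GUI and it does not use
the real Bacula catalog schema: the `dirs` table, the sample paths, and
all function names are made up for illustration, and SQLite stands in
for whatever database you actually run.

```python
import sqlite3
from collections import defaultdict


def make_catalog(paths):
    """Load directory paths into a toy in-memory table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE dirs (path TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO dirs (path) VALUES (?)", [(p,) for p in paths])
    return db


def is_direct_child(parent, path):
    """True if `path` is exactly one level below `parent` (both end in '/')."""
    rest = path[len(parent):].rstrip("/")
    return path.startswith(parent) and rest != "" and "/" not in rest


def subdirs_on_demand(db, parent):
    """(a) Low memory: one query per directory the user expands."""
    rows = db.execute(
        "SELECT path FROM dirs WHERE substr(path, 1, ?) = ? ORDER BY path",
        (len(parent), parent),
    )
    return [path for (path,) in rows if is_direct_child(parent, path)]


def build_tree(db):
    """(b) Pre-build: read every row once, then answer lookups from memory."""
    children = defaultdict(list)
    for (path,) in db.execute("SELECT path FROM dirs ORDER BY path"):
        if path != "/":
            parent = path.rstrip("/").rsplit("/", 1)[0] + "/"
            children[parent].append(path)
    return children


db = make_catalog(["/", "/etc/", "/home/", "/home/a/", "/home/a/b/"])
print(subdirs_on_demand(db, "/home/"))  # one query, touches only this subtree
tree = build_tree(db)                   # one big scan up front
print(tree["/"])                        # later lookups never hit the database
```

With (a) every expanded directory costs a query that has to look at the
whole subtree below it; with (b) you pay for a full scan once and hold
the whole mapping in memory.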
And for anyone interested: doing (b) in PHP is not a good idea. There
is about 160 bytes of overhead just to create an empty object, and
another 60+ bytes per stored integer. Blech. Perl is not amazingly
better, but can be wrangled down to more tolerable memory sizes with
some trickery. The tree data storage really needs to be done in some
language/mechanism that actually only uses 4 bytes to store a 32-bit
integer :)
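
As an illustration of the "4 bytes per 32-bit integer" idea, here is a
minimal sketch using Python's standard `array` module, which stores
numbers as packed machine integers rather than as full objects. The
`CompactTree` class and its fields are hypothetical, not anything from
Bacula or from my own code; it just shows the parallel-array layout.

```python
from array import array

# Pick a signed typecode that is exactly 4 bytes on this platform
# (normally 'i'; 'l' is the fallback).
INT32 = next(code for code in ("i", "l") if array(code).itemsize == 4)


class CompactTree:
    """Tree stored as parallel arrays of 32-bit integers, one slot per node."""

    def __init__(self):
        self.parent = array(INT32)   # parent[n] = node id of n's parent, -1 for root
        self.file_id = array(INT32)  # hypothetical catalog id for node n

    def add(self, parent, file_id):
        """Append a node and return its id (its index in the arrays)."""
        self.parent.append(parent)
        self.file_id.append(file_id)
        return len(self.parent) - 1

    def path_to_root(self, node):
        """Node ids from `node` up to the root, following parent links."""
        chain = []
        while node != -1:
            chain.append(node)
            node = self.parent[node]
        return chain

    def payload_bytes(self):
        """Bytes of integer payload held (ignores array over-allocation)."""
        return (len(self.parent) * self.parent.itemsize
                + len(self.file_id) * self.file_id.itemsize)


tree = CompactTree()
root = tree.add(-1, 100)
home = tree.add(root, 101)
user = tree.add(home, 102)
print(tree.path_to_root(user), tree.payload_bytes())
```

Each node costs 8 bytes of payload here (two 32-bit integers), so the
cost per node stays fixed no matter how many nodes you add.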
Craig Miskell
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users