Subject: Re: [BackupPC-users] Best FS for BackupPC
From: Holger Parplies <wbppc AT parplies DOT de>
To: mstowe AT chicago.us.mensa DOT org, "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 25 May 2011 20:15:47 +0200
Hi,

first of all, my personal experience with reiserfs is also that it lost a
complete pool FS (apparently, the cpool directory "disappeared" and was
re-created by BackupPC *several times* before I noticed the problem).
Rebuilding the tree obviously gave me a state that is next to impossible
to fix properly (lots of directories in lost+found named by inode - any
volunteers for finding out where and within which pc/ directory to put
them? ;-), let alone verify the results.

My decision was to move to a different FS. I didn't go the scientific way; I
just chose xfs, which apparently was a good choice - at least up to now.

So I certainly don't disagree with your results, but I do partly disagree with
your reasoning and interpretations.

Michael Stowe wrote on 2011-05-25 08:40:10 -0500 [Re: [BackupPC-users] Best FS for BackupPC]:
> [Adam wrote:]
> > On 24/05/2011 11:25 PM, Michael Stowe wrote:
> >> [...] The high level results:
> >>
> >> jfs, xfs:  quick, stable
> >> reiserfs:  not stable
> >> ext4:      slow
> >> ext3:      very slow

While that is a nice summary, I, personally, wouldn't base any decisions
solely on a summary without having any idea how the results were obtained,
because the methods could be flawed or simply not take my main concerns into
account (e.g. if I have my BackupPC server on a UPS, power loss is not my
primary concern (though it may still be one); long-term stability is). For
other people, speed may be vital, while the ability to survive a power failure
is not. You explain in a follow-up (see below) how you obtained your results.

> >> The "not stable" designation comes from power-off-during-write tests.
> >> [...]
> >
> > Just a couple of my own personal comments on reiserfs:
> > 1) It does usually handle random power-offs on both general servers and
> > backuppc based servers.
> 
> "Usually" doesn't really do it for me.

I believe that is exactly the point. You simply can't *test* whether a file
system handles *every* power-off case correctly. You can prove that it
doesn't, or you can find that you didn't manage to trigger any problems. So,
while I agree with "reiserfs does *not* handle power-offs sufficiently well",
I don't see it as *proven* that xfs/jfs/ext4/ext3 are any better. They might
be better, they might be worse. They are *probably* better, but that is just
speculation. Granted, I'd prefer an FS where I didn't manage to trigger any
problems over one where I did, too. Or one where the majority of the
community seems to agree that it performs better. However, both choices are
based on experience, not on scientific results.

> The problem seems to be in the structure of the trees and the rebuild tree
> routines, which just grabs every block that looks like they're reiserfs
> tree blocks.

If that is the case, it is certainly problematic. What I also dislike is that
'reiserfsck --rebuild-tree' leaves your FS in an unusable state until it has
completed - let's hope it does complete. All other 'fsck' programs I can
remember having used seem to operate in an "incremental" way - fixing problems
without causing new ones (except maybe trivial "wrong count" type
inconsistencies), so they can [mostly] be interrupted without making the
situation worse than it was.
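
Should anyone end up in that spot again, the general idea (just a sketch -
the device and paths below are made up, and reiserfsck behaviour may differ
between reiserfsprogs versions) would be to run the destructive rebuild
against a copy of the device rather than the original:

    # read-only check first, to see what reiserfsck complains about
    reiserfsck --check /dev/vg0/pool

    # image the device somewhere with enough space (names are examples)
    dd if=/dev/vg0/pool of=/mnt/spare/pool.img bs=1M conv=noerror,sync

    # run the destructive rebuild against the image, not the original
    reiserfsck --rebuild-tree /mnt/spare/pool.img

At least that way an interrupted --rebuild-tree doesn't leave you worse off
than before it started.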

> > 3) I've used reiserfs on both file servers and backuppc servers for
> > quite a long time [...] One backuppc server I used it with [...] did
> > daily backups of about 5 servers with a total of 700G data. [...]
> 
> There are plenty of things that run perfectly well when unstressed.

What is your understanding of "unstressed"?

> > Perhaps in your testing you either didn't enable the correct journalling
> > options, or found that particular corner case. Perhaps next time it
> > happens jfs/xfs might hit their corner cases.
> 
> This doesn't ring true nor does it match the results of my testing.  I
> didn't tune any file systems.

Perhaps you should have. The default options are not always suitable for
obtaining what you need. In what way doesn't "next time jfs/xfs might hit
their corner cases" match the results of your testing? As I said, I don't
believe you've proven that jfs/xfs don't *have* corner cases. You just didn't
expose any.
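
Coming back to the default options: just to make concrete what kind of tuning
I mean (these are only examples with a made-up device name - the available
options and their defaults differ between file systems and kernel versions,
so check mount(8) and your kernel documentation before relying on any of
them):

    # two alternatives, shown only to illustrate the knobs involved
    mount -t reiserfs -o data=ordered,barrier=flush /dev/vg0/pool /var/lib/backuppc
    mount -t ext3     -o data=journal,barrier=1     /dev/vg0/pool /var/lib/backuppc

Whether write barriers are enabled by default, and which data journalling
mode is in effect, can make a real difference to how a power-off test turns
out - and to throughput.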

> You can speculate that xfs and jfs may contain the same flaws but some kind
> of blind luck kept them working properly, but it seems *really* unlikely.

The speculation is that you didn't test the situations that xfs or jfs might
have problems with (and reiserfs might handle perfectly).

> Further, simply "running" a filesystem is not the same as testing and
> recovering it.  It's certainly possible to have run a FAT filesystem under
> Windows 3.1 for 20 years.  This doesn't make it a robust choice.

Certainly true. But all I can see here are different data points from
different people's *experience*. You're unlikely to have experience running
*dozens* of FAT/Win3.1 file systems for 20 years, and if you do, FAT might
well be a robust choice *for your usage pattern*. That doesn't mean it
will work equally well with different usage patterns, or that, if you do
suddenly encounter corruption, a different FS wouldn't be easier to recover.

> > [...]
> > I don't mean to disparage xfs/jfs or any testing anybody has done, just
> > wanted to share my personal experiences.
> 
> Since you don't appear to be arguing that people actually use reiserfs,
> you're speculating that xfs/jfs contain flaws without any apparent
> evidence, and your personal experiences don't appear to include testing,
> I'm not really sure where you're going with this.

"Sharing personal experiences"? Frankly, I prefer limited personal experiences
which explain their limitations over "scientific data" which gives no hints as
to how it was obtained. With the former, I can at least assess its value for
my considerations. As for your "testing", I might just as well object that you
didn't test the cases I would have been interested in. Does that make your
testing worthless?

Aside from that, I'm quite sure xfs and jfs contain flaws. Any software of
that complexity will. Other software (or hardware) may contain flaws that
corrupt an xfs/jfs. The question is, how good at handling inconsistencies
are these file systems? I would tend to agree that reiserfs does not seem
to be very good.

> [...] I'll give you an idea of what my testing entailed:
> 
> On a 7-drive array (5+2 RAID),

This is a good example of how hardware may corrupt your FS (or prevent
corruption that would occur with different hardware). If you are truly
interested in testing *the file systems*, you should not introduce the extra
complexity of RAID 6. You were probably more interested in testing *how the
file systems would operate in your hardware environment*. That is a
difference.

> make a new filesystem, and point the BackupPC pool at it.
> A script runs that uses X10 to physically power down the box once it
> senses the presence of a test file.  This script is used for both the
> initial backup and the link phase, both of which are restarted once during
> the trial.  (This script was originally timed to power down one hour into
> the backup, but since the backups ran at different speeds on the
> filesystems, it seemed likely that the backups would be at different
> points.  To avoid corner cases, they were powered down while backing up
> the exact same file.)

How do you know that you are *avoiding* corner cases? Maybe you are avoiding
xfs/jfs corner cases and consistently hitting a reiserfs corner case. In any
case, you are not really testing random power failures. I know that there will
be an element of randomness (hopefully your test file is (a) small and (b) not
yet in the pool) due to process scheduling, timing of flushing buffers, and
possibly other factors, but who can say how much randomness that is, i.e. how
much of the file system operation it exposes to the power failure?
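
If the goal really is a random cut point, something along these lines would
get closer (purely a sketch: 'x10-power-off' stands in for whatever actually
drives the X10 controller, 'backupbox' and the one-hour window are made up,
and BackupPC_dump is simply the process I'd watch for):

    #!/bin/sh
    # wait until a backup is actually running
    while ! pgrep -f BackupPC_dump >/dev/null; do
        sleep 10
    done
    # then cut power at a random point within the next hour
    sleep $(( $(od -An -N2 -tu2 /dev/urandom) % 3600 ))
    x10-power-off backupbox   # hypothetical wrapper around the X10 controller

Run often enough, that samples far more of the file system's activity than
always pulling the plug on the same file - at the price of no longer powering
off at comparable points for each FS.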

> The pool is then compared (via rsync) to the test box; everything should
> be identical.

More or less. You'll have different timestamps in log files, a random
difference in timing (length of the file in progress) ... I'm just wondering
what exactly you are comparing. Does "pool" mean $TopDir, $TopDir/{c,}pool, or
$TopDir/pc?
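
Whichever two trees are actually being compared, I'd want the comparison to be
checksum-based. Assuming it is a restored tree against the original data, a
dry run like this would do (paths are made up, and you'd want to exclude at
least the log files):

    # report differences without transferring anything
    rsync -na --itemize-changes --checksum /restore/testbox/ testbox:/data/ | less

Without --checksum, rsync's quick check only compares size and mtime, which
wouldn't necessarily notice a silently corrupted file.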

What your test doesn't catch is long-term stability. In the absence of power
failures, will your FS operate well over many years? I've heard (rumours, not
real data points) that reiserfs will operate smoothly up to the point where
accumulated internal inconsistency (presumably due to bugs) exceeds a certain
amount, and then it will destroy just about all of your file system. That
might even match my observation - I don't remember whether there was a power
failure involved or not. I have no long-term first-hand experience with xfs
(or jfs). Does anyone else?

Regards,
Holger
