Subject: Re: [BackupPC-users] Too many links : where is the problem ?
From: Holger Parplies <wbppc AT parplies DOT de>
To: friedmann <backuppc-forum AT backupcentral DOT com>, alefriedx AT gmail DOT com
Date: Wed, 10 Aug 2011 00:44:43 +0200
Hi,

friedmann wrote on 2011-08-09 06:21:56 -0700 [[BackupPC-users]  Too many links 
: where is the problem ?]:
> thanks all for your answers, i will try to be more specific and precise. I
> was somehow hoping  that the problem was common, and that a simple answer
> existed.

even if there were a common problem with such an error message, it would be a
good idea to give enough detail to make clear whether your instance of the
problem is identical. The more precise your question is, the simpler the
answer can be. The information you give below is very helpful, and the answer
is actually simple.

> i tried the piece of script you provided : 
> perl -e 'for (my $i = 2; ; $i ++) { link "1", $i or die "Cant create ${i}th 
> link: $!\n"; }' 
> 
> it returned, within 1-2 sec : 
> "Cant create 32001th link: Trop de liens"
> So yes, notice that i translated the error msg, for communication and
> googling purposes : "Trop de liens" is the actual message, and now i realize
> the importance of that detail ...

Actually, *in this case*, the translation is more helpful, because it happens
to be a system error message, not a message from the BackupPC code (meaning we
can't find it verbatim in the code in any case). With the context from the log
you give below, the problem becomes quite obvious.

> I launched one BackupPC_dump this night, with the limit moved to 100000 :

Errm, that won't help. You determined above that the limit for your (current)
file system is 32000, so please change HardLinkMax back to 32000 (or 31999,
just to be safe). Your file system will fail to create another hard link when
its limit is reached, no matter what BackupPC thinks. The point of HardLinkMax
is to give BackupPC a hint when *not to try* to create another hard link,
because it would fail. That makes its operation a bit more efficient and
currently also prevents losing data for the file in question (so, currently,
setting HardLinkMax too high can be a problem).
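
For what it's worth, that setting lives in config.pl (or a per-host override)
and would look something like:

        $Conf{HardLinkMax} = 31999;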

> it failed the same way, as expected.
> 
> [Question] How can i check that "the number of links on one or more pool
> files exceeds HardLinkMax", and figure out what is special/unique about such
> a file ? [Question]

Well, the problem is that no file *can* exceed the file-system-imposed maximum
number of hard links (and HardLinkMax is just a hint; exceeding that has no
meaning). Presuming you are looking for *files* that are close to hitting (or
have already hit) the limit, you might be asking for this:

        backuppc@backuppc-server% find /var/lib/backuppc/pc/192.168.2.105/27 \
                -type f -links +31990 -ls

or
        backuppc@backuppc-server% find /var/lib/backuppc/cpool -links +31990 -ls

but note that it's not actually a *file*, nor BackupPC's pooling mechanism,
that is producing the problem.

> For your information, here are the last lines of the log generated by the 
> "backupPC_dump -v -f 192.168.2.105" command : 
> [...]
> [Arrow] mkdir 
> /var/lib/backuppc/pc/192.168.2.105/new//f%2fmedia%2fRhenovia/fVBox_Data_until_January_2011/fSBML_221010_IP3R_ForArnaud/fsbml2java_distrib/fDATE27octobre2010TIME09h17_huge9/fSim9997-singleInput1193.777e-9.0-J0_k188586.679e-3.0-IP33.8881551803E7e-9.0:
>  Trop de liens at /usr/share/backuppc/lib/BackupPC/Xfer/RsyncFileIO.pm line 
> 641

This is the important line. The details "Xfer/RsyncFileIO.pm line 641" tell us
where to look in the code, and the path tells you that the directory
/media/Rhenovia/VBox_Data_until_January_2011/SBML_221010_IP3R_ForArnaud/sbml2java_distrib/DATE27octobre2010TIME09h17_huge9
is the place where your problem is. In fact, the error message in conjunction
with the attempted operation (mkdir) tells us what the problem is even without
looking at the code.

On a side note, the error message tells us that you are most probably using
BackupPC 3.1.0 ;-).

> "Trop de liens at /usr/share/backuppc/lib/BackupPC/Xfer/RsyncFileIO.pm line
> 641" seems to be the most insightful message : would you tell me what this
> RsyncFileIO.pm deals with ?

In a nutshell, RsyncFileIO does the I/O operations for the rsync transfer
(rsync or rsyncd). But in this case that's not really important. 'mkdir'
failing with 'Too many links' means that the parent directory already has
the maximum number of subdirectories. That wouldn't change with tar or smb
transport ;-).
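
If you want to see that effect in isolation, a throwaway variant of your
earlier one-liner (run it in an empty scratch directory on the same ext3 file
system, nowhere near your pool) should die after 31998 subdirectories - see
below for why it's not exactly 32000:

        perl -e 'for (my $i = 1; ; $i++) { mkdir "d$i" or die "Cannot create ${i}th subdir: $!\n"; }'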

> I also noticed a good clue, pointing out the number of subdirectories
> hence, i have counted the number of subdirectories that are on the backuped 
> drive, using : "find . -type d | wc -l", that returned me :
> 82779

That, in itself, is not meaningful. As long as you spread them out, you can
have just about as many directories as you want. It may affect performance,
but it will work.

Speaking of performance, look at how BackupPC organizes the pool: it's not one
giant directory containing millions of files, it's three levels of
subdirectories, as in 1/2/3/123456789abcdef0123456789abcdef0, each with only a
limited number of files - that makes access (comparatively) fast. If it were
all flat in a single directory, it would take a long time even to look up a
single file (and it would probably hit some file system limit at some point).
Both of these points are also true for your source directory (the one with
more than 32000 subdirectories). As Les said, you should consider changing
the layout of your source tree if possible.
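
To make the pool layout concrete (just a sketch of mine; pool_path is a
made-up name, not a BackupPC function), the first three hex digits of a pool
file's hash pick the three directory levels:

        # hypothetical helper, not actual BackupPC code
        sub pool_path {
            my ($topdir, $hash) = @_;
            my ($d1, $d2, $d3) = split //, substr($hash, 0, 3);
            return "$topdir/cpool/$d1/$d2/$d3/$hash";
        }
        # pool_path("/var/lib/backuppc", "123456789abcdef0123456789abcdef0")
        #   gives /var/lib/backuppc/cpool/1/2/3/123456789abcdef0123456789abcdef0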

> And, maybe more insightful, i have 32002 subdirectories in the same
> directory, i.e. in the "most populated" one (the one on which the backup
> process fails : 
> /var/lib/backuppc/pc/192.168.2.105/new//f%2fmedia%2fRhenovia/fVBox_Data_until_January_2011/fSBML_221010_IP3R_ForArnaud/fsbml2java_distrib/fDATE27octobre2010TIME09h17_huge9)
> 
> I found out that, using ext3, a limit for the number of subdirectories in
> one directory could be 31998 
> (http://superuser.com/questions/66331/what-is-the-maximum-number-of-folders-allowed-in-a-folder-in-linux).

Well, yes. If you look at an empty directory, you'll see it has two links: its
named entry in the parent directory, and its own '.' entry. So there are 31998
links left for subdirectories (they each have a '..' entry, pointing back at
the parent directory) until the 32000 link limit you have discovered above is
reached.
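
You can watch that arithmetic happen in a scratch directory:

        perl -e 'mkdir "parent" or die $!;
                 printf "empty dir:      %d links\n", (stat "parent")[3];
                 mkdir "parent/sub$_" or die $! for 1 .. 5;
                 printf "with 5 subdirs: %d links\n", (stat "parent")[3];'

which should print 2 and 7, respectively.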

> It actually fails after 31998 created subdirectories on the ext3 partition
> tested on the host's ntfs partition : the test is passed. I even tried with
> a value of 500000 : I stopped it after 296000 subdirectories were created.

On NTFS, things are organized differently. I would speculate that it doesn't
use '..' links in the way UNIX file systems do. Jeffrey encountered an 8190
link limit, if I remember correctly. I presume NTFS has a limit on directory
size which will be exceeded at some point beyond your 296000 subdirectories.
If this is true, creating a *file* at that point should fail, too, which would
not be the case on the ext3 partition.

> [Idea] So i consider the problem as identified : It is relatively not
> related to backuppc itself, but deals with a sort of filesystems
> incompatibility. [Idea]

Limitation, really. A file system needs to store the number of links to an
inode, and for this it needs to reserve a fixed amount of space per inode. In
ext3 that seems to be 16 bits, interpreted as a *signed* int. Apparently, ext4
uses the same structure as an *unsigned* int and can accommodate 65000 links.
Other file
systems will equally limit the value, though they might allow for more links
(XFS would seem to use 32 bits, but I'm just guessing). The tradeoff is that
you use up this space for each and every file, but you only need it
comparatively rarely. The only link counts above 100 I find on my Linux
installation (/ /usr /var) are for directories, with the largest being 932
(/usr/share/doc, where Debian stores one subdirectory for each installed
software package). The largest link count I find on a *file* is 10 (!).
On a normal system, you don't actually *want* link counts above 32000,
because they make things inefficient (for directories, that is; for files
you simply don't need them). BackupPC's file system usage is a very special
and atypical case.
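
If you are curious what your own system looks like, something along these
lines will list the largest link counts under /usr and /var (just a rough
sketch; scanning / as well works, but pseudo file systems like /proc add
noise):

        perl -MFile::Find -e 'find(sub { my $n = (lstat)[3] or return;
            print "$n $File::Find::name\n" if $n > 100 }, @ARGV)' /usr /var | sort -rn | head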

> [Question] And finally, the only question that lasts is : how can i increase
> the number of subdirectories tolerated by ext3 ? [Question]

You can't (*), but you *might* be able to simply mount your pool file system
as ext4. This would raise the limit enough for the moment, but it won't
prevent you from running into the same issue again later. Apparently, your
backup source can store more subdirectories per directory than ext4, so if
you exploit that to the limits, you'll exceed what your backups can store.

Changing to a different file system (other than ext4) is far more complicated,
because you would need to copy your pool, which might take a really long time
and need a lot of space.

So, the Simple Answer(tm) is: You need to use a different file system for pool
storage, with ext4 probably being the best choice for now.

*BEFORE you remount your file system as ext4* you might want to consider the
implications. I'm not too sure, but I've heard you can mount any ext3 FS as
ext4, but you might not be able to mount it as ext3 again later (which makes
sense, considering you might have files with > 32000 links, for example).
ext4 appears to be reasonably stable, but that's just from what I've heard (or
rather haven't heard). I haven't used it with BackupPC yet. It probably also
depends on which kernel and which Linux distribution you are running. Of
course, your kernel and Linux distribution need to actually *support* ext4 as
a file system if you want to use it. Recent kernels and distributions should.

That's the quickest solution to your problem I can think of. Personally, I
prefer XFS as BackupPC pool FS, but that's just my opinion, and it's not a
*quick* solution, unless you can afford to wipe out your online pool history.

I hope that clears up some things. Don't hesitate to ask if you need more
information.

Regards,
Holger

(*) Actually, you probably *could* patch the ext3 kernel sources to raise the
    limit *slightly*. 32000 is a rather random figure. A signed 16 bit integer
    can store values up to 32767. Some of the reserved values might be used for
    a particular purpose. Considering you are only 2 or 3 links short, your
    current problem could probably be solved. *But* this is most likely a lot
    of work, creating an incompatible FS you can't use with other kernels or
    even non-patched e2fsprogs (if e2fsck is good, it will complain about -
    and probably eliminate - inodes with excessive link counts), just for
    solving a problem that is unlikely to reappear in the exact same form
    ever again. It's nothing I'd really consider doing, let alone recommend.

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/