Subject: Re: [BackupPC-users] Slow backups; Collision issues
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 01 Jul 2009 12:07:32 -0400
James Esslinger wrote at about 10:19:52 -0500 on Wednesday, July 1, 2009:
 > Hello,
 > 
 > I seem to be experiencing a problem with collisions on some data that is
 > being backed up from a server.  The problem didn't arise until a user
 > started populating a directory with thousands of bitmap files.  It now
 > appears that all of these files are causing collisions and slowing the
 > backups down to a crawl.  Here's the information regarding the pool:
 > 
 > General Server Information
 > 
 >     * The servers PID is 3626, on host tapehost1, version 3.2.0beta0, started at 6/12 14:42.
 >     * This status was generated at 7/1 09:54.
 >     * The configuration was last loaded at 6/12 14:52.
 >     * PCs will be next queued at 7/1 10:00.
 >     * Other info:
 >           o 2 pending backup requests from last scheduled wakeup,
 >           o 0 pending user backup requests,
 >           o 10 pending command requests,
 >           o Uncompressed pool:
 >                 + Pool is 687.39GB comprising 1154808 files and 1279 directories (as of 6/30 07:09),
 >                 + Pool hashing gives 5013 repeated files with longest chain 4527,
 >                 + Nightly cleanup removed 228 files of size 0.22GB (around 6/30 07:09),
 >           o Compressed pool:
 >                 + Pool is 438.99GB comprising 1658761 files and 2184 directories (as of 6/30 15:23),
 >                 + Pool hashing gives 1611 repeated files with longest chain 776,
 >                 + Nightly cleanup removed 34 files of size 0.00GB (around 6/30 15:23),
 >           o Pool file system was recently at 61% (7/1 09:51), today's max is 61% (7/1 01:00) and yesterday's max was 61%.
 > 
 > 
 > Notice that the longest chain in the uncompressed pool is 4527.  If I
 > drill down to the location where the collisions happen I have:
 > 
 > # cd /ldisk/3ware0/backups/pool/e/f/4/
 > # ls -lah ef48707c04eed19414d0d42da047ea3f_0 ef48707c04eed19414d0d42da047ea3f_4526
 > -rw-r----- 2 backuppc backuppc 15M 2009-06-12 11:37 ef48707c04eed19414d0d42da047ea3f_0
 > -rw-r----- 2 backuppc backuppc 15M 2009-06-26 12:30 ef48707c04eed19414d0d42da047ea3f_4526
 > 
 > All the files in between are the same size as well.  It appears that the
 > BackupPC_dump instance for this server takes forever comparing these
 > files, looping over and over for each file; this can take up to 2-3 days
 > for these 4527 files.  Here's an strace of what the process is doing.

This makes sense and is relevant to a recent thread we had on pooling
and hashing.

I imagine that all your bitmaps, in addition to being the same size,
also have the same first 1MB (or at least the same 1st and 8th 128KB
blocks). That would give them all the same pool hash, which in turn
requires each new file to be compared against all 4527 files in the
chain. If the files are big and substantially the same, that
comparison *will* take a while, though I'm not sure why it would take
2-3 days.
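
To make the collision mechanism concrete, here is a minimal Python
sketch of the partial-hash idea (BackupPC itself is Perl; the exact
block selection below is an assumption based on the "1st and 8th 128KB
blocks" description above, not a copy of BackupPC's code):

    import hashlib
    import os

    BLOCK = 128 * 1024  # 128KB

    def partial_pool_hash(path):
        """Illustrative partial-file digest: hash the length plus the
        1st and 8th 128KB blocks, so files that agree in those regions
        collide even if they differ everywhere else."""
        size = os.path.getsize(path)
        md5 = hashlib.md5()
        md5.update(str(size).encode())       # file length
        with open(path, 'rb') as f:
            md5.update(f.read(BLOCK))        # 1st 128KB block
            if size > 8 * BLOCK:
                f.seek(7 * BLOCK)
                md5.update(f.read(BLOCK))    # 8th 128KB block (up to 1MB)
        return md5.hexdigest()

Two 15MB bitmaps with identical headers and first 1MB but different
pixel data later on get the same digest here, so they land in the same
pool chain (..._0 through ..._4526) and each new one has to be
byte-compared against every existing chain member.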

 > # ps aux | grep BackupPC_dump
 <clipped>
 
 > So my questions are:
 > 
 > 1.  Is there a way to change the hashing algorithm to prevent these
 > massive collisions?

This could be solved (to varying degrees) by any of the several
suggestions that I have previously made for modifying the pool
hashing, including:

1. Using a full file md5sum hash rather than a partial one
2. Using the full file md5sum hash as the index in case of a collision
3. Adding the full file md5sum hash to the file header.

I know there are various pros/cons to each of these extensions, but
given that most people probably use rsync/rsyncd, and that protocol 30
gives you full-file md5sums for free, it seems to make sense to
consider taking advantage of a full-file md5sum hash either instead
of, or in addition to, the partial-file md5sum hash that is used now.
A rough sketch of the idea follows below.
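
As a rough sketch of option 2 (again Python and purely hypothetical --
the per-chain index below is an assumption for illustration, not one of
BackupPC's actual data structures): on a partial-hash collision, one
full-file digest lookup replaces a byte-by-byte comparison against
every chain member.

    import hashlib

    def full_file_md5(path):
        """Full-file MD5 -- what rsync protocol 30 already computes."""
        md5 = hashlib.md5()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                md5.update(chunk)
        return md5.hexdigest()

    def match_in_chain(new_file, full_md5_index):
        """full_md5_index: hypothetical cache mapping full-file digest
        -> pool file path for one collision chain. A hit means we can
        link to the existing pool file; a miss means a genuinely new
        file, with no need to compare against all 4527 chain members."""
        return full_md5_index.get(full_file_md5(new_file))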

 > 2.  If not, are there any other ways to speed up this process so I can
 > get these backups finished in a more timely fashion?  The backup of this
 > system used to finish in 4-5 hours for a full; now it takes 3+ days for
 > an incremental.

How many such bitmap files are there? Unless there are many thousands
of them, I'm not sure how it goes from 4-5 hours to 3+ days.
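
For a sense of scale, here is a back-of-the-envelope estimate using
only the figures quoted above (a worst-case assumption, not a
measurement):

    # Rough worst case: each new colliding file is byte-compared
    # against every existing member of the chain before getting a
    # new suffix (_4527, _4528, ...).
    chain_len = 4527       # files already in the colliding chain
    file_size_mb = 15      # each pool file is ~15MB
    per_file_read_gb = chain_len * file_size_mb / 1024
    print(f"up to ~{per_file_read_gb:.0f} GB read per new colliding file")
    # -> up to ~66 GB per file; with thousands of such bitmaps that
    # multiplies into many TB of comparison I/O.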

 > 
 > 
 > -- 
 > James Esslinger       --   slinger AT arlut.utexas DOT edu
 > System Administrator  --   Office: 512.835.3257
 > SISL/ARL:UT           --   Helpdesk: 512.490.4490

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/