BackupPC-users

[BackupPC-users] Slow backups; Collision issues

2009-07-01 11:25:38
Subject: [BackupPC-users] Slow backups; Collision issues
From: James Esslinger <slinger AT arlut.utexas DOT edu>
To: backuppc-users AT lists.sourceforge DOT net
Date: Wed, 01 Jul 2009 10:19:52 -0500
Hello,

I seem to be experiencing a problem with pool collisions on some data
being backed up from one server.  The problem didn't arise until a user
started populating a directory with thousands of bitmap files.  It now
appears that these files are all colliding in the pool, and the backups
have slowed to a crawl.  Here's the information about the pool:

General Server Information

    * The servers PID is 3626, on host tapehost1, version 3.2.0beta0, started at 6/12 14:42.
    * This status was generated at 7/1 09:54.
    * The configuration was last loaded at 6/12 14:52.
    * PCs will be next queued at 7/1 10:00.
    * Other info:
          o 2 pending backup requests from last scheduled wakeup,
          o 0 pending user backup requests,
          o 10 pending command requests,
          o Uncompressed pool:
                + Pool is 687.39GB comprising 1154808 files and 1279 directories (as of 6/30 07:09),
                + Pool hashing gives 5013 repeated files with longest chain 4527,
                + Nightly cleanup removed 228 files of size 0.22GB (around 6/30 07:09),
          o Compressed pool:
                + Pool is 438.99GB comprising 1658761 files and 2184 directories (as of 6/30 15:23),
                + Pool hashing gives 1611 repeated files with longest chain 776,
                + Nightly cleanup removed 34 files of size 0.00GB (around 6/30 15:23),
          o Pool file system was recently at 61% (7/1 09:51), today's max is 61% (7/1 01:00) and yesterday's max was 61%.
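The "repeated files" and "longest chain" figures above can be cross-checked directly on disk.  Below is a minimal Python sketch, assuming the pool layout visible in the directory listing below: a 32-hex-digit digest, optionally followed by `_N` for each additional file that shares that digest, stored under single-hex-digit subdirectories.  `chain_lengths` and `longest_chain` are hypothetical helper names, not part of BackupPC.  The sketch counts the unsuffixed file and every `_N` variant as one chain, so its number may differ slightly from how BackupPC_nightly computes its own statistic.

```python
import os
import re
import sys
from collections import Counter

# Assumed pool file naming (as seen in the listing): a 32-hex-digit digest,
# optionally followed by "_N" for every extra file sharing that digest.
POOL_NAME = re.compile(r"^([0-9a-f]{32})(?:_(\d+))?$")


def chain_lengths(pool_root):
    """Count how many pool files share each digest (hypothetical helper)."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(pool_root):
        for name in filenames:
            m = POOL_NAME.match(name)
            if m:
                counts[m.group(1)] += 1
    return counts


def longest_chain(pool_root):
    """Return (digest, length) of the longest chain, or (None, 0) if none found."""
    counts = chain_lengths(pool_root)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    digest, length = longest_chain(sys.argv[1])
    print(f"longest chain: {length} files for digest {digest}")
```

Run as, e.g., `python3 pool_chains.py /ldisk/3ware0/backups/pool`.  On a pool with over a million files, expect the directory walk itself to take a while.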


Notice that the longest chain in the uncompressed pool is 4527.  Drilling
down to the pool directory where the collisions happen, I see:

# cd /ldisk/3ware0/backups/pool/e/f/4/
# ls -lah ef48707c04eed19414d0d42da047ea3f_0 ef48707c04eed19414d0d42da047ea3f_4526
-rw-r----- 2 backuppc backuppc 15M 2009-06-12 11:37 ef48707c04eed19414d0d42da047ea3f_0
-rw-r----- 2 backuppc backuppc 15M 2009-06-26 12:30 ef48707c04eed19414d0d42da047ea3f_4526

All the files in between are the same size as well.  The BackupPC_dump
instance for this server appears to take forever comparing against these
files: it seems to walk the chain over and over, once for each incoming
file, and getting through these 4527 files can take up to 2-3 days.
Here's an strace of what the process is doing:

# ps aux | grep BackupPC_dump
backuppc   717 14.0  2.6 260352 215796 ?       D    Jun28 636:51 /usr/bin/perl /usr/local/backuppc/bin/BackupPC_dump services

# strace -p 717
stat("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2496", {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
open("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2496", O_RDONLY) = 6
ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffdfe4c3c0) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(6, 0, SEEK_CUR)                   = 0
fstat(6, {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
lseek(7, 0, SEEK_SET)                   = 0
read(7, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "\377\0\377\377\377\0\377\377\377\0\377\377\377\0\377\377"..., 1048576) = 1048576
close(6)                                = 0
stat("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2497", {st_mode=S_
open("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2497", O_RDONLY) =
ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffdfe4c3c0) = -1 ENOTTY (Inappropriate ioctl f
lseek(6, 0, SEEK_CUR)                   = 0
fstat(6, {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
lseek(7, 0, SEEK_SET)                   = 0
read(7, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "\377\0\377\377\377\0\377\377\377\0\377\377\377\0\377\377"..., 1048576) = 1048576
close(6)                                = 0
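Reading the trace, each candidate `_N` is stat'ed and opened, the incoming file (fd 7) is rewound, 1 MiB is read from it and two 1 MiB chunks from the candidate, and the candidate is closed before the next suffix is tried.  Below is a minimal Python sketch of that kind of chain walk, for illustration only.  It reflects my reading of the syscalls, not BackupPC's actual Perl code, and it assumes uncompressed pool files and a chain that starts with an unsuffixed file followed by `_0`, `_1`, ....  `files_equal` and `find_pool_match` are hypothetical names.

```python
import os

CHUNK = 1048576  # 1 MiB, the read size visible in the strace


def files_equal(f_new, candidate_path, chunk=CHUNK):
    """Compare an open binary file with a pool candidate, stopping at the first differing chunk."""
    f_new.seek(0)  # analogous to the lseek(7, 0, SEEK_SET) before each comparison
    with open(candidate_path, "rb") as cand:
        while True:
            a = f_new.read(chunk)
            b = cand.read(chunk)
            if a != b:
                return False
            if not a:  # both files reached EOF at the same point
                return True


def find_pool_match(f_new, base_path):
    """Walk base_path, base_path_0, base_path_1, ... (assumed layout).

    Returns (matching_path, None) if an identical file is already pooled,
    otherwise (None, next_free_path) where a new file would be stored.
    """
    suffix = -1
    while True:
        path = base_path if suffix < 0 else f"{base_path}_{suffix}"
        if not os.path.isfile(path):
            return None, path
        if files_equal(f_new, path):
            return path, None
        suffix += 1
```

The design point this illustrates: a new file that matches nothing must open every existing candidate before it can be assigned the next free suffix, so each lookup costs time proportional to the chain length, and a chain of thousands of same-size files makes every such lookup expensive.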

Here's iostat output for the device that holds the backup filesystem:

% iostat -m 2 /dev/sdb
Linux 2.6.27.13-smp (tapehost1)         07/01/2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.94    0.06    2.00   11.81    0.00   85.44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            1557.00        90.95         0.05        181          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.87    0.19    2.18   11.19    0.00   85.20

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            1750.75       102.06         0.05        205          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.12    0.00    2.19   11.62    0.00   85.31

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            1600.50        92.55         0.06        185          0

As you can see, the device under the filesystem is being pushed hard,
most likely to its limits: roughly 1,550-1,750 transfers per second and
90-100 MB/s of reads, with almost nothing being written.
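A rough back-of-envelope estimate ties the strace and iostat numbers together.  It is a sketch under two assumptions, not a measurement: every candidate in the chain costs two 1 MiB reads (as in the strace above), and the disk keeps delivering the average read rate of the three iostat samples (treating iostat's MB as MiB).  `seconds_per_full_walk` is a hypothetical name.

```python
# Rough cost of one lookup that has to walk the entire collision chain.
# Assumptions: two 1 MiB reads per candidate (from the strace) and the
# average read throughput of the three iostat samples above.
CHAIN_LENGTH = 4527                        # longest chain, uncompressed pool
MIB_READ_PER_CANDIDATE = 2                 # two 1048576-byte reads per candidate
IOSTAT_MB_READ_PER_SEC = [90.95, 102.06, 92.55]


def seconds_per_full_walk(chain_length=CHAIN_LENGTH,
                          mib_per_candidate=MIB_READ_PER_CANDIDATE,
                          rates=IOSTAT_MB_READ_PER_SEC):
    """Seconds to read mib_per_candidate MiB from every file in the chain."""
    avg_rate = sum(rates) / len(rates)
    return chain_length * mib_per_candidate / avg_rate


if __name__ == "__main__":
    t = seconds_per_full_walk()
    print(f"~{t:.0f} s per new file that walks the whole chain")
```

This works out to about 9,054 MiB read at roughly 95 MB/s, or about 95 seconds for each new file that has to be compared against the whole chain.  As an illustrative figure only, a thousand such files would take about a day.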

So my questions are:

1.  Is there a way to change the hashing algorithm to prevent these
massive collisions?
2.  If not, are there any other ways to speed up this process so these
backups finish in a more reasonable time?  A full backup of this system
used to finish in 4-5 hours; now an incremental takes 3+ days.
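For context on question 1, here is a minimal sketch of how BackupPC 3.x is understood to compute pool digests.  This is an assumption, not checked against the 3.2.0beta0 source.  The digest is an MD5 over the file size (as decimal text) followed by, for files over 256 KiB, the first 128 KiB and the 128 KiB ending at the 1 MiB mark.  If that is right, two bitmaps of the same size whose first megabyte is identical get the same digest and land in one chain.  That would be consistent with the strace above, where candidates are only rejected in the second megabyte.  `pool_digest` is a hypothetical name.

```python
import hashlib

SAMPLE = 131072             # 128 KiB sample size (assumed)
WHOLE_FILE_LIMIT = 262144   # files up to 256 KiB hashed in full (assumed)
SAMPLE_WINDOW = 1048576     # second sample ends at min(size, 1 MiB) (assumed)


def pool_digest(data: bytes) -> str:
    """Assumed BackupPC 3.x-style pool digest: MD5(size as text + samples)."""
    size = len(data)
    md5 = hashlib.md5()
    md5.update(str(size).encode("ascii"))
    if size > WHOLE_FILE_LIMIT:
        # First 128 KiB, then the 128 KiB ending at min(size, 1 MiB);
        # nothing beyond the first MiB contributes to the digest.
        end = min(size, SAMPLE_WINDOW)
        md5.update(data[:SAMPLE])
        md5.update(data[end - SAMPLE:end])
    else:
        md5.update(data)
    return md5.hexdigest()
```

Under this assumption, two 15,267,894-byte files that differ only after the first MiB produce identical digests, so every byte-for-byte comparison has to happen in the chain walk instead.  If the real function differs, this conclusion does not carry over.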


-- 
James Esslinger       --   slinger AT arlut.utexas DOT edu
System Administrator  --   Office: 512.835.3257
SISL/ARL:UT           --   Helpdesk: 512.490.4490

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/