Hello,
I seem to be having a problem with pool hash collisions on data being
backed up from one server. The problem didn't arise until a user started
populating a directory with thousands of bitmap files. It now appears
that these files are all colliding in the pool and slowing the backups
to a crawl. Here's the information on the pool:
General Server Information
  * The servers PID is 3626, on host tapehost1, version 3.2.0beta0,
    started at 6/12 14:42.
  * This status was generated at 7/1 09:54.
  * The configuration was last loaded at 6/12 14:52.
  * PCs will be next queued at 7/1 10:00.
  * Other info:
      o 2 pending backup requests from last scheduled wakeup,
      o 0 pending user backup requests,
      o 10 pending command requests,
      o Uncompressed pool:
          + Pool is 687.39GB comprising 1154808 files and 1279
            directories (as of 6/30 07:09),
          + Pool hashing gives 5013 repeated files with longest
            chain 4527,
          + Nightly cleanup removed 228 files of size 0.22GB
            (around 6/30 07:09),
      o Compressed pool:
          + Pool is 438.99GB comprising 1658761 files and 2184
            directories (as of 6/30 15:23),
          + Pool hashing gives 1611 repeated files with longest
            chain 776,
          + Nightly cleanup removed 34 files of size 0.00GB
            (around 6/30 15:23),
      o Pool file system was recently at 61% (7/1 09:51), today's
        max is 61% (7/1 01:00) and yesterday's max was 61%.
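The "longest chain" figures above come from BackupPC_nightly. To see which digests the long chains belong to, a small script can walk the pool and count files per digest. This is a hypothetical helper (chain_lengths and pool_chains.py are made-up names), and it assumes pool files are named by a 32-hex-digit digest with an optional _<n> collision suffix, as with the ef48707c04eed19414d0d42da047ea3f_0 through _4526 files listed below. Its counts are a rough cross-check and may not match the nightly statistics exactly.

```python
import os
import re
import sys
from collections import Counter

# Assumed naming: a 32-hex-digit MD5 digest, optionally followed by _<n>
# for files whose digest collided with an existing pool file.
POOL_NAME = re.compile(r"^([0-9a-f]{32})(?:_(\d+))?$")


def chain_lengths(pool_root: str) -> Counter:
    """Return a Counter mapping each pool digest to the number of files sharing it."""
    chains: Counter = Counter()
    for _dirpath, _dirnames, filenames in os.walk(pool_root):
        for name in filenames:
            match = POOL_NAME.match(name)
            if match:
                chains[match.group(1)] += 1
    return chains


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the five longest chains (only digests shared by 2+ files).
    for digest, length in chain_lengths(sys.argv[1]).most_common(5):
        if length > 1:
            print(f"{digest}: {length} files")
```

Run as, e.g., python3 pool_chains.py /ldisk/3ware0/backups/pool. On a pool of over a million files the walk itself takes a while and adds metadata I/O, so it is best run outside the backup window.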
Notice that the longest chain in the uncompressed pool is 4527. Drilling
down to the directory where the collisions happen, I see:
# cd /ldisk/3ware0/backups/pool/e/f/4/
# ls -lah ef48707c04eed19414d0d42da047ea3f_0 ef48707c04eed19414d0d42da047ea3f_4526
-rw-r----- 2 backuppc backuppc 15M 2009-06-12 11:37 ef48707c04eed19414d0d42da047ea3f_0
-rw-r----- 2 backuppc backuppc 15M 2009-06-26 12:30 ef48707c04eed19414d0d42da047ea3f_4526
All the files in between are the same size as well. The BackupPC_dump
instance for this server seems to spend forever comparing against these
files: it loops over the chain again and again, once for each file, and
working through these 4527 files can take up to 2-3 days. Here's an
strace of what the process is doing:
# ps aux | grep BackupPC_dump
backuppc   717 14.0  2.6 260352 215796 ?  D  Jun28 636:51 /usr/bin/perl /usr/local/backuppc/bin/BackupPC_dump services
# strace -p 717
stat("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2496", {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
open("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2496", O_RDONLY) = 6
ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffdfe4c3c0) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(6, 0, SEEK_CUR) = 0
fstat(6, {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
lseek(7, 0, SEEK_SET) = 0
read(7, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "\377\0\377\377\377\0\377\377\377\0\377\377\377\0\377\377"..., 1048576) = 1048576
close(6) = 0
stat("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2497", {st_mode=S_
open("/ldisk/3ware0/backups/pool/e/f/4/ef48707c04eed19414d0d42da047ea3f_2497", O_RDONLY) =
ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffdfe4c3c0) = -1 ENOTTY (Inappropriate ioctl f
lseek(6, 0, SEEK_CUR) = 0
fstat(6, {st_mode=S_IFREG|0640, st_size=15267894, ...}) = 0
fcntl(6, F_SETFD, FD_CLOEXEC) = 0
lseek(7, 0, SEEK_SET) = 0
read(7, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "BM6\370\350\0\0\0\0\0006\0\0\0(\0\0\0\0\n\0\0\323\5\0\0"..., 1048576) = 1048576
read(6, "\377\0\377\377\377\0\377\377\377\0\377\377\377\0\377\377"..., 1048576) = 1048576
close(6) = 0
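The strace suggests a linear scan of the chain: for each _N file, BackupPC_dump opens it, rewinds what appears to be the incoming file (fd 7) and compares 1 MiB blocks until they differ, then moves on to the next chain member. The sketch below models that pattern and counts the bytes read. It is not BackupPC's actual Perl code, and the function name find_match is hypothetical.

```python
import os
from typing import Iterable, Optional, Tuple

BLOCK = 1048576  # 1 MiB, the read size visible in the strace


def find_match(new_path: str, candidates: Iterable[str]) -> Tuple[Optional[str], int]:
    """Compare new_path against each candidate in turn, block by block.

    Returns (first identical candidate or None, total bytes read from both files).
    """
    new_size = os.path.getsize(new_path)
    bytes_read = 0
    with open(new_path, "rb") as new_f:
        for cand in candidates:
            if os.path.getsize(cand) != new_size:
                continue  # a size mismatch rules it out without reading any data
            new_f.seek(0)  # rewind the new file for each candidate
            with open(cand, "rb") as cand_f:
                while True:
                    a = new_f.read(BLOCK)
                    b = cand_f.read(BLOCK)
                    bytes_read += len(a) + len(b)
                    if a != b:
                        break  # first differing block: try the next chain member
                    if not a:
                        return cand, bytes_read  # both at EOF together: identical
    return None, bytes_read
```

When chain members share their first megabyte, every candidate costs at least a couple of megabytes of reads before it is rejected, so the work per new file grows linearly with the chain length.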
Here's an iostat dump for the filesystem the data is being backed up to:
% iostat -m 2 /dev/sdb
Linux 2.6.27.13-smp (tapehost1)  07/01/2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.94    0.06    2.00   11.81    0.00   85.44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            1557.00        90.95         0.05        181          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.87    0.19    2.18   11.19    0.00   85.20

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            1750.75       102.06         0.05        205          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.12    0.00    2.19   11.62    0.00   85.31

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            1600.50        92.55         0.06        185          0
As you can see, the device underneath the filesystem is being pushed
hard, most likely to its limits: roughly 1,560-1,750 transfers and
91-102 MB read per second, with almost nothing being written.
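To put those iostat numbers in perspective, here is a rough estimate of how long one pass over the 4527-file chain takes. The chain length and read rates come from this post; the per-candidate cost (about 2 MiB, the two 1 MiB reads of each chain file in the strace) and the number of new bitmaps are assumptions, and the estimate treats every candidate as a miss.

```python
# Back-of-envelope estimate of one full walk of the collision chain.
# Assumptions (not measured): no candidate matches, so every chain member is read;
# each member costs about 2 MiB of reads; the new file's block stays in page cache.


def chain_walk_seconds(chain_len: int, mib_per_candidate: float, mib_per_s: float) -> float:
    """Seconds to read every candidate in a chain at a sustained read rate."""
    return chain_len * mib_per_candidate / mib_per_s


# MB_read/s from the three iostat samples, treated as MiB/s here.
samples = [90.95, 102.06, 92.55]
throughput = sum(samples) / len(samples)  # about 95.2

per_file = chain_walk_seconds(4527, 2.0, throughput)
print(f"{per_file:.0f} s per new bitmap")  # 95 s per new bitmap

new_bitmaps = 2000  # hypothetical count standing in for "thousands" of bitmaps
print(f"{per_file * new_bitmaps / 86400:.1f} days for {new_bitmaps} new bitmaps")  # 2.2 days
```

Under those assumptions a single new bitmap costs about a minute and a half of pure disk reading, which puts a couple of thousand of them in the multi-day range described here.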
So my questions are:
1. Is there a way to change the hashing algorithm to prevent these
massive collisions?
2. If not, are there any other ways to speed up this process so I can
get these backups finished in a more timely fashion? A full backup of
this system used to finish in 4-5 hours; now even an incremental takes
3+ days.
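On question 1, it may help to see why these bitmaps collide at all. As I understand BackupPC 3.x (an assumption; check lib/BackupPC/Lib.pm in your install), the pool name is an MD5 over the file size plus only part of the contents: files of 256 KiB or less are hashed whole, while larger files contribute just their first 128 KiB and the 128 KiB ending at the 1 MiB mark. Bitmaps of identical dimensions share a size and header, so if their first megabyte of pixels also matches, they land in the same chain. The sketch below contrasts that assumed partial digest with a whole-file MD5; both function names are hypothetical.

```python
import hashlib

CHUNK = 131072   # 128 KiB
LIMIT = 1048576  # 1 MiB


def partial_digest(data: bytes) -> str:
    """Assumed BackupPC 3.x-style pool digest: size + head + 128 KiB ending at 1 MiB."""
    md5 = hashlib.md5()
    size = len(data)
    md5.update(str(size).encode())  # the size is hashed as a decimal string
    if size > 2 * CHUNK:
        end = min(size, LIMIT)
        md5.update(data[:CHUNK])
        md5.update(data[end - CHUNK:end])
    else:
        md5.update(data)  # small files are hashed whole
    return md5.hexdigest()


def full_digest(data: bytes) -> str:
    """MD5 over the entire file contents."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    size = 3 * LIMIT  # hypothetical 3 MiB "bitmap"
    a = bytes(size)
    b = bytearray(a)
    b[2 * LIMIT] = 0xFF  # differs only beyond the first 1 MiB
    b = bytes(b)
    print(partial_digest(a) == partial_digest(b))  # True: same pool name, a collision
    print(full_digest(a) == full_digest(b))        # False: whole-file MD5 tells them apart
```

The trade-off is that a whole-file digest must read the entire file before its pool name is known, whereas the partial digest only needs the first megabyte.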
--
James Esslinger -- slinger AT arlut.utexas DOT edu
System Administrator -- Office: 512.835.3257
SISL/ARL:UT -- Helpdesk: 512.490.4490
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/