BackupPC-users

[BackupPC-users] RsyncP: BackupPC_dump consumes 100% CPU and progress stalls

2009-12-13 08:08:58
Subject: [BackupPC-users] RsyncP: BackupPC_dump consumes 100% CPU and progress stalls
From: Tim Connors <tconnors AT rather.puzzling DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 13 Dec 2009 23:49:31 +1100 (EST)
backuppc has spontaneously a few days ago started getting stuck on a
partition on a machine that it has been dealing with fine for 120 days
now.

The logfile gives:

2009-12-02 08:46:04 incr backup 122 complete, 38 files, 46425552 bytes, 0 
xferErrs (0 bad files, 0 bad shares, 0 other)
2009-12-03 08:24:08 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /
2009-12-03 08:27:03 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /var
2009-12-03 08:28:12 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /ssb_local
2009-12-03 08:31:26 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /opt
2009-12-03 08:31:47 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /scratch
2009-12-03 08:31:57 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /6dfdata
2009-12-03 08:32:08 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /6dfgs
2009-12-03 08:33:30 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /6dfdata2
2009-12-03 08:44:02 incr backup started back to 2009-11-29 08:18:01 (backup 
#119) for directory /usr/local
2009-12-03 08:44:58 incr backup 123 complete, 46 files, 129608707 bytes, 0 
xferErrs (0 bad files, 0 bad shares, 0 other)
2009-12-04 08:18:49 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /
2009-12-04 08:23:45 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /var
2009-12-04 08:25:01 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /ssb_local
2009-12-04 08:28:09 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /opt
2009-12-04 08:28:32 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /scratch
2009-12-04 08:28:41 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /6dfdata
2009-12-04 08:28:50 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /6dfgs
2009-12-04 08:29:21 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /6dfdata2
2009-12-04 08:40:29 incr backup started back to 2009-12-03 08:24:08 (backup 
#123) for directory /usr/local
2009-12-04 08:41:25 incr backup 124 complete, 37 files, 47108022 bytes, 0 
xferErrs (0 bad files, 0 bad shares, 0 other)
2009-12-05 08:18:02 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /
2009-12-05 08:21:32 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /var
2009-12-05 08:23:00 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /ssb_local
2009-12-05 08:26:45 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /opt
2009-12-06 04:27:03 Aborting backup up after signal ALRM
2009-12-06 04:27:05 Got fatal error during xfer (aborted by signal=ALRM)
2009-12-06 10:03:04 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /
2009-12-06 10:06:13 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /var
2009-12-06 10:07:57 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /ssb_local
2009-12-06 10:11:40 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /opt
2009-12-07 06:12:02 Aborting backup up after signal ALRM
2009-12-07 06:12:04 Got fatal error during xfer (aborted by signal=ALRM)
2009-12-07 08:21:50 incr backup started back to 2009-12-04 08:18:49 (backup 
#124) for directory /

It's a slowaris machine running rsync 2.6.8, and the backuppc server is
debian lenny, running backuppc 3.1.0.

A typical gdb trace on the runaway backuppc_dump process looks like this:

sudo gdb /usr/share/backuppc/bin/BackupPC_dump 20538
bt
#0  0x00007f779bdd24c0 in Perl_Gthr_key_ptr@plt () from 
/usr/lib/perl5/auto/File/RsyncP/FileList/FileList.so
#1  0x00007f779bdd67f2 in XS_File__RsyncP__FileList_get () from 
/usr/lib/perl5/auto/File/RsyncP/FileList/FileList.so
#2  0x00007f779e526eb0 in Perl_pp_entersub () from /usr/lib/libperl.so.5.10
#3  0x00007f779e525392 in Perl_runops_standard () from /usr/lib/libperl.so.5.10
#4  0x00007f779e5205df in perl_run () from /usr/lib/libperl.so.5.10
#5  0x0000000000400d0c in main ()

(always in RsyncP, almost obviously).

It has no files open according to /proc/<pid>/fd, other than logfiles etc.

It's cpu bound, making no syscalls, so attaching an strace process to it
yields nothing (and I wasn't enlightened when I tried to attach it from
the start).

If I remove the /opt partition one day, then add it the next day, it locks
up again that next day while trying to do /opt.

If I delete 123, 124 (and the newer copies without /opt) and modify the
backups file to make 122 the last one, and thus force it to compare
against backup 119 rather than 123, it still locks up on /opt
(incrlevels is 1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5)

-- 
TimC
"How should I know if it works?  That's what beta testers are for.  I only
coded it."        (Attributed to Linus Torvalds, somewhere in a posting)

------------------------------------------------------------------------------
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

<Prev in Thread] Current Thread [Next in Thread>
  • [BackupPC-users] RsyncP: BackupPC_dump consumes 100% CPU and progress stalls, Tim Connors <=