BackupPC-users

[BackupPC-users] BackupPC_dump stalls and leaves zombie process

2009-07-23 17:50:31
From: Steve <leperas AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 23 Jul 2009 17:45:55 -0400
Hello,

I am running BackupPC 3.1.0 on Ubuntu 9.04 now, and since the
upgrade I've developed a problem with BackupPC. During full backups,
the backup stalls and leaves a zombie process.  The backup just sits (as
shown below) until I kill the ssh process or restart BackupPC.  I have
not seen this problem with smb clients, and have not seen it with
incremental backups (although the incrementals take a lot less time, so
maybe it will happen sometime...), only with full backups over rsync.
-------------------
32167 backuppc  25   5     0    0    0 Z    0  0.0   5:21.56 BackupPC_dump <defunct>
32321 backuppc  25   5  7168 3600 1164 S    0  0.2   0:01.16 ssh
32587 backuppc  25   5 79768  73m 1220 S    0  3.7   2:31.20 BackupPC_dump
----------------------------

The status page says:
-----------------
localhost        full    backuppc        7/23 12:00      BackupPC_dump localhost         32167           32321, 32587
--------------------

The computer doesn't crash or appear to have any problems, and if I
watch the whole time, memory usage never goes above 10-20% total.  When
I kill the ssh process, the status page goes back to showing nothing,
and a partial backup is left as shown below on the home page for the
backup:

------------------
848      partial         yes     0       7/23 00:12      112.8           0.5     /home/backup/pc/localhost/848
---------------

If I keep trying, eventually it makes it through the whole backup, as
shown here:

---------------
848      full    yes     0       7/23 15:04      152.4           0.1     /home/backup/pc/localhost/848
-----------------

Since the backup eventually completes, and sometimes only has to be
restarted once or twice while other times 6 or 7 times, I can't figure
out where to start debugging.  It is reproducible in that I haven't
had a successful full backup without at least one restart since the
upgrade.

If some of you have some theories, I will be happy to capture
additional info about what is going on when this happens...I am not
sure offhand what info/logs/files would be important.  It does not
seem like a hardware problem to me.  The log itself ends like this:
----------------
same     764   506/500       49664 home/common/LabView/Gas_Swirl_Process_Control/flame_frf/dyn_data/WriteData2/WriteData2.opt
Parent read EOF from child: fatal error!
Done: 0 files, 0 bytes
Got fatal error during xfer (Child exited prematurely)
Backup aborted (Child exited prematurely)
Not saving this as a partial backup since it has fewer files than the prior one (got 2 and 0 files versus 5)
-------------------
However, I suspect that error comes from my killing the ssh process, not
from whatever caused things to stall.  I can't find any errors in the
log while it is stalled, before I kill any processes; BackupPC seems to
think things are running fine even though they have stalled out...
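
One way to capture the state of the three processes while the backup is
stalled, before killing anything, is sketched below.  This is only a
rough sketch: it assumes a Linux host with /proc mounted, the function
names (parse_proc_status, snapshot, is_zombie) are made up for
illustration, and the PIDs are the ones from the listing above.  The
parent PID it reports shows which process has not yet reaped the
defunct BackupPC_dump.

```python
import os


def parse_proc_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def snapshot(pid):
    """Return name, state and parent PID for one process, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            fields = parse_proc_status(fh.read())
    except (FileNotFoundError, ProcessLookupError):
        return None
    return {
        "pid": pid,
        "name": fields.get("Name"),
        "state": fields.get("State"),       # e.g. "S (sleeping)" or "Z (zombie)"
        "ppid": int(fields.get("PPid", "0")),
    }


def is_zombie(snap):
    """True if the snapshot describes a defunct (zombie) process."""
    return snap is not None and (snap["state"] or "").startswith("Z")


if __name__ == "__main__":
    # PIDs taken from the stalled run shown earlier; substitute the current ones.
    for pid in (32167, 32321, 32587):
        snap = snapshot(pid)
        print(pid, snap, "ZOMBIE" if is_zombie(snap) else "")
```

Running this a few times during a stall, and saving the output next to
the XferLOG, would show whether the ssh and second BackupPC_dump
processes stay in the same state the whole time.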

I am wondering if there is some kind of timeout or permission or other
default that has changed with ssh or rsync in the upgrade that is
causing this...however, since it works for incrementals and the
transport is the same for those, I don't see how that could be the case...

Also, is there a difference when the process is started from the cgi
interface vs. starting itself on schedule?  When it succeeds it has
always been after one or more restarts from the cgi...

Anyway, thanks. Looking forward to hearing ideas from you guys on where to look.

Steve

------------------------------------------------------------------------------
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
