[BackupPC-users] BackupPC_trashClean (?) freezes system

2016-06-24 09:48:11
Subject: [BackupPC-users] BackupPC_trashClean (?) freezes system
From: Witold Arndt <witold.arndt AT eule-gdi DOT de>
To: backuppc-users AT lists.sourceforge DOT net
Date: Fri, 24 Jun 2016 15:28:37 +0200
Hello everyone,

We have had a BackupPC 3.3.0 instance running for quite some time. For the 
past few weeks, however, the backup VM has repeatedly hung with 100% CPU 
utilisation. The web interface is no longer accessible, and restarting the 
service doesn't work.

I'm at a bit of a loss as to where to start debugging, since the logs (system 
and BackupPC) show no problems.

My observations (nagios, logs) and checks so far:

* contents of LOG just up to the freeze:
... 
2016-06-17 01:00:00 Running 2 BackupPC_nightly jobs from 0..15 (out of 0..15)
2016-06-17 01:00:00 Running BackupPC_nightly -m 0 127 (pid=31037)
2016-06-17 01:00:00 Running BackupPC_nightly 128 255 (pid=31038)
2016-06-17 01:00:00 Next wakeup is 2016-06-17 02:00:00
2016-06-17 01:00:24 Started full backup on xxx (pid=31202, share=/)
2016-06-17 02:00:01 Next wakeup is 2016-06-17 03:00:00
2016-06-17 02:00:23 Started full backup on xxx (pid=1732, share=/)
2016-06-17 03:00:00 Next wakeup is 2016-06-17 04:00:00
2016-06-17 04:00:00 Next wakeup is 2016-06-17 05:00:00
2016-06-17 05:00:00 Next wakeup is 2016-06-17 06:00:00
2016-06-17 05:00:46 Started incr backup on xxx (pid=10448, share=/)
2016-06-17 06:00:01 Next wakeup is 2016-06-17 07:00:00
2016-06-17 06:32:31 BackupPC_nightly now running BackupPC_sendEmail
2016-06-17 06:32:43 Finished  admin1  (BackupPC_nightly 128 255)
2016-06-17 06:35:55 Finished  admin  (BackupPC_nightly -m 0 127)
2016-06-17 06:35:55 Pool nightly clean removed 0 files of size 0.00GB
2016-06-17 06:35:55 Pool is 0.00GB, 0 files (0 repeated, 0 max chain, 0 max links), 1 directories
2016-06-17 06:35:55 Cpool nightly clean removed 2709 files of size 0.12GB
2016-06-17 06:35:55 Cpool is 119.03GB, 1993960 files (325 repeated, 15 max chain, 32000 max links), 4369 directories
2016-06-17 07:00:00 Next wakeup is 2016-06-17 08:00:00
2016-06-17 08:00:00 Next wakeup is 2016-06-17 09:00:00
2016-06-17 08:35:25 Finished full backup on xxx
2016-06-17 08:35:25 Running BackupPC_link xxx (pid=24792)
2016-06-17 08:35:26 Finished xxx (BackupPC_link xxx)
2016-06-17 09:00:01 Next wakeup is 2016-06-17 10:00:00
2016-06-17 09:29:57 Started incr backup on xxx (pid=10448, share=/data/)
2016-06-17 09:30:41 Finished incr backup on xxx
2016-06-17 09:30:42 Running BackupPC_link xxx (pid=27594)
2016-06-17 09:30:43 Finished xxx (BackupPC_link xxx)
2016-06-17 09:47:33 Finished full backup on xxx
2016-06-17 09:47:33 Running BackupPC_link xxx (pid=28375)
2016-06-17 09:47:35 Finished xxx (BackupPC_link xxx)
2016-06-17 10:00:00 Next wakeup is 2016-06-17 11:00:00
2016-06-17 11:00:00 Next wakeup is 2016-06-17 12:00:00
BackupPC froze at the next wakeup, 2016-06-17 12:00:00.
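
To pin down the freeze time without reading the whole LOG by eye, the last "Next wakeup" entry can be extracted with a short script. This is only a minimal sketch: the function name `last_wakeup` is made up for illustration, the line format is assumed to match the excerpt above exactly, and the sample lines are copied from that excerpt.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2016-06-17 11:00:00 Next wakeup is 2016-06-17 12:00:00
WAKEUP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) Next wakeup is "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$"
)
FMT = "%Y-%m-%d %H:%M:%S"


def last_wakeup(log_lines):
    """Return (logged_at, next_wakeup) for the last 'Next wakeup' line, or None.

    Hypothetical helper: it only understands the line format shown in the
    excerpt above and ignores every other LOG line.
    """
    result = None
    for line in log_lines:
        match = WAKEUP_RE.match(line.strip())
        if match:
            result = (
                datetime.strptime(match.group(1), FMT),
                datetime.strptime(match.group(2), FMT),
            )
    return result


# Sample lines copied from the LOG excerpt above.
sample = [
    "2016-06-17 09:47:35 Finished xxx (BackupPC_link xxx)",
    "2016-06-17 10:00:00 Next wakeup is 2016-06-17 11:00:00",
    "2016-06-17 11:00:00 Next wakeup is 2016-06-17 12:00:00",
]

logged_at, next_wakeup = last_wakeup(sample)
print("last entry logged at", logged_at, "- wakeup that never happened:", next_wakeup)
```

Because the function accepts any iterable of lines, an open LOG file can be passed in directly instead of the sample list.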

* system info after hard reset:

    The servers PID is 1807, on host x, version 3.3.0, started at 6/24 14:22.
    This status was generated at 6/24 14:39.
    The configuration was last loaded at 6/24 14:22.
    PCs will be next queued at 6/24 15:00.
    Other info:
        0 pending backup requests from last scheduled wakeup,
        0 pending user backup requests,
        0 pending command requests,
        Pool is 119.03GB comprising 1993960 files and 4369 directories (as of 6/17 06:32),
        Pool hashing gives 325 repeated files with longest chain 15,
        Nightly cleanup removed 2709 files of size 0.12GB (around 6/17 06:32),
        Pool file system was recently at 7% (6/24 14:32), today's max is 7% (6/17 01:00) and yesterday's max was 7%.


* The VM has enough memory (4 GB), which isn't exhausted even when frozen. CPU 
load is OK; CPU utilisation is 15% system, 0% user, 85% wait

Graphs: http://imgur.com/a/ptzAf

* all other monitored parameters (kernel, FS, disk I/O) look OK

* inodes and permissions are fine, and fsck reports no errors

* permissions in /var/lib/backuppc/trash/ are correct

* after several days BackupPC_trashClean seems to hang without doing anything, 
as far as I can tell from logging in via ssh and looking at htop

* killing the process or its parents has no effect; it just becomes a 
zombie
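
The htop check in the last two points could also be recorded in a script, so that the exact process state (for example `D`, uninterruptible sleep, or `Z`, zombie) is captured at the time of the hang. The following is a rough sketch for Linux only. The helper names `parse_stat` and `backuppc_states` are invented for illustration, and matching on a command name starting with "BackupPC" is an assumption about how the processes show up in /proc.

```python
import os


def parse_stat(text):
    """Parse the content of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 1:].split()
    # After the command name come the state letter and the parent PID.
    return pid, comm, fields[0], int(fields[1])


def backuppc_states(proc_root="/proc"):
    """Return (pid, comm, state, ppid) for processes whose name starts with 'BackupPC'.

    Assumption: the kernel truncates the command name (to 15 characters),
    so only the prefix is compared.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                info = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if info[1].startswith("BackupPC"):
            found.append(info)
    return found
```

Calling `backuppc_states()` on the backup VM while it is hung would show whether BackupPC_trashClean is blocked in `D` state or has already become a zombie, together with its parent PID.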


So, where should I start to debug this?

Best regards,
Witold
