Hello everyone,
we have had an instance of BackupPC 3.3.0 running for quite some time. For the
past few weeks, however, the backup VM has repeatedly hung with 100% CPU
utilisation. The web interface is no longer accessible, and restarting the
service doesn't work.
I'm a bit at a loss as to where to start debugging, since the logs (system and
BackupPC) show no problems.
My observations (Nagios, logs) and checks so far:
* contents of LOG up to the freeze:
...
2016-06-17 01:00:00 Running 2 BackupPC_nightly jobs from 0..15 (out of 0..15)
2016-06-17 01:00:00 Running BackupPC_nightly -m 0 127 (pid=31037)
2016-06-17 01:00:00 Running BackupPC_nightly 128 255 (pid=31038)
2016-06-17 01:00:00 Next wakeup is 2016-06-17 02:00:00
2016-06-17 01:00:24 Started full backup on xxx (pid=31202, share=/)
2016-06-17 02:00:01 Next wakeup is 2016-06-17 03:00:00
2016-06-17 02:00:23 Started full backup on xxx (pid=1732, share=/)
2016-06-17 03:00:00 Next wakeup is 2016-06-17 04:00:00
2016-06-17 04:00:00 Next wakeup is 2016-06-17 05:00:00
2016-06-17 05:00:00 Next wakeup is 2016-06-17 06:00:00
2016-06-17 05:00:46 Started incr backup on xxx (pid=10448, share=/)
2016-06-17 06:00:01 Next wakeup is 2016-06-17 07:00:00
2016-06-17 06:32:31 BackupPC_nightly now running BackupPC_sendEmail
2016-06-17 06:32:43 Finished admin1 (BackupPC_nightly 128 255)
2016-06-17 06:35:55 Finished admin (BackupPC_nightly -m 0 127)
2016-06-17 06:35:55 Pool nightly clean removed 0 files of size 0.00GB
2016-06-17 06:35:55 Pool is 0.00GB, 0 files (0 repeated, 0 max chain, 0 max links), 1 directories
2016-06-17 06:35:55 Cpool nightly clean removed 2709 files of size 0.12GB
2016-06-17 06:35:55 Cpool is 119.03GB, 1993960 files (325 repeated, 15 max chain, 32000 max links), 4369 directories
2016-06-17 07:00:00 Next wakeup is 2016-06-17 08:00:00
2016-06-17 08:00:00 Next wakeup is 2016-06-17 09:00:00
2016-06-17 08:35:25 Finished full backup on xxx
2016-06-17 08:35:25 Running BackupPC_link xxx (pid=24792)
2016-06-17 08:35:26 Finished xxx (BackupPC_link xxx)
2016-06-17 09:00:01 Next wakeup is 2016-06-17 10:00:00
2016-06-17 09:29:57 Started incr backup on xxx (pid=10448, share=/data/)
2016-06-17 09:30:41 Finished incr backup on xxx
2016-06-17 09:30:42 Running BackupPC_link xxx (pid=27594)
2016-06-17 09:30:43 Finished xxx (BackupPC_link xxx)
2016-06-17 09:47:33 Finished full backup on xxx
2016-06-17 09:47:33 Running BackupPC_link xxx (pid=28375)
2016-06-17 09:47:35 Finished xxx (BackupPC_link xxx)
2016-06-17 10:00:00 Next wakeup is 2016-06-17 11:00:00
2016-06-17 11:00:00 Next wakeup is 2016-06-17 12:00:00
BackupPC froze at the next wakeup, 2016-06-17 12:00:00.
* system info after hard reset:
The servers PID is 1807, on host x, version 3.3.0, started at 6/24 14:22.
This status was generated at 6/24 14:39.
The configuration was last loaded at 6/24 14:22.
PCs will be next queued at 6/24 15:00.
Other info:
0 pending backup requests from last scheduled wakeup,
0 pending user backup requests,
0 pending command requests,
Pool is 119.03GB comprising 1993960 files and 4369 directories (as of 6/17 06:32),
Pool hashing gives 325 repeated files with longest chain 15,
Nightly cleanup removed 2709 files of size 0.12GB (around 6/17 06:32),
Pool file system was recently at 7% (6/24 14:32), today's max is 7% (6/17 01:00) and yesterday's max was 7%.
* The VM has enough memory (4 GB), which isn't depleted even when frozen. CPU
load is OK; CPU utilisation is 15% system, 0% user, 85% wait.
Graphs: http://imgur.com/a/ptzAf
* all other monitored parameters (kernel, FS, disk I/O) look OK
* inodes, permissions and fsck show no errors
* permissions in /var/lib/backuppc/trash/ are correct
* after several days, BackupPC_trashClean seems to hang without doing anything,
as far as I can see when I ssh into the machine and look in htop
* killing the process or its parents has no effect; it just becomes a zombie
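One further check I can think of: 85% I/O wait together with a process that
cannot be killed would be consistent with processes stuck in uninterruptible
sleep (state D), although I have not confirmed that this is what is happening
here. Below is a minimal sketch for listing such processes, assuming a Linux
/proc filesystem. The function names parse_stat_line and
find_processes_in_state are my own for illustration and are not part of
BackupPC.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the content of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_processes_in_state(states=("D", "Z"), proc_root="/proc"):
    """List (pid, comm, state) for processes whose state is in `states`.

    D = uninterruptible sleep (usually waiting on I/O), Z = zombie.
    proc_root is a parameter only so the function can be tried on a copy.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable entry
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state in find_processes_in_state():
        print(pid, state, comm)
```

If BackupPC_trashClean or one of the backup processes shows up with state D
while the machine is frozen, that would point towards the storage layer rather
than BackupPC itself, but that is only a guess at this stage.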
So, where should I start to debug this?
Best regards,
Witold
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/