Bacula-users

[Bacula-users] System crashes, call traces when backing up using bacula.

2010-12-23 08:06:20
Subject: [Bacula-users] System crashes, call traces when backing up using bacula.
From: Caspar Smit <c.smit AT truebit DOT nl>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 23 Dec 2010 14:03:16 +0100
Hi all,

I have a system running bacula backup for about a year with no problems.

The last two to three weeks i suddenly had occasional system crashes (complete hangs where I need to reset).

I checked the syslog for any clues and found this error popping up several times during backup:

[198937.504359] swapper: page allocation failure. order:0, mode:0x20
[198937.504359] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
[198937.504359]
[198937.504359] Call Trace:
[198937.504359]  <IRQ>  [<ffffffff80276c15>] __alloc_pages_internal+0x3a6/0x3bf
[198937.504359]  [<ffffffff802957ec>] kmem_getpages+0x96/0x15f
[198937.504359]  [<ffffffff80295e7c>] fallback_alloc+0x16b/0x1e1
[198937.504359]  [<ffffffff80295a59>] kmem_cache_alloc_node+0x105/0x138
[198937.504359]  [<ffffffff803b6771>] __alloc_skb+0x3c/0x12d
[198937.504359]  [<ffffffff803b7707>] __netdev_alloc_skb+0x29/0x43
[198937.504359]  [<ffffffffa00bd4ee>] :e1000e:e1000_alloc_rx_buffers+0x83/0x1db
[198937.504359]  [<ffffffffa00bd34f>] :e1000e:e1000_clean_rx_irq+0x29f/0x2da
[198937.504359]  [<ffffffffa00bbf74>] :e1000e:e1000_clean+0x88/0x270
[198937.504359]  [<ffffffff803bd297>] net_rx_action+0xab/0x1da
[198937.504359]  [<ffffffff802393a1>] __do_softirq+0x5c/0xd1
[198937.504359]  [<ffffffff8020d2dc>] call_softirq+0x1c/0x28
[198937.504359]  [<ffffffff8020f3e8>] do_softirq+0x3c/0x81
[198937.504359]  [<ffffffff802392ff>] irq_exit+0x3f/0x85
[198937.504359]  [<ffffffff8020f648>] do_IRQ+0xb9/0xd9
[198937.504359]  [<ffffffff80212b57>] mwait_idle+0x0/0x4d
[198937.504359]  [<ffffffff8020c47d>] ret_from_intr+0x0/0x19
[198937.504359]  <EOI>  [<ffffffff8021a857>] lapic_next_event+0x0/0x13
[198937.504359]  [<ffffffff80212b98>] mwait_idle+0x41/0x4d
[198937.504359]  [<ffffffff8020ad04>] cpu_idle+0x8e/0xb8
[198937.504359]
[198937.504359] Mem-info:
[198937.504359] Node 0 DMA per-cpu:
[198937.504359] CPU    0: hi:    0, btch:   1 usd:   0
[198937.504359] CPU    1: hi:    0, btch:   1 usd:   0
[198937.504359] CPU    2: hi:    0, btch:   1 usd:   0
[198937.504359] CPU    3: hi:    0, btch:   1 usd:   0
[198937.504359] Node 0 DMA32 per-cpu:
[198937.504359] CPU    0: hi:  186, btch:  31 usd:  63
[198937.504359] CPU    1: hi:  186, btch:  31 usd: 129
[198937.504359] CPU    2: hi:  186, btch:  31 usd:  40
[198937.504359] CPU    3: hi:  186, btch:  31 usd:  49
[198937.504359] Node 0 Normal per-cpu:
[198937.504359] CPU    0: hi:  186, btch:  31 usd:  63
[198937.504359] CPU    1: hi:  186, btch:  31 usd: 142
[198937.504359] CPU    2: hi:  186, btch:  31 usd:  99
[198937.504359] CPU    3: hi:  186, btch:  31 usd:  62
[198937.504359] Active:134041 inactive:3826671 dirty:20050 writeback:2 unstable:0
[198937.504359]  free:17166 slab:124810 mapped:3927 pagetables:815 bounce:0
[198937.504359] Node 0 DMA free:11652kB min:8kB low:8kB high:12kB active:0kB inactive:0kB present:10720kB pages_scanned:0 all_unreclaimable? yes
[198937.504359] lowmem_reserve[]: 0 3255 16133 16133
[198937.504359] Node 0 DMA32 free:52600kB min:3276kB low:4092kB high:4912kB active:16220kB inactive:2819100kB present:3333664kB pages_scanned:0 all_unreclaimable? no
[198937.504359] lowmem_reserve[]: 0 0 12877 12877
[198937.504359] Node 0 Normal free:4412kB min:12968kB low:16208kB high:19452kB active:519944kB inactive:12487584kB present:13186560kB pages_scanned:0 all_unreclaimable? no
[198937.504359] lowmem_reserve[]: 0 0 0 0
[198937.504359] Node 0 DMA: 3*4kB 5*8kB 3*16kB 3*32kB 3*64kB 0*128kB 2*256kB 1*512kB 2*1024kB 0*2048kB 2*4096kB = 11652kB
[198937.504359] Node 0 DMA32: 74*4kB 32*8kB 33*16kB 307*32kB 161*64kB 151*128kB 23*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 52568kB
[198937.504359] Node 0 Normal: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 4360kB
[198937.504359] 3849584 total pagecache pages
[198937.504359] Swap cache: add 366679, delete 356085, find 103268/139352
[198937.504359] Free swap  = 3746948kB
[198937.504359] Total swap = 4096564kB
[198937.504359] 4194304 pages of RAM
[198937.504359] 75804 reserved pages
[198937.504359] 3821940 pages shared
[198937.504359] 10594 pages swap cached


swapper: page allocation failure sounds to me like the system is trying to swap and that fails. I had 4Gb RAM and tried upgrading it to 16Gb (to see if there was a RAM shortage) but to no avail. I keep getting these messages.

It seems like it can handle a bunch of these messages before the system really hangs.

I tried upgrading bacula from version 3.0.2 to 5.0.2 (thought maybe of a memory leak) also to no avail.

Ran dbcheck (no errors), ran mysql repair table (no errors).

I noticed the File database was kinda getting large (~4GB) so I reconfigured mysql with the my-huge.cnf config also to no avail.

Is there anything i'm overlooking?

What else could be the cause of this behaviour?

I'm using debian lenny 2.6.26-2-amd64 kernel with bacula 5.0.2 (from the backports repository)
I'm backing up ~50 clients daily incremental and weekly full.

Kind regards,

Caspar Smit
Systemengineer
True Bit Resources B.V.
Ampèrestraat 13E
1446 TP  Purmerend

T: +31(0)299 410 475
F: +31(0)299 410 476
@: c.smit AT truebit DOT nl
W: www.truebit.nl

------------------------------------------------------------------------------
Learn how Oracle Real Application Clusters (RAC) One Node allows customers
to consolidate database storage, standardize their database environment, and, 
should the need arise, upgrade to a full multi-node Oracle RAC database 
without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>