[Bacula-users] System crashes, call traces when backing up using bacula.
2010-12-23 08:06:20
Hi all,
I have a system running bacula backup for about a year with no problems.
The last two to three weeks i suddenly had occasional system crashes (complete hangs where I need to reset).
I checked the syslog for any clues and found this error popping up several times during backup:
[198937.504359] swapper: page allocation failure. order:0, mode:0x20 [198937.504359] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1 [198937.504359] [198937.504359] Call Trace: [198937.504359] <IRQ> [<ffffffff80276c15>] __alloc_pages_internal+0x3a6/0x3bf
[198937.504359] [<ffffffff802957ec>] kmem_getpages+0x96/0x15f [198937.504359] [<ffffffff80295e7c>] fallback_alloc+0x16b/0x1e1 [198937.504359] [<ffffffff80295a59>] kmem_cache_alloc_node+0x105/0x138
[198937.504359] [<ffffffff803b6771>] __alloc_skb+0x3c/0x12d [198937.504359] [<ffffffff803b7707>] __netdev_alloc_skb+0x29/0x43 [198937.504359] [<ffffffffa00bd4ee>] :e1000e:e1000_alloc_rx_buffers+0x83/0x1db
[198937.504359] [<ffffffffa00bd34f>] :e1000e:e1000_clean_rx_irq+0x29f/0x2da [198937.504359] [<ffffffffa00bbf74>] :e1000e:e1000_clean+0x88/0x270 [198937.504359] [<ffffffff803bd297>] net_rx_action+0xab/0x1da
[198937.504359] [<ffffffff802393a1>] __do_softirq+0x5c/0xd1 [198937.504359] [<ffffffff8020d2dc>] call_softirq+0x1c/0x28 [198937.504359] [<ffffffff8020f3e8>] do_softirq+0x3c/0x81 [198937.504359] [<ffffffff802392ff>] irq_exit+0x3f/0x85
[198937.504359] [<ffffffff8020f648>] do_IRQ+0xb9/0xd9 [198937.504359] [<ffffffff80212b57>] mwait_idle+0x0/0x4d [198937.504359] [<ffffffff8020c47d>] ret_from_intr+0x0/0x19 [198937.504359] <EOI> [<ffffffff8021a857>] lapic_next_event+0x0/0x13
[198937.504359] [<ffffffff80212b98>] mwait_idle+0x41/0x4d [198937.504359] [<ffffffff8020ad04>] cpu_idle+0x8e/0xb8 [198937.504359] [198937.504359] Mem-info: [198937.504359] Node 0 DMA per-cpu:
[198937.504359] CPU 0: hi: 0, btch: 1 usd: 0 [198937.504359] CPU 1: hi: 0, btch: 1 usd: 0 [198937.504359] CPU 2: hi: 0, btch: 1 usd: 0 [198937.504359] CPU 3: hi: 0, btch: 1 usd: 0
[198937.504359] Node 0 DMA32 per-cpu: [198937.504359] CPU 0: hi: 186, btch: 31 usd: 63 [198937.504359] CPU 1: hi: 186, btch: 31 usd: 129 [198937.504359] CPU 2: hi: 186, btch: 31 usd: 40 [198937.504359] CPU 3: hi: 186, btch: 31 usd: 49
[198937.504359] Node 0 Normal per-cpu: [198937.504359] CPU 0: hi: 186, btch: 31 usd: 63 [198937.504359] CPU 1: hi: 186, btch: 31 usd: 142 [198937.504359] CPU 2: hi: 186, btch: 31 usd: 99 [198937.504359] CPU 3: hi: 186, btch: 31 usd: 62
[198937.504359] Active:134041 inactive:3826671 dirty:20050 writeback:2 unstable:0 [198937.504359] free:17166 slab:124810 mapped:3927 pagetables:815 bounce:0 [198937.504359] Node 0 DMA free:11652kB min:8kB low:8kB high:12kB active:0kB inactive:0kB present:10720kB pages_scanned:0 all_unreclaimable? yes
[198937.504359] lowmem_reserve[]: 0 3255 16133 16133 [198937.504359] Node 0 DMA32 free:52600kB min:3276kB low:4092kB high:4912kB active:16220kB inactive:2819100kB present:3333664kB pages_scanned:0 all_unreclaimable? no
[198937.504359] lowmem_reserve[]: 0 0 12877 12877 [198937.504359] Node 0 Normal free:4412kB min:12968kB low:16208kB high:19452kB active:519944kB inactive:12487584kB present:13186560kB pages_scanned:0 all_unreclaimable? no
[198937.504359] lowmem_reserve[]: 0 0 0 0 [198937.504359] Node 0 DMA: 3*4kB 5*8kB 3*16kB 3*32kB 3*64kB 0*128kB 2*256kB 1*512kB 2*1024kB 0*2048kB 2*4096kB = 11652kB [198937.504359] Node 0 DMA32: 74*4kB 32*8kB 33*16kB 307*32kB 161*64kB 151*128kB 23*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 52568kB
[198937.504359] Node 0 Normal: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 4360kB [198937.504359] 3849584 total pagecache pages [198937.504359] Swap cache: add 366679, delete 356085, find 103268/139352
[198937.504359] Free swap = 3746948kB [198937.504359] Total swap = 4096564kB [198937.504359] 4194304 pages of RAM [198937.504359] 75804 reserved pages [198937.504359] 3821940 pages shared [198937.504359] 10594 pages swap cached
swapper: page allocation failure sounds to me like the system is trying to swap and that fails. I had 4Gb RAM and tried upgrading it to 16Gb (to see if there was a RAM shortage) but to no avail. I keep getting these messages.
It seems like it can handle a bunch of these messages before the system really hangs.
I tried upgrading bacula from version 3.0.2 to 5.0.2 (thought maybe of a memory leak) also to no avail.
Ran dbcheck (no errors), ran mysql repair table (no errors).
I noticed the File database was kinda getting large (~4GB) so I reconfigured mysql with the my-huge.cnf config also to no avail.
Is there anything i'm overlooking?
What else could be the cause of this behaviour?
I'm using debian lenny 2.6.26-2-amd64 kernel with bacula 5.0.2 (from the backports repository) I'm backing up ~50 clients daily incremental and weekly full.
Kind regards,
Caspar Smit
Systemengineer True Bit Resources B.V. Ampèrestraat 13E 1446 TP Purmerend
T: +31(0)299 410 475 F: +31(0)299 410 476 @: c.smit AT truebit DOT nl W: www.truebit.nl
------------------------------------------------------------------------------
Learn how Oracle Real Application Clusters (RAC) One Node allows customers
to consolidate database storage, standardize their database environment, and,
should the need arise, upgrade to a full multi-node Oracle RAC database
without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl _______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
|
<Prev in Thread] |
Current Thread |
[Next in Thread>
|
- [Bacula-users] System crashes, call traces when backing up using bacula.,
Caspar Smit <=
|
|
|