Amanda-Users

Re: amdump fails, kjournal

2005-06-04 23:45:19
Subject: Re: amdump fails, kjournal
From: Frank Smith <fsmith AT hoovers DOT com>
To: Thorsten Jungeblut <tj AT hni.upb DOT de>, amanda-users AT amanda DOT org
Date: Sat, 04 Jun 2005 22:21:16 -0500
--On Saturday, June 04, 2005 16:08:32 +0200 Thorsten Jungeblut <tj AT hni.upb DOT de> wrote:

> Hi,
> 
> For a few weeks now, amdump has been failing intermittently; I can't
> reproduce it.
> Afterwards, a few zombie processes sometimes remain, e.g. gzip or dumper.
> After that, every command I issue (amstatus, amcheck, ...) hangs and stays
> in "uninterruptible sleep".
> The system load then stays very high (7 or higher), although the machine is
> actually doing nothing.
> The only way to clean up the system is to reboot, sometimes only with a hard
> reset.

I'm guessing nobody replied because it really isn't an Amanda question. You
are getting a kernel error while trying to update the journal on some
filesystem (ext3 uses a journal, for example). Since the system is unable to
complete the write, all processes trying to access the filesystem will hang
in the 'D' (uninterruptible sleep) state until the write completes, which it
probably never will.
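
To confirm that this is what is happening, it helps to list which processes are stuck in the 'D' state. The following is a minimal Python sketch, not part of Amanda, that assumes the Linux /proc layout described in proc(5): it reads /proc/<pid>/stat and reports every process whose state letter is 'D'. The names `parse_stat` and `d_state_processes` are hypothetical helpers. From the shell, `ps -eo pid,stat,comm` shows the same state column.

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate it via the first '(' and the *last* ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    # The one-letter state follows ") ", e.g. 'R', 'S', 'D', 'Z'.
    state = line[close_paren + 2]
    return pid, comm, state


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) of every process in uninterruptible sleep ('D')."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is unreadable) mid-scan
        if state == "D":
            hung.append((pid, comm))
    return sorted(hung)
```

Calling `d_state_processes()` while amstatus or amcheck is hung would list the blocked commands alongside any stuck kernel threads; it only tells you which processes are blocked, not on which device.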

The problem may be related to the dm-crypt module; it could also be the disk
itself, bad RAM, a bug in the particular kernel version you're running, or a
bad disk controller chip.

Some things I would try (in this order) to see if it goes away:
1. Run memtest86+ on the machine for at least one pass.
2. Update your kernel (2.6.11 is the stable version, or try 2.6.12-rc5 if you
   like the latest).
3. Convert your encrypted filesystem to a plain one.
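
While working through these steps, it is useful to check after each change whether the oopses still appear and whether they always land in the same journaling (JBD) functions. Below is a minimal Python sketch that assumes the 2.6-era i386 oops format quoted in this thread, where an `EIP is at function+offset/size` line precedes the `Process name (pid: N, ...` line. `summarize_oopses` and `count_by_function` are hypothetical helpers, not Amanda or kernel tools.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# e.g. "EIP is at journal_commit_transaction+0x1cd/0xf00"
EIP_RE = re.compile(r"EIP is at (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")
# e.g. "Process kjournald (pid: 881, threadinfo=ec272000 task=eda82560)"
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarize_oopses(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (faulting_function, process_name, pid) for each oops, in log order.

    Assumes each oops prints its "EIP is at" line before its "Process" line.
    """
    results = []
    pending_func = None
    for line in lines:
        eip = EIP_RE.search(line)
        if eip:
            pending_func = eip.group(1)
            continue
        proc = PROC_RE.search(line)
        if proc and pending_func is not None:
            results.append((pending_func, proc.group(1), int(proc.group(2))))
            pending_func = None
    return results


def count_by_function(oopses: List[Tuple[str, str, int]]) -> Counter:
    """Count how many oopses hit each kernel function."""
    return Counter(func for func, _, _ in oopses)
```

Fed the unwrapped kernel log lines quoted in this thread, this would report one oops in `journal_commit_transaction` (kjournald) and two in `__journal_file_buffer` (dumper and driver). If that pattern persists after the memtest and kernel upgrade, but disappears without dm-crypt, that points at the encryption layer.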

Frank

> 
> I suppose it has something to do with the filesystem (encrypted using
> dm-crypt; I don't know if that's important).
> Every time amdump fails, I get the following error in /var/spool/messages:
> 
> Jun  4 15:15:36 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc dm_mod
> Jun  4 15:15:36 little kernel: CPU:    0
> Jun  4 15:15:36 little kernel: EIP:    0060:[<c01ba74d>]    Not tainted VLI
> Jun  4 15:15:36 little kernel: EFLAGS: 00010286   (2.6.11.11)
> Jun  4 15:15:36 little kernel: EIP is at journal_commit_transaction+0x1cd/0xf00
> Jun  4 15:15:36 little kernel: eax: 81910fdd   ebx: 922cb828   ecx: 00000000   edx: ec272000
> Jun  4 15:15:36 little kernel: esi: e56c24e0   edi: cd8217ac   ebp: 0000000d   esp: ec273de4
> Jun  4 15:15:36 little kernel: ds: 007b   es: 007b   ss: 0068
> Jun  4 15:15:36 little kernel: Process kjournald (pid: 881, threadinfo=ec272000 task=eda82560)
> Jun  4 15:15:36 little kernel: Stack: ec273e5c 00000040 ec273e5c 00001130 d013bf9c ec272000 ec272000 00000000
> Jun  4 15:15:36 little kernel:        00000000 00000000 00000000 cd57523c cd57544c 00001130 00000000 eda82560
> Jun  4 15:15:36 little kernel:        c0123ce0 ec273e48 ec273e48 00000001 00000086 00000001 00000000 eda82560
> Jun  4 15:15:36 little kernel: Call Trace:
> Jun  4 15:15:36 little kernel:  [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun  4 15:15:36 little kernel:  [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun  4 15:15:36 little kernel:  [<c01bd4a1>] kjournald+0xc1/0x1f0
> Jun  4 15:15:36 little kernel:  [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun  4 15:15:36 little kernel:  [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun  4 15:15:36 little kernel:  [<c01022be>] ret_from_fork+0x6/0x14
> Jun  4 15:15:36 little kernel:  [<c01bd3c0>] commit_timeout+0x0/0x10
> Jun  4 15:15:36 little kernel:  [<c01bd3e0>] kjournald+0x0/0x1f0
> Jun  4 15:15:36 little kernel:  [<c01006b1>] kernel_thread_helper+0x5/0x14
> Jun  4 15:15:36 little kernel: Code: c7 44 24 28 00 00 00 00 31 ed e8 af 25 15 00 8b 46 20 85 c0 74 64 ba 00 e0 ff ff 21 e2 89 54 24 14 89 c7 8b 40 1c 89 46 20 8b 1f <8b> 03 a8 04 0f 84 91 0b 00 00 8b 84 24 8c 01 00 00 89 5c 24 04
> Jun  4 15:18:08 little kernel:  <1>Unable to handle kernel paging request at virtual address 6b4a6d16
> Jun  4 15:18:08 little kernel: c01ba26f
> Jun  4 15:18:08 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc dm_mod
> Jun  4 15:18:08 little kernel: CPU:    0
> Jun  4 15:18:08 little kernel: EIP:    0060:[<c01ba26f>]    Not tainted VLI
> Jun  4 15:18:08 little kernel: EFLAGS: 00010286   (2.6.11.11)
> Jun  4 15:18:08 little kernel: EIP is at __journal_file_buffer+0x13f/0x230
> Jun  4 15:18:08 little kernel: eax: 6b4a6cfa   ebx: d299717c   ecx: 00000000   edx: cd8217ac
> Jun  4 15:18:08 little kernel: esi: 00000001   edi: dd9a3780   ebp: c2ee00bc   esp: dddefc6c
> Jun  4 15:18:08 little kernel: ds: 007b   es: 007b   ss: 0068
> Jun  4 15:18:08 little kernel: Process dumper (pid: 3759, threadinfo=dddee000 task=d1331540)
> Jun  4 15:18:08 little kernel: Stack: 00000000 c01bf10e 00001000 00000000 c11bd880 00000000 dd9a3780 c17de6c0
> Jun  4 15:18:08 little kernel:        edd852b8 c2ee00bc d299717c c01b94fe d299717c dd9a3760 00000001 00000001
> Jun  4 15:18:08 little kernel:        db7f7cb0 00000000 00001000 edd852b8 c2ee00bc 00001000 c01a9b23 edd852b8
> Jun  4 15:18:08 little kernel: Call Trace:
> Jun  4 15:18:08 little kernel:  [<c01bf10e>] journal_add_journal_head+0xae/0xc0
> Jun  4 15:18:08 little kernel:  [<c01b94fe>] journal_dirty_data+0xee/0x160
> Jun  4 15:18:08 little kernel:  [<c01a9b23>] ext3_journal_dirty_data+0x23/0x70
> Jun  4 15:18:08 little kernel:  [<c01a9938>] walk_page_buffers+0x68/0x70
> Jun  4 15:18:08 little kernel:  [<c01a9c51>] ext3_ordered_commit_write+0x61/0xf0
> Jun  4 15:18:08 little kernel:  [<c01a9b00>] ext3_journal_dirty_data+0x0/0x70
> Jun  4 15:18:08 little kernel:  [<c012c149>] generic_file_buffered_write+0x229/0x5f0
> Jun  4 15:18:08 little kernel:  [<c015eb82>] inode_update_time+0x52/0xe0
> Jun  4 15:18:08 little kernel:  [<c012c7dd>] __generic_file_aio_write_nolock+0x2cd/0x500
> Jun  4 15:18:08 little kernel:  [<c02929ea>] sock_common_recvmsg+0x5a/0x80
> Jun  4 15:18:08 little kernel:  [<c028f525>] sock_aio_read+0xf5/0x110
> Jun  4 15:18:08 little kernel:  [<c012ccc2>] generic_file_aio_write+0x72/0xe0
> Jun  4 15:18:08 little kernel:  [<c01a73b4>] ext3_file_write+0x44/0xd0
> Jun  4 15:18:08 little kernel:  [<c014669e>] do_sync_write+0xbe/0xf0
> Jun  4 15:18:08 little kernel:  [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun  4 15:18:08 little kernel:  [<c0158264>] sys_select+0x234/0x4d0
> Jun  4 15:18:08 little kernel:  [<c014676f>] vfs_write+0x9f/0x120
> Jun  4 15:18:08 little kernel:  [<c01468c1>] sys_write+0x51/0x80
> Jun  4 15:18:08 little kernel:  [<c01023af>] syscall_call+0x7/0xb
> Jun  4 15:18:08 little kernel: Code: 21 89 5b 20 89 5b 1c 89 18 89 73 08 8b 44 24 14 85 c0 0f 84 64 ff ff ff 0f ba 6d 00 12 e9 5a ff ff ff 8b 42 20 89 53 1c 89 43 20 <89> 58 1c 89 5a 20 eb d6 ff 47 10 83 c7 1c eb b8 83 c7 24 eb b3
> Jun  4 15:18:08 little kernel:  <1>Unable to handle kernel paging request at virtual address b67ee005
> Jun  4 15:18:08 little kernel: c01ba26f
> Jun  4 15:18:08 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc dm_mod
> Jun  4 15:18:08 little kernel: CPU:    0
> Jun  4 15:18:08 little kernel: EIP:    0060:[<c01ba26f>]    Not tainted VLI
> Jun  4 15:18:08 little kernel: EFLAGS: 00010286   (2.6.11.11)
> Jun  4 15:18:08 little kernel: EIP is at __journal_file_buffer+0x13f/0x230
> Jun  4 15:18:08 little kernel: eax: b67edfe9   ebx: d299714c   ecx: 00000000   edx: cd8217ac
> Jun  4 15:18:08 little kernel: esi: 00000001   edi: dd9a3780   ebp: cc90323c   esp: e32cdc6c
> Jun  4 15:18:08 little kernel: ds: 007b   es: 007b   ss: 0068
> Jun  4 15:18:08 little kernel: Process driver (pid: 3753, threadinfo=e32cc000 task=dfd8d060)
> Jun  4 15:18:08 little kernel: Stack: c156e1e0 c01bf10e c01a9494 edd852a4 cc2c6a54 00000000 dd9a3780 c17de6c0
> Jun  4 15:18:08 little kernel:        edd852a4 cc90323c d299714c c01b94fe d299714c dd9a3760 00000001 00000001
> Jun  4 15:18:08 little kernel:        00001000 00000000 00001000 edd852a4 cc90323c 00001000 c01a9b23 edd852a4
> Jun  4 15:18:08 little kernel: Call Trace:
> Jun  4 15:18:08 little kernel:  [<c01bf10e>] journal_add_journal_head+0xae/0xc0
> Jun  4 15:18:08 little kernel:  [<c01a9494>] ext3_get_block+0x54/0xa0
> Jun  4 15:18:08 little kernel:  [<c01b94fe>] journal_dirty_data+0xee/0x160
> Jun  4 15:18:08 little kernel:  [<c01a9b23>] ext3_journal_dirty_data+0x23/0x70
> Jun  4 15:18:08 little kernel:  [<c01a9938>] walk_page_buffers+0x68/0x70
> Jun  4 15:18:08 little kernel:  [<c01a9c51>] ext3_ordered_commit_write+0x61/0xf0
> Jun  4 15:18:08 little kernel:  [<c01a9b00>] ext3_journal_dirty_data+0x0/0x70
> Jun  4 15:18:08 little kernel:  [<c012c149>] generic_file_buffered_write+0x229/0x5f0
> Jun  4 15:18:08 little kernel:  [<c015ebe3>] inode_update_time+0xb3/0xe0
> Jun  4 15:18:08 little kernel:  [<c012c7dd>] __generic_file_aio_write_nolock+0x2cd/0x500
> Jun  4 15:18:08 little kernel:  [<c012af5e>] __generic_file_aio_read+0x1be/0x1f0
> Jun  4 15:18:08 little kernel:  [<c012ccc2>] generic_file_aio_write+0x72/0xe0
> Jun  4 15:18:08 little kernel:  [<c0152e90>] do_lookup+0x30/0xb0
> Jun  4 15:18:08 little kernel:  [<c01a73b4>] ext3_file_write+0x44/0xd0
> Jun  4 15:18:08 little kernel:  [<c014669e>] do_sync_write+0xbe/0xf0
> Jun  4 15:18:08 little kernel:  [<c01540f9>] may_open+0x59/0x1e0
> Jun  4 15:18:08 little kernel:  [<c0154325>] open_namei+0xa5/0x5c0
> Jun  4 15:18:08 little kernel:  [<c0145a8e>] dentry_open+0xce/0x180
> Jun  4 15:18:08 little kernel:  [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun  4 15:18:08 little kernel:  [<c014676f>] vfs_write+0x9f/0x120
> Jun  4 15:18:08 little kernel:  [<c01468c1>] sys_write+0x51/0x80
> Jun  4 15:18:08 little kernel:  [<c01023af>] syscall_call+0x7/0xb
> Jun  4 15:18:08 little kernel: Code: 21 89 5b 20 89 5b 1c 89 18 89 73 08 8b 44 24 14 85 c0 0f 84 64 ff ff ff 0f ba 6d 00 12 e9 5a ff ff ff 8b 42 20 89 53 1c 89 43 20 <89> 58 1c 89 5a 20 eb d6 ff 47 10 83 c7 1c eb b8 83 c7 24 eb b3
> 
> 
> 
> I'm using Debian testing.
> 
> little:~# uname -a
> Linux little 2.6.11.11 #1 Fri Jun 3 13:25:57 CEST 2005 i686 GNU/Linux
> 
> build: VERSION="Amanda-2.4.4p3"
>         BUILT_DATE="Wed Aug 18 13:06:52 MDT 2004"
>         BUILT_MACH="Linux rover 2.6.7 #1 Fri Jul 23 21:53:49 MDT 2004 i686 GNU/Linux
> 
> 
> 
> Does anyone have an idea what's going wrong here?
> 
> Tnx for help
> Thorsten



--
Frank Smith                                                fsmith AT hoovers DOT com
Sr. Systems Administrator                                 Voice: 512-374-4673
Hoover's Online                                             Fax: 512-374-4501
