--On Saturday, June 04, 2005 16:08:32 +0200 Thorsten Jungeblut <tj AT hni.upb DOT de> wrote:
> Hi,
>
> For a few weeks now, my amdump has been failing, and I can't reproduce it on demand.
> Afterwards, a few zombie processes sometimes remain, e.g. gzip or dumper.
> From then on, every command I issue (amstatus, amcheck, ...) hangs and stays
> in "uninterruptible sleep".
> The system load then stays very high (7 or higher), although the machine is
> actually doing nothing.
> The only way to clean up the system is to reboot, sometimes only by hard reset.
I'm guessing nobody replied because this really isn't an Amanda question. You
are getting a kernel error while the journal is being updated on some
filesystem (ext3 uses a journal, for example). Since the system is unable to
complete the write, all processes trying to access that filesystem will hang in
a 'D' state until the write completes (which it probably never will).
The problem may be related to the dm-crypt module, but it could also be the
actual disk, bad RAM, a bug in the particular kernel version you're running, or
a bad disk controller chip.
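
If you want to confirm which processes are stuck in that 'D' (uninterruptible sleep) state, something like the following rough sketch may help. It assumes a procps-style `ps` that accepts `-eo pid,stat,comm`; the function names are my own, not part of Amanda or any library.

```python
import subprocess


def parse_d_state(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Expects the text produced by `ps -eo pid,stat,comm`, header row included.
    """
    hung = []
    for line in ps_output.splitlines()[1:]:  # skip the "PID STAT COMMAND" header
        fields = line.split(None, 2)
        if len(fields) == 3 and fields[1].startswith("D"):
            hung.append((int(fields[0]), fields[2].strip()))
    return hung


def list_hung_processes():
    """Run ps (assumed to be the procps version) and report D-state processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_d_state(result.stdout)
```

Calling `list_hung_processes()` while the machine is wedged should show kjournald and the Amanda processes if the explanation above is right. Note that `ps` itself may hang if it touches the affected filesystem.
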
Some things I would try (in this order) to see if it goes away:
1. Run memtest86+ on the machine for at least one pass.
2. Update your kernel (2.6.11 is the stable version, or try 2.6.12-rc5 if you
like the latest).
3. Convert your encrypted filesystem to a plain one.
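
To check after each step whether the oopses still land in the journaling code, you could summarize the kernel log instead of reading every trace. This is only a sketch: it assumes the log lines are unwrapped (as they appear in the syslog file, not as wrapped in this mail) and that they use the "EIP is at" and "Process ... (pid: ...)" wording shown in your report. The function name is hypothetical.

```python
import re

# Patterns taken from the oops format quoted in this thread (2.6-era i386).
EIP_RE = re.compile(r"EIP is at (\w+)\+0x")
PROC_RE = re.compile(r"Process (\S+) \(pid: (\d+)")


def summarize_oopses(log_text):
    """Return (faulting_function, process_name, pid) for each oops found."""
    results = []
    faulting = None
    for line in log_text.splitlines():
        match = EIP_RE.search(line)
        if match:
            faulting = match.group(1)  # remember until the Process line arrives
            continue
        match = PROC_RE.search(line)
        if match and faulting is not None:
            results.append((faulting, match.group(1), int(match.group(2))))
            faulting = None
    return results
```

Feed it the contents of your messages file. If every entry names a `journal_*` function, that points at the filesystem path rather than at Amanda.
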
Frank
>
> I suppose it has something to do with the filesystem (encrypted, using
> dm-crypt; I don't know if that's important).
> Every time amdump fails, I get the following error in /var/spool/messages:
>
> Jun 4 15:15:36 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev
> i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async
> crc_ccitt ppp_generic slhc dm_mod
> Jun 4 15:15:36 little kernel: CPU: 0
> Jun 4 15:15:36 little kernel: EIP: 0060:[<c01ba74d>] Not tainted VLI
> Jun 4 15:15:36 little kernel: EFLAGS: 00010286 (2.6.11.11)
> Jun 4 15:15:36 little kernel: EIP is at
> journal_commit_transaction+0x1cd/0xf00
> Jun 4 15:15:36 little kernel: eax: 81910fdd ebx: 922cb828 ecx: 00000000
> edx: ec272000
> Jun 4 15:15:36 little kernel: esi: e56c24e0 edi: cd8217ac ebp: 0000000d
> esp: ec273de4
> Jun 4 15:15:36 little kernel: ds: 007b es: 007b ss: 0068
> Jun 4 15:15:36 little kernel: Process kjournald (pid: 881,
> threadinfo=ec272000 task=eda82560)
> Jun 4 15:15:36 little kernel: Stack: ec273e5c 00000040 ec273e5c 00001130
> d013bf9c ec272000 ec272000 00000000
> Jun 4 15:15:36 little kernel: 00000000 00000000 00000000 cd57523c
> cd57544c 00001130 00000000 eda82560
> Jun 4 15:15:36 little kernel: c0123ce0 ec273e48 ec273e48 00000001
> 00000086 00000001 00000000 eda82560
> Jun 4 15:15:36 little kernel: Call Trace:
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c01bd4a1>] kjournald+0xc1/0x1f0
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c01022be>] ret_from_fork+0x6/0x14
> Jun 4 15:15:36 little kernel: [<c01bd3c0>] commit_timeout+0x0/0x10
> Jun 4 15:15:36 little kernel: [<c01bd3e0>] kjournald+0x0/0x1f0
> Jun 4 15:15:36 little kernel: [<c01006b1>] kernel_thread_helper+0x5/0x14
> Jun 4 15:15:36 little kernel: Code: c7 44 24 28 00 00 00 00 31 ed e8 af 25
> 15 00 8b 46 20 85 c0 74 64 ba 00 e0 ff ff 21 e2 89 54 24 14 89 c7 8b 40 1c 89
> 46 20
> 8b 1f <8b> 03 a8 04 0f 84 91 0b 00 00 8b 84 24 8c 01 00 00 89 5c 24 04
> Jun 4 15:18:08 little kernel: <1>Unable to handle kernel paging request at
> virtual address 6b4a6d16
> Jun 4 15:18:08 little kernel: c01ba26f
> Jun 4 15:18:08 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev
> i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async
> crc_ccitt ppp_generic slhc dm_mod
> Jun 4 15:18:08 little kernel: CPU: 0
> Jun 4 15:18:08 little kernel: EIP: 0060:[<c01ba26f>] Not tainted VLI
> Jun 4 15:18:08 little kernel: EFLAGS: 00010286 (2.6.11.11)
> Jun 4 15:18:08 little kernel: EIP is at __journal_file_buffer+0x13f/0x230
> Jun 4 15:18:08 little kernel: eax: 6b4a6cfa ebx: d299717c ecx: 00000000
> edx: cd8217ac
> Jun 4 15:18:08 little kernel: esi: 00000001 edi: dd9a3780 ebp: c2ee00bc
> esp: dddefc6c
> Jun 4 15:18:08 little kernel: ds: 007b es: 007b ss: 0068
> Jun 4 15:18:08 little kernel: Process dumper (pid: 3759, threadinfo=dddee000
> task=d1331540)
> Jun 4 15:18:08 little kernel: Stack: 00000000 c01bf10e 00001000 00000000
> c11bd880 00000000 dd9a3780 c17de6c0
> Jun 4 15:18:08 little kernel: edd852b8 c2ee00bc d299717c c01b94fe
> d299717c dd9a3760 00000001 00000001
> Jun 4 15:18:08 little kernel: db7f7cb0 00000000 00001000 edd852b8
> c2ee00bc 00001000 c01a9b23 edd852b8
> Jun 4 15:18:08 little kernel: Call Trace:
> Jun 4 15:18:08 little kernel: [<c01bf10e>]
> journal_add_journal_head+0xae/0xc0
> Jun 4 15:18:08 little kernel: [<c01b94fe>] journal_dirty_data+0xee/0x160
> Jun 4 15:18:08 little kernel: [<c01a9b23>] ext3_journal_dirty_data+0x23/0x70
> Jun 4 15:18:08 little kernel: [<c01a9938>] walk_page_buffers+0x68/0x70
> Jun 4 15:18:08 little kernel: [<c01a9c51>]
> ext3_ordered_commit_write+0x61/0xf0
> Jun 4 15:18:08 little kernel: [<c01a9b00>] ext3_journal_dirty_data+0x0/0x70
> Jun 4 15:18:08 little kernel: [<c012c149>]
> generic_file_buffered_write+0x229/0x5f0
> Jun 4 15:18:08 little kernel: [<c015eb82>] inode_update_time+0x52/0xe0
> Jun 4 15:18:08 little kernel: [<c012c7dd>]
> __generic_file_aio_write_nolock+0x2cd/0x500
> Jun 4 15:18:08 little kernel: [<c02929ea>] sock_common_recvmsg+0x5a/0x80
> Jun 4 15:18:08 little kernel: [<c028f525>] sock_aio_read+0xf5/0x110
> Jun 4 15:18:08 little kernel: [<c012ccc2>] generic_file_aio_write+0x72/0xe0
> Jun 4 15:18:08 little kernel: [<c01a73b4>] ext3_file_write+0x44/0xd0
> Jun 4 15:18:08 little kernel: [<c014669e>] do_sync_write+0xbe/0xf0
> Jun 4 15:18:08 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:18:08 little kernel: [<c0158264>] sys_select+0x234/0x4d0
> Jun 4 15:18:08 little kernel: [<c014676f>] vfs_write+0x9f/0x120
> Jun 4 15:18:08 little kernel: [<c01468c1>] sys_write+0x51/0x80
> Jun 4 15:18:08 little kernel: [<c01023af>] syscall_call+0x7/0xb
> Jun 4 15:18:08 little kernel: Code: 21 89 5b 20 89 5b 1c 89 18 89 73 08 8b
> 44 24 14 85 c0 0f 84 64 ff ff ff 0f ba 6d 00 12 e9 5a ff ff ff 8b 42 20 89 53
> 1c 89
> 43 20 <89> 58 1c 89 5a 20 eb d6 ff 47 10 83 c7 1c eb b8 83 c7 24 eb b3
> Jun 4 15:18:08 little kernel: <1>Unable to handle kernel paging request at
> virtual address b67ee005
> Jun 4 15:18:08 little kernel: c01ba26f
> Jun 4 15:18:08 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev
> i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async
> crc_ccitt ppp_generic slhc dm_mod
> Jun 4 15:18:08 little kernel: CPU: 0
> Jun 4 15:18:08 little kernel: EIP: 0060:[<c01ba26f>] Not tainted VLI
> Jun 4 15:18:08 little kernel: EFLAGS: 00010286 (2.6.11.11)
> Jun 4 15:18:08 little kernel: EIP is at __journal_file_buffer+0x13f/0x230
> Jun 4 15:18:08 little kernel: eax: b67edfe9 ebx: d299714c ecx: 00000000
> edx: cd8217ac
> Jun 4 15:18:08 little kernel: esi: 00000001 edi: dd9a3780 ebp: cc90323c
> esp: e32cdc6c
> Jun 4 15:18:08 little kernel: ds: 007b es: 007b ss: 0068
> Jun 4 15:18:08 little kernel: Process driver (pid: 3753, threadinfo=e32cc000
> task=dfd8d060)
> Jun 4 15:18:08 little kernel: Stack: c156e1e0 c01bf10e c01a9494 edd852a4
> cc2c6a54 00000000 dd9a3780 c17de6c0
> Jun 4 15:18:08 little kernel: edd852a4 cc90323c d299714c c01b94fe
> d299714c dd9a3760 00000001 00000001
> Jun 4 15:18:08 little kernel: 00001000 00000000 00001000 edd852a4
> cc90323c 00001000 c01a9b23 edd852a4
> Jun 4 15:18:08 little kernel: Call Trace:
> Jun 4 15:18:08 little kernel: [<c01bf10e>]
> journal_add_journal_head+0xae/0xc0
> Jun 4 15:18:08 little kernel: [<c01a9494>] ext3_get_block+0x54/0xa0
> Jun 4 15:18:08 little kernel: [<c01b94fe>] journal_dirty_data+0xee/0x160
> Jun 4 15:18:08 little kernel: [<c01a9b23>] ext3_journal_dirty_data+0x23/0x70
> Jun 4 15:18:08 little kernel: [<c01a9938>] walk_page_buffers+0x68/0x70
> Jun 4 15:18:08 little kernel: [<c01a9c51>]
> ext3_ordered_commit_write+0x61/0xf0
> Jun 4 15:18:08 little kernel: [<c01a9b00>] ext3_journal_dirty_data+0x0/0x70
> Jun 4 15:18:08 little kernel: [<c012c149>]
> generic_file_buffered_write+0x229/0x5f0
> Jun 4 15:18:08 little kernel: [<c015ebe3>] inode_update_time+0xb3/0xe0
> Jun 4 15:18:08 little kernel: [<c012c7dd>]
> __generic_file_aio_write_nolock+0x2cd/0x500
> Jun 4 15:18:08 little kernel: [<c012af5e>]
> __generic_file_aio_read+0x1be/0x1f0
> Jun 4 15:18:08 little kernel: [<c012ccc2>] generic_file_aio_write+0x72/0xe0
> Jun 4 15:18:08 little kernel: [<c0152e90>] do_lookup+0x30/0xb0
> Jun 4 15:18:08 little kernel: [<c01a73b4>] ext3_file_write+0x44/0xd0
> Jun 4 15:18:08 little kernel: [<c014669e>] do_sync_write+0xbe/0xf0
> Jun 4 15:18:08 little kernel: [<c01540f9>] may_open+0x59/0x1e0
> Jun 4 15:18:08 little kernel: [<c0154325>] open_namei+0xa5/0x5c0
> Jun 4 15:18:08 little kernel: [<c0145a8e>] dentry_open+0xce/0x180
> Jun 4 15:18:08 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:18:08 little kernel: [<c014676f>] vfs_write+0x9f/0x120
> Jun 4 15:18:08 little kernel: [<c01468c1>] sys_write+0x51/0x80
> Jun 4 15:18:08 little kernel: [<c01023af>] syscall_call+0x7/0xb
> Jun 4 15:18:08 little kernel: Code: 21 89 5b 20 89 5b 1c 89 18 89 73 08 8b
> 44 24 14 85 c0 0f 84 64 ff ff ff 0f ba 6d 00 12 e9 5a ff ff ff 8b 42 20 89 53
> 1c 89
> 43 20 <89> 58 1c 89 5a 20 eb d6 ff 47 10 83 c7 1c eb b8 83 c7 24 eb b3
>
>
>
> I'm using Debian-testing,
>
> little:~# uname -a
> Linux little 2.6.11.11 #1 Fri Jun 3 13:25:57 CEST 2005 i686 GNU/Linux
>
> build: VERSION="Amanda-2.4.4p3"
> BUILT_DATE="Wed Aug 18 13:06:52 MDT 2004"
> BUILT_MACH="Linux rover 2.6.7 #1 Fri Jul 23 21:53:49 MDT 2004 i686
> GNU/Linux
>
>
>
> Does anyone have an idea what's going wrong here?
>
> Thanks for the help
> Thorsten
--
Frank Smith fsmith AT hoovers DOT com
Sr. Systems Administrator Voice: 512-374-4673
Hoover's Online Fax: 512-374-4501