Bacula-users

[Bacula-users] bacula-sd segfault

2010-09-02 05:32:06
Subject: [Bacula-users] bacula-sd segfault
From: Baptiste Malguy <baptiste AT malguy DOT net>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 2 Sep 2010 11:02:47 +0200
Hello,

I have a blocking issue with the bacula-sd daemon. Environment:
- Debian Lenny AMD64
- Kernel: 2.6.32-bpo.4-amd64
- Bacula version: 3.0.3 and 5.0.3
- We use TLS for authentication and transfers.

Every few days, bacula-sd quits with a segfault. I have set up the debugging tools, so I finally have a backtrace, but when I read it I see nothing that shows what caused the segfault.

Can anyone read it better than I can?

It happens both when a couple of low-I/O jobs are running and when several high-I/O jobs are running. By I/O, I mean disk and network; we back up to disk only. I could say a lot more about our setup, but most of it would be noise, so let me know what is actually relevant to the problem.

Regards,
[Thread debugging using libthread_db enabled]
[New Thread 0x7f8ef38f36f0 (LWP 18535)]
[New Thread 0x42900950 (LWP 12104)]
[New Thread 0x44904950 (LWP 26990)]
[New Thread 0x418fe950 (LWP 18539)]
0x00007f8ef0a89d52 in select () from /lib/libc.so.6
$1 = '\0' <repeats 29 times>
$2 = 0xa6a088 "bacula-sd"
$3 = 0xa6a0c8 "/usr/sbin/bacula-sd"
$4 = 0x0
$5 = 0x7f8ef2cc0c28 "5.0.3 (04 August 2010)"
$6 = 0x7f8ef2cc0c4c "x86_64-pc-linux-gnu"
$7 = 0x7f8ef2cc0c60 "debian"
$8 = 0x7f8ef2cc0c67 "5.0.5"
$9 = "backup2", '\0' <repeats 42 times>
$10 = 0x7f8ef2cc0c3f "debian 5.0.5"
$11 = 0
Environment variable "TestName" not defined.
#0 0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#1 0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
client_wq=0x6642c0,
handle_client_request=0x4278c6 <handle_connection_request(void*)>)
at bnet_server.c:161
#2 0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313

Thread 4 (Thread 0x418fe950 (LWP 18539)):
#0 0x00007f8ef20c1fad in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib/libpthread.so.0
#1 0x00007f8ef2cb6a3d in watchdog_thread (arg=0x0) at watchdog.c:321
#2 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#3 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#4 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x44904950 (LWP 26990)):
#0 0x00007f8ef2a5b44e in ?? () from /usr/lib/libz.so.1
#1 0x00007f8ef2a5a28d in deflate () from /usr/lib/libz.so.1
#2 0x00007f8ef17e294e in ?? () from /usr/lib/libcrypto.so.0.9.8
#3 0x00007f8ef17e25b2 in COMP_compress_block ()
from /usr/lib/libcrypto.so.0.9.8
#4 0x00007f8ef1a7e35e in ssl3_do_compress () from /usr/lib/libssl.so.0.9.8
#5 0x00007f8ef1a7e4ac in ?? () from /usr/lib/libssl.so.0.9.8
#6 0x00007f8ef1a7e9a0 in ssl3_write_bytes () from /usr/lib/libssl.so.0.9.8
#7 0x00007f8ef2cad1b0 in openssl_bsock_readwrite (bsock=0x107b568,
ptr=0xc8efdc "", nbytes=4, write=true) at tls.c:626
#8 0x00007f8ef2cad483 in tls_bsock_writen (bsock=0x107b568, ptr=0xc8efdc "",
nbytes=4) at tls.c:704
#9 0x00007f8ef2c84670 in write_nbytes (bsock=0x107b568, ptr=0xc8efdc "",
nbytes=4) at bnet.c:128
#10 0x00007f8ef2c880a4 in BSOCK::send (this=0x107b568) at bsock.c:379
#11 0x00007f8ef2c885e8 in BSOCK::signal (this=0x107b568, signal=-4)
at bsock.c:574
#12 0x0000000000428070 in handle_connection_request (arg=0x107b568)
at dircmd.c:251
#13 0x00007f8ef2cb7587 in workq_server (arg=0x6642c0) at workq.c:346
#14 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#15 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x42900950 (LWP 12104)):
#0 0x00007f8ef20c55ef in waitpid () from /lib/libpthread.so.0
#1 0x00007f8ef2cab0b7 in signal_handler (sig=11) at signal.c:229
#2 <signal handler called>
#3 0x00007f8ef2a5b0bf in ?? () from /usr/lib/libz.so.1
#4 0x00007f8ef2a5a28d in deflate () from /usr/lib/libz.so.1
#5 0x00007f8ef17e294e in ?? () from /usr/lib/libcrypto.so.0.9.8
#6 0x00007f8ef17e25b2 in COMP_compress_block ()
from /usr/lib/libcrypto.so.0.9.8
#7 0x00007f8ef1a7e35e in ssl3_do_compress () from /usr/lib/libssl.so.0.9.8
#8 0x00007f8ef1a7e4ac in ?? () from /usr/lib/libssl.so.0.9.8
#9 0x00007f8ef1a7e9a0 in ssl3_write_bytes () from /usr/lib/libssl.so.0.9.8
#10 0x00007f8ef2cad1b0 in openssl_bsock_readwrite (bsock=0x107b568,
ptr=0xc8efdc "", nbytes=182, write=true) at tls.c:626
#11 0x00007f8ef2cad483 in tls_bsock_writen (bsock=0x107b568, ptr=0xc8efdc "",
nbytes=182) at tls.c:704
#12 0x00007f8ef2c84670 in write_nbytes (bsock=0x107b568, ptr=0xc8efdc "",
nbytes=182) at bnet.c:128
#13 0x00007f8ef2c880a4 in BSOCK::send (this=0x107b568) at bsock.c:379
#14 0x00007f8ef2c887c7 in BSOCK::fsend (this=0x107b568,
fmt=0x7f8ef2cc05d0 "Jmsg Job=%s type=%d level=%lld %s") at bsock.c:434
#15 0x00007f8ef2c9c80f in dispatch_message (jcr=0xe6ba38, type=6,
mtime=1283388921,
msg=0x428fe870 "backup2-sd JobId 83624: JobId=83624 Job=\"ivr-db1-1-System.2010-09-02_00.00.01_26\" marked to be canceled.\n") at message.c:888
#16 0x00007f8ef2c9cf8e in Jmsg (jcr=0xe6ba38, type=6, mtime=0,
fmt=0x451860 "JobId=%d Job=\"%s\" marked to be canceled.\n")
at message.c:1292
#17 0x0000000000427763 in cancel_cmd (cjcr=0xc12248) at dircmd.c:335
#18 0x0000000000427f24 in handle_connection_request (arg=0x134a028)
at dircmd.c:233
#19 0x00007f8ef2cb7587 in workq_server (arg=0x6642c0) at workq.c:346
#20 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#21 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f8ef38f36f0 (LWP 18535)):
#0 0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#1 0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
client_wq=0x6642c0,
handle_client_request=0x4278c6 <handle_connection_request(void*)>)
at bnet_server.c:161
#2 0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
#0 0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#0 0x00007f8ef0a89d52 in select () from /lib/libc.so.6
No symbol table info available.
#1 0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
client_wq=0x6642c0,
handle_client_request=0x4278c6 <handle_connection_request(void*)>)
at bnet_server.c:161
161 bnet_server.c: No such file or directory.
in bnet_server.c
Current language: auto; currently c++
maxfd = 6
sockset = {fds_bits = {112, 0 <repeats 15 times>}}
newsockfd = 7
stat = 0
clilen = 16
cli_addr = {sa_family = 2,
sa_data = "®(\177\000\001\001\000\000\000\000\000\000\000"}
tlog = 0
turnon = 1
request = {fd = 7, user = '\0' <repeats 127 times>,
daemon = "backup2-sd", '\0' <repeats 117 times>,
pid = "18535\000\000\000\000", client = {{name = '\0' <repeats 127 times>,
addr = '\0' <repeats 127 times>, sin = 0x7f8ef1eb2a40, unit = 0x0,
request = 0x7fffbd0a8310}}, server = {{name = '\0' <repeats 127 times>,
addr = '\0' <repeats 127 times>, sin = 0x7f8ef1eb29c0, unit = 0x0,
request = 0x7fffbd0a8310}}, sink = 0,
hostname = 0x7f8ef1cafdc0 <sock_hostname>,
hostaddr = 0x7f8ef1cafd70 <sock_hostaddr>, cleanup = 0, config = 0x0}
p = (IPADDR *) 0x0
fd_ptr = (s_sockfd *) 0x0
buf = "127.0.1.1\00032\000\000\000\000\0000\220ó\216\177\000\000ÈE@\000\000\000\000\000à\200@\000\000\000\000\000\200\212\n½ÿ\177\000\000\220\215\vò\216\177\000\000y\216\n½ÿ\177\000\000 \210\n½ÿ\177\000\000RÀoó\216\177\000\000 \206\001", '\0' <repeats 13 times>, "H}\fò\216\177\000\000\000\000\000\000\000\000\000\000p\210\n½ÿ\177\000\000ïP\fò\216\177\000"
sockfds = {<SMARTALLOC> = {<No data fields>}, head = 0x7fffbd0a7880,
tail = 0x7fffbd0a7820, loffset = 0, num_items = 3}
allbuf = "\001\000\000\000ÿ\177\000\000ܲ¦\000\000\000\000\000\200\212\n½ÿ\177", '\0' <repeats 14 times>, "\001\000\000\000`C\217ó\216\177\000\000à\214\217ó\216\177\000\000`z\n½ÿ\177\000\000\210\211\217ó\216\177\000\000Ö\tnñ\216\177\000\000\220\200\n½ÿ\177\000\000Ü\036oó\216\177\000\000\030§¦\000\000\000\000\000ÐW\217ó\216\177\000\000\016\000\000\000\000\000\000\000\026\000\000\000\000\000\000\000$ù\000\002\000\000\000\000\224#oó\216\177\000\000Änlñ\216\177\000\000$\000\000\000\216\177\000\000ä\003\b\000\000\000\000\000P\000\000\000\000\000\000\000(\000\000\000\000\000\000\000@\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000P\000\000\000"...
#2 0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
313 stored.c: No such file or directory.
in stored.c
ch = -1
no_signals = false
test_config = false
thid = 1087273296
uid = 0x7fffbd0a8e97 "bacula"
gid = 0x7fffbd0a8ea1 "tape"
python_args = {progname = 0xa6abe8 "backup2-sd", scriptdir = 0x0,
modulename = 0x44adbd "SDStartUp",
configfile = 0xa6a2f8 "/etc/bacula/bacula-sd.conf",
workingdir = 0xa6ac28 "/var/lib/bacula",
job_getattr = 0x4396bc <job_getattr(_object*, char*)>,
job_setattr = 0x439495 <job_setattr(_object*, char*, _object*)>}
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#0 0x0000000000000000 in ?? ()
No symbol table info available.



--
Baptiste MALGUY
PGP fingerprint: 49B0 4F6E 4AA8 B149 B2DF  9267 0F65 6C1C C473 6EC2