Hello,
I have a blocking issue with the bacula-sd daemon.

Environment:
- Debian Lenny AMD64
- Kernel: 2.6.32-bpo.4-amd64
- Bacula version: 3.0.3 and 5.0.3
- We use TLS for authentication and transfers.
Every few days, bacula-sd quits with a segfault. I've set up debugging, so I finally have a backtrace. When I read it, I see nothing that shows what caused the segfault.
Can anyone read it better than I can?
It happens both when there are a couple of low-I/O jobs and when there are several high-I/O jobs. By I/O, I mean disk and network; we back up to disk only. I could say a lot more about our setup, but most of it would be noise, so let me know what is actually relevant.
Regards,
[Thread debugging using libthread_db enabled]
[New Thread 0x7f8ef38f36f0 (LWP 18535)]
[New Thread 0x42900950 (LWP 12104)]
[New Thread 0x44904950 (LWP 26990)]
[New Thread 0x418fe950 (LWP 18539)]
0x00007f8ef0a89d52 in select () from /lib/libc.so.6
$1 = '\0' <repeats 29 times>
$2 = 0xa6a088 "bacula-sd"
$3 = 0xa6a0c8 "/usr/sbin/bacula-sd"
$4 = 0x0
$5 = 0x7f8ef2cc0c28 "5.0.3 (04 August 2010)"
$6 = 0x7f8ef2cc0c4c "x86_64-pc-linux-gnu"
$7 = 0x7f8ef2cc0c60 "debian"
$8 = 0x7f8ef2cc0c67 "5.0.5"
$9 = "backup2", '\0' <repeats 42 times>
$10 = 0x7f8ef2cc0c3f "debian 5.0.5"
$11 = 0
Environment variable "TestName" not defined.
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#1  0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33, client_wq=0x6642c0,
    handle_client_request=0x4278c6 <handle_connection_request(void*)>) at bnet_server.c:161
#2  0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
Thread 4 (Thread 0x418fe950 (LWP 18539)):
#0  0x00007f8ef20c1fad in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007f8ef2cb6a3d in watchdog_thread (arg=0x0) at watchdog.c:321
#2  0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#3  0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()
Thread 3 (Thread 0x44904950 (LWP 26990)):
#0  0x00007f8ef2a5b44e in ?? () from /usr/lib/libz.so.1
#1  0x00007f8ef2a5a28d in deflate () from /usr/lib/libz.so.1
#2  0x00007f8ef17e294e in ?? () from /usr/lib/libcrypto.so.0.9.8
#3  0x00007f8ef17e25b2 in COMP_compress_block () from /usr/lib/libcrypto.so.0.9.8
#4  0x00007f8ef1a7e35e in ssl3_do_compress () from /usr/lib/libssl.so.0.9.8
#5  0x00007f8ef1a7e4ac in ?? () from /usr/lib/libssl.so.0.9.8
#6  0x00007f8ef1a7e9a0 in ssl3_write_bytes () from /usr/lib/libssl.so.0.9.8
#7  0x00007f8ef2cad1b0 in openssl_bsock_readwrite (bsock=0x107b568, ptr=0xc8efdc "", nbytes=4, write=true) at tls.c:626
#8  0x00007f8ef2cad483 in tls_bsock_writen (bsock=0x107b568, ptr=0xc8efdc "", nbytes=4) at tls.c:704
#9  0x00007f8ef2c84670 in write_nbytes (bsock=0x107b568, ptr=0xc8efdc "", nbytes=4) at bnet.c:128
#10 0x00007f8ef2c880a4 in BSOCK::send (this=0x107b568) at bsock.c:379
#11 0x00007f8ef2c885e8 in BSOCK::signal (this=0x107b568, signal=-4) at bsock.c:574
#12 0x0000000000428070 in handle_connection_request (arg=0x107b568) at dircmd.c:251
#13 0x00007f8ef2cb7587 in workq_server (arg=0x6642c0) at workq.c:346
#14 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#15 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x42900950 (LWP 12104)):
#0  0x00007f8ef20c55ef in waitpid () from /lib/libpthread.so.0
#1  0x00007f8ef2cab0b7 in signal_handler (sig=11) at signal.c:229
#2  <signal handler called>
#3  0x00007f8ef2a5b0bf in ?? () from /usr/lib/libz.so.1
#4  0x00007f8ef2a5a28d in deflate () from /usr/lib/libz.so.1
#5  0x00007f8ef17e294e in ?? () from /usr/lib/libcrypto.so.0.9.8
#6  0x00007f8ef17e25b2 in COMP_compress_block () from /usr/lib/libcrypto.so.0.9.8
#7  0x00007f8ef1a7e35e in ssl3_do_compress () from /usr/lib/libssl.so.0.9.8
#8  0x00007f8ef1a7e4ac in ?? () from /usr/lib/libssl.so.0.9.8
#9  0x00007f8ef1a7e9a0 in ssl3_write_bytes () from /usr/lib/libssl.so.0.9.8
#10 0x00007f8ef2cad1b0 in openssl_bsock_readwrite (bsock=0x107b568, ptr=0xc8efdc "", nbytes=182, write=true) at tls.c:626
#11 0x00007f8ef2cad483 in tls_bsock_writen (bsock=0x107b568, ptr=0xc8efdc "", nbytes=182) at tls.c:704
#12 0x00007f8ef2c84670 in write_nbytes (bsock=0x107b568, ptr=0xc8efdc "", nbytes=182) at bnet.c:128
#13 0x00007f8ef2c880a4 in BSOCK::send (this=0x107b568) at bsock.c:379
#14 0x00007f8ef2c887c7 in BSOCK::fsend (this=0x107b568, fmt=0x7f8ef2cc05d0 "Jmsg Job=%s type=%d level=%lld %s") at bsock.c:434
#15 0x00007f8ef2c9c80f in dispatch_message (jcr=0xe6ba38, type=6, mtime=1283388921,
    msg=0x428fe870 "backup2-sd JobId 83624: JobId=83624 Job=\"ivr-db1-1-System.2010-09-02_00.00.01_26\" marked to be canceled.\n") at message.c:888
#16 0x00007f8ef2c9cf8e in Jmsg (jcr=0xe6ba38, type=6, mtime=0, fmt=0x451860 "JobId=%d Job=\"%s\" marked to be canceled.\n") at message.c:1292
#17 0x0000000000427763 in cancel_cmd (cjcr=0xc12248) at dircmd.c:335
#18 0x0000000000427f24 in handle_connection_request (arg=0x134a028) at dircmd.c:233
#19 0x00007f8ef2cb7587 in workq_server (arg=0x6642c0) at workq.c:346
#20 0x00007f8ef20bdfc7 in start_thread () from /lib/libpthread.so.0
#21 0x00007f8ef0a9064d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f8ef38f36f0 (LWP 18535)):
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#1  0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33,
    client_wq=0x6642c0, handle_client_request=0x4278c6 <handle_connection_request(void*)>) at bnet_server.c:161
#2  0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
#0  0x00007f8ef0a89d52 in select () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f8ef2c85264 in bnet_thread_server (addrs=0xa6ab98, max_clients=33, client_wq=0x6642c0, handle_client_request=0x4278c6 <handle_connection_request(void*)>)
    at bnet_server.c:161
161     bnet_server.c: No such file or directory.
        in bnet_server.c
Current language: auto; currently c++
maxfd = 6
sockset = {fds_bits = {112, 0 <repeats 15 times>}}
newsockfd = 7
stat = 0
clilen = 16
cli_addr = {sa_family = 2, sa_data = "®(\177\000\001\001\000\000\000\000\000\000\000"}
tlog = 0
turnon = 1
request = {fd = 7, user = '\0' <repeats 127 times>,
  daemon = "backup2-sd", '\0' <repeats 117 times>, pid = "18535\000\000\000\000", client = {{name = '\0' <repeats 127 times>, addr = '\0' <repeats 127 times>, sin = 0x7f8ef1eb2a40, unit = 0x0,
  request = 0x7fffbd0a8310}}, server = {{name = '\0' <repeats 127 times>, addr = '\0' <repeats 127 times>, sin = 0x7f8ef1eb29c0, unit = 0x0, request = 0x7fffbd0a8310}}, sink = 0,
hostname = 0x7f8ef1cafdc0 <sock_hostname>, hostaddr = 0x7f8ef1cafd70 <sock_hostaddr>, cleanup = 0, config = 0x0} p = (IPADDR *) 0x0 fd_ptr = (s_sockfd *) 0x0 buf = "127.0.1.1\00032\000\000\000\000\0000\220ó\216\177\000\000ÈE@\000\000\000\000\000à\200@\000\000\000\000\000\200\212\n½ÿ\177\000\000\220\215\vò\216\177\000\000y\216\n½ÿ\177\000\000 \210\n½ÿ\177\000\000RÀoó\216\177\000\000 \206\001", '\0' <repeats 13 times>, "H}\fò\216\177\000\000\000\000\000\000\000\000\000\000p\210\n½ÿ\177\000\000ïP\fò\216\177\000"
sockfds = {<SMARTALLOC> = {<No data fields>}, head = 0x7fffbd0a7880, tail = 0x7fffbd0a7820, loffset = 0, num_items = 3} allbuf = "\001\000\000\000ÿ\177\000\000ܲ¦\000\000\000\000\000\200\212\n½ÿ\177", '\0' <repeats 14 times>, "\001\000\000\000`C\217ó\216\177\000\000à\214\217ó\216\177\000\000`z\n½ÿ\177\000\000\210\211\217ó\216\177\000\000Ö\tnñ\216\177\000\000\220\200\n½ÿ\177\000\000Ü\036oó\216\177\000\000\030§¦\000\000\000\000\000ÐW\217ó\216\177\000\000\016\000\000\000\000\000\000\000\026\000\000\000\000\000\000\000$ù\000\002\000\000\000\000\224#oó\216\177\000\000Änlñ\216\177\000\000$\000\000\000\216\177\000\000ä\003\b\000\000\000\000\000P\000\000\000\000\000\000\000(\000\000\000\000\000\000\000@\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000P\000\000\000"...
#2  0x0000000000409945 in main (argc=0, argv=0x7fffbd0a8ac0) at stored.c:313
313     stored.c: No such file or directory.
        in stored.c
ch = -1
no_signals = false
test_config = false
thid = 1087273296
uid = 0x7fffbd0a8e97 "bacula"
gid = 0x7fffbd0a8ea1 "tape"
python_args = {progname = 0xa6abe8 "backup2-sd", scriptdir = 0x0, modulename = 0x44adbd "SDStartUp", configfile = 0xa6a2f8 "/etc/bacula/bacula-sd.conf",
  workingdir = 0xa6ac28 "/var/lib/bacula", job_getattr = 0x4396bc <job_getattr(_object*, char*)>, job_setattr = 0x439495 <job_setattr(_object*, char*, _object*)>}
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#0  0x0000000000000000 in ?? ()
No symbol table info available.
--
Baptiste MALGUY
PGP fingerprint: 49B0 4F6E 4AA8 B149 B2DF 9267 0F65 6C1C C473 6EC2
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users