[Bacula-users] bacula-sd dies - Bad call to rewind. Device not open

2009-10-07 07:40:48
From: Jan Schulze <schulze AT informatik.uni-tuebingen DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 7 Oct 2009 13:36:10 +0200 (CEST)

Hi all,

I have been using Bacula 2.2.8 with an Overland ArcVault 24 for a long time
without problems. Recently, the SD has started dying every night before the
first backup. I get the following GDB traceback, and the backups then fail
with "Comm error with SD" (of course).

Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread 46912498413632 (LWP 10906)]
[New Thread 1105209664 (LWP 12948)]
[New Thread 1094719808 (LWP 10909)]
0x00000036482c7922 in select () from /lib64/libc.so.6
$1 = "cambridge-sd", '\0' <repeats 17 times>
$2 = 0xf10088 "bacula-sd"
$3 = 0xf100c8 "/raid/export/soft/bacula-2.2.8/sbin/bacula-sd"
$4 = 0x0
$5 = 0x4fc07a "2.2.8 (26 January 2008)"
$6 = 0x4f78a6 "x86_64-unknown-linux-gnu"
$7 = 0x4f78c3 "redhat"
$8 = 0x4f78bf "5.1"
#0  0x00000036482c7922 in select () from /lib64/libc.so.6
#1  0x0000000000436a74 in bnet_thread_server (addrs=0xf11df8, max_clients=41,
    client_wq=0x766c20,
    handle_client_request=0x41b440 <handle_connection_request(void*)>)
    at bnet_server.c:161
#2  0x0000000000409519 in main (argc=<value optimized out>,
    argv=<value optimized out>) at stored.c:265

Thread 3 (Thread 1094719808 (LWP 10909)):
#0  0x0000003648e0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x000000000044d6bd in watchdog_thread (arg=<value optimized out>)
    at watchdog.c:307
#2  0x0000003648e062e7 in start_thread () from /lib64/libpthread.so.0
#3  0x00000036482ce3bd in clone () from /lib64/libc.so.6

Thread 2 (Thread 1105209664 (LWP 12948)):
#0  0x0000003648e0d9ef in waitpid () from /lib64/libpthread.so.0
#1  0x000000000044afb1 in signal_handler (sig=11) at signal.c:167
#2  <signal handler called>
#3  e_msg (file=0x4ef165 "dev.c", line=724, type=1,
    level=<value optimized out>,
    fmt=0xf11fb0 "dev.c:723 Bad call to rewind. Device \"Drive-1\" (/dev/nst0) not open\n") at message.c:1062
#4  0x00000000004177dc in DEVICE::rewind (this=0xf13f78, dcr=0xf28dd8)
    at dev.c:724
#5  0x0000000000422aff in read_dev_volume_label (dcr=0xf28dd8) at label.c:261
#6  0x00000000004262c0 in mount_next_write_volume (dcr=0xf28dd8,
    have_vol=<value optimized out>, release=false) at mount.c:234
#7  0x000000000040d99e in acquire_device_for_append (dcr=0xf28dd8)
    at acquire.c:422
#8  0x000000000040df62 in do_append_data (jcr=0xf26df8) at append.c:85
#9  0x000000000041fe82 in append_data_cmd (jcr=0xf26df8) at fd_cmds.c:194
#10 0x000000000041fade in do_fd_commands (jcr=0xf26df8) at fd_cmds.c:165
#11 0x000000000041ffb5 in run_job (jcr=0xf26df8) at fd_cmds.c:128
#12 0x00000000004203d7 in run_cmd (jcr=0xf26df8) at job.c:210
#13 0x000000000041b7ad in handle_connection_request (arg=<value optimized out>)
    at dircmd.c:229
#14 0x000000000044dc7f in workq_server (arg=<value optimized out>)
    at workq.c:357
#15 0x0000003648e062e7 in start_thread () from /lib64/libpthread.so.0
#16 0x00000036482ce3bd in clone () from /lib64/libc.so.6

Thread 1 (Thread 46912498413632 (LWP 10906)):
#0  0x00000036482c7922 in select () from /lib64/libc.so.6
#1  0x0000000000436a74 in bnet_thread_server (addrs=0xf11df8, max_clients=41,
    client_wq=0x766c20,
    handle_client_request=0x41b440 <handle_connection_request(void*)>)
    at bnet_server.c:161
#2  0x0000000000409519 in main (argc=<value optimized out>,
    argv=<value optimized out>) at stored.c:265
#0  0x00000036482c7922 in select () from /lib64/libc.so.6
#0  0x00000036482c7922 in select () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000000000436a74 in bnet_thread_server (addrs=0xf11df8, max_clients=41,
    client_wq=0x766c20,
    handle_client_request=0x41b440 <handle_connection_request(void*)>)
    at bnet_server.c:161
161              if ((stat = select(maxfd + 1, &sockset, NULL, NULL, NULL)) < 0) {
Current language:  auto; currently c++
maxfd = <value optimized out>
sockset = {fds_bits = {16, 0 <repeats 15 times>}}
newsockfd = <value optimized out>
stat = <value optimized out>
clilen = 16
cli_addr = {sa_family = 2,
  sa_data = "�M\206\002\t\215\000\000\000\000\000\000\000"}
tlog = <value optimized out>
turnon = 1
p = (IPADDR *) 0xf123e8
fd_ptr = /etc/bacula/btraceback.gdb:14: Error in sourced command file:
dwarf2_read_address: Corrupted DWARF expression.


I was under the impression that the tape drive might be broken, because of the
"Device not open" part. However, I have run the btape program without any
errors.

Any suggestions on how to debug this?


Best Regards
Jan
