Re: [Bacula-users] bacula-sd dies - Bad call to rewind. Device not open

Hi all,

----- "Jan Schulze" <schulze AT informatik.uni-tuebingen DOT de> wrote:

> Hi all,
> 
> I have been using Bacula 2.2.8 with an Overland ArcVault 24 for a long
> time without problems. Recently, the SD is dying every night before
> the first backup. I get the following GDB traceback and the backups
> fail due to "Comm error with SD" (of course).
> 
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [Thread debugging using libthread_db enabled]
> [New Thread 46912498413632 (LWP 10906)]
> [New Thread 1105209664 (LWP 12948)]
> [New Thread 1094719808 (LWP 10909)]
> 0x00000036482c7922 in select () from /lib64/libc.so.6
> $1 = "cambridge-sd", '\0' <repeats 17 times>
> $2 = 0xf10088 "bacula-sd"
> $3 = 0xf100c8 "/raid/export/soft/bacula-2.2.8/sbin/bacula-sd"
> $4 = 0x0
> $5 = 0x4fc07a "2.2.8 (26 January 2008)"
> $6 = 0x4f78a6 "x86_64-unknown-linux-gnu"
> $7 = 0x4f78c3 "redhat"
> $8 = 0x4f78bf "5.1"
> #0  0x00000036482c7922 in select () from /lib64/libc.so.6
> #1  0x0000000000436a74 in bnet_thread_server (addrs=0xf11df8,
> max_clients=41,
>     client_wq=0x766c20,
>     handle_client_request=0x41b440 <handle_connection_request(void*)>)
>     at bnet_server.c:161
> #2  0x0000000000409519 in main (argc=<value optimized out>,
>     argv=<value optimized out>) at stored.c:265
> 
> Thread 3 (Thread 1094719808 (LWP 10909)):
> #0  0x0000003648e0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>    from /lib64/libpthread.so.0
> #1  0x000000000044d6bd in watchdog_thread (arg=<value optimized out>)
>     at watchdog.c:307
> #2  0x0000003648e062e7 in start_thread () from /lib64/libpthread.so.0
> #3  0x00000036482ce3bd in clone () from /lib64/libc.so.6
> 
> Thread 2 (Thread 1105209664 (LWP 12948)):
> #0  0x0000003648e0d9ef in waitpid () from /lib64/libpthread.so.0
> #1  0x000000000044afb1 in signal_handler (sig=11) at signal.c:167
> #2  <signal handler called>
> #3  e_msg (file=0x4ef165 "dev.c", line=724, type=1,
>     level=<value optimized out>,
>     fmt=0xf11fb0 "dev.c:723 Bad call to rewind. Device \"Drive-1\"
> (/dev/nst0) not open\n") at message.c:1062
> #4  0x00000000004177dc in DEVICE::rewind (this=0xf13f78, dcr=0xf28dd8)
>     at dev.c:724
> #5  0x0000000000422aff in read_dev_volume_label (dcr=0xf28dd8) at
> label.c:261
> #6  0x00000000004262c0 in mount_next_write_volume (dcr=0xf28dd8,
>     have_vol=<value optimized out>, release=false) at mount.c:234
> #7  0x000000000040d99e in acquire_device_for_append (dcr=0xf28dd8)
>     at acquire.c:422
> #8  0x000000000040df62 in do_append_data (jcr=0xf26df8) at append.c:85
> #9  0x000000000041fe82 in append_data_cmd (jcr=0xf26df8) at
> fd_cmds.c:194
> #10 0x000000000041fade in do_fd_commands (jcr=0xf26df8) at
> fd_cmds.c:165
> #11 0x000000000041ffb5 in run_job (jcr=0xf26df8) at fd_cmds.c:128
> #12 0x00000000004203d7 in run_cmd (jcr=0xf26df8) at job.c:210
> #13 0x000000000041b7ad in handle_connection_request (arg=<value
> optimized out>)
>     at dircmd.c:229
> #14 0x000000000044dc7f in workq_server (arg=<value optimized out>)
>     at workq.c:357
> #15 0x0000003648e062e7 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000036482ce3bd in clone () from /lib64/libc.so.6
> 
> Thread 1 (Thread 46912498413632 (LWP 10906)):
> #0  0x00000036482c7922 in select () from /lib64/libc.so.6
> #1  0x0000000000436a74 in bnet_thread_server (addrs=0xf11df8,
> max_clients=41,
>     client_wq=0x766c20,
>     handle_client_request=0x41b440 <handle_connection_request(void*)>)
>     at bnet_server.c:161
> #2  0x0000000000409519 in main (argc=<value optimized out>,
>     argv=<value optimized out>) at stored.c:265
> #0  0x00000036482c7922 in select () from /lib64/libc.so.6
> #0  0x00000036482c7922 in select () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x0000000000436a74 in bnet_thread_server (addrs=0xf11df8,
> max_clients=41,
>     client_wq=0x766c20,
>     handle_client_request=0x41b440 <handle_connection_request(void*)>)
>     at bnet_server.c:161
> 161              if ((stat = select(maxfd + 1, &sockset, NULL, NULL,
> NULL)) < 0) {
> Current language:  auto; currently c++
> maxfd = <value optimized out>
> sockset = {fds_bits = {16, 0 <repeats 15 times>}}
> newsockfd = <value optimized out>
> stat = <value optimized out>
> clilen = 16
> cli_addr = {sa_family = 2,
>   sa_data = "�M\206\002\t\215\000\000\000\000\000\000\000"}
> tlog = <value optimized out>
> turnon = 1
> p = (IPADDR *) 0xf123e8
> fd_ptr = /etc/bacula/btraceback.gdb:14: Error in sourced command file:
> dwarf2_read_address: Corrupted DWARF expression.
> 
> 
> I suspected that the tape drive might be broken because of the
> "Device not open" part. However, I have run the btest program
> without any errors.
> 
> Any suggestions on how to debug this?
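
For anyone who lands on a trace like this one: the thread that actually
crashed is the one containing "<signal handler called>", and the frame
directly below that marker is where the signal arrived (here Thread 2,
in e_msg(), called from DEVICE::rewind at dev.c:724). Below is a minimal
sketch that pulls this frame out of a saved traceback. It assumes the
"Thread N (...)" / "#N ..." layout shown above; the function name
find_crash_frame and the regular expressions are my own and not part of
Bacula or GDB.

```python
import re

# Hypothetical helper names; the patterns assume the gdb layout quoted above.
THREAD_RE = re.compile(r"^Thread (\d+) \(")
FRAME_RE = re.compile(r"^#(\d+)\s+(.*)")


def find_crash_frame(trace):
    """Return (thread_number, first_line_of_frame) for the frame that
    received the signal, or None if no signal handler marker is found."""
    thread = None
    after_handler = False
    for raw in trace.splitlines():
        # Drop mail quoting ("> ") and indentation before matching.
        line = raw.lstrip("> ").rstrip()
        m = THREAD_RE.match(line)
        if m:
            thread = int(m.group(1))
            after_handler = False
            continue
        m = FRAME_RE.match(line)
        if not m:
            continue  # continuation line of a wrapped frame, or other output
        if "<signal handler called>" in m.group(2):
            after_handler = True
        elif after_handler:
            # First frame below the marker: the code that was interrupted.
            return thread, m.group(2)
    return None
```

Calling find_crash_frame(open("traceback.txt").read()) on a saved copy of
the traceback mail (the file name is just an example) returns the thread
number and the first line of the interrupted frame. Only the first line
of a wrapped frame is returned, so the source file and line number still
have to be read from the trace itself.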


It seems that the LTO-2 tape was the culprit. I deleted it from the catalog
and successfully performed a backup onto another tape. I then completely erased
the 'problem tape' and was able to use it for a second successful backup. The
problem appears to be gone.


Regards,
Jan

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users