Re: [Bacula-users] sd died after Device /dev/... - not ready, retrying

From: Ralf Gross <Ralf-Lists AT ralfgross DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 7 Oct 2008 10:45:05 +0200
Arno Lehmann wrote:
> Perhaps a hardware-related problem? Have you had a look into the 
> system's log files?


I didn't find anything related in the logs.

 
> > Then the sd died:
> 
> Now that's bad... even in case of a serious problem the SD shouldn't die.
> 
> > 07-Okt 00:19 VU0EA003-sd: ABORTING due to ERROR in dev.c:724
> > dev.c:723 Bad call to rewind. Device "ULTRIUM-TD4-D3"
> > (/dev/ULTRIUM-TD4-D3) not open
> > Kaboom! bacula-sd, VU0EA003-sd got signal 11 - Segmentation violation.
> > Attempting traceback.
> > Kaboom! exepath=/usr/sbin/
> > Calling: /usr/sbin/btraceback /usr/sbin/bacula-sd 15802
> > 
> > 
> > http://www.bacula.org/en/dev-manual/What_Do_When_Bacula.html
> > 
> > gdb is installed but bacula-sd is not running as root, maybe that was
> > the reason why I got no traceback by mail.
> 
> Possible... I believe gdb needs to run as root in some circumstances, 
> but that's definitely not my field of expertise :-)


I've changed the btraceback file to be suid root. Not the best way, but this is
not a multi-user machine.

 
> > 
> > Anyway, I've seen this 'not ready, retrying...' problem only once
> > 5 months ago. There is nothing in the system logs or the changer
> > logfile when it happens.
> > 
> > Any ideas what I have to do to prevent Bacula from crashing at that point?
> 
> No, but a suggestion.
> 
> > I've changed the mtx-changer script to wait a bit longer:
> > 
> > wait_for_drive() {
> >   i=0
> >   while [ $i -le 50 ]; do  # Wait max 1000 seconds
> >     if mt -f $1 status | grep "${ready}" >/dev/null 2>&1; then
> >       break
> >     fi
> >     debug "Device $1 - not ready, retrying..."
> >     sleep 1
> >     i=`expr $i + 20`
> 
> That should be $i + 1 - now you're running the loop with 0, 20, 40, 60 
> and the fourth value is already more than 50.
> 
> So the retries shown in the log excerpt above would be because of 
> Bacula's attempts to run the script, not inside the script.
> 
> >   done
> > }


err, I think this is what I wanted:

wait_for_drive() {
  i=0
  while [ $i -le 1000 ]; do  # Wait max 1000 seconds
    if mt -f $1 status | grep "${ready}" >/dev/null 2>&1; then
      break
    fi
    debug "Device $1 - not ready, retrying..."
    sleep 30
    i=`expr $i + 30`
  done
}

This increases the maximum wait time to roughly 1000 seconds, polling the drive
in 30-second steps.
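
The loop arithmetic of both versions can be checked without a tape drive. This
is only a minimal sketch: count_polls is a hypothetical helper, not part of
mtx-changer, and it assumes the drive never reports ready, so it just counts how
often the loop body would run for a given limit and increment.

```shell
#!/bin/sh
# Hypothetical helper, not part of mtx-changer: counts how many times the
# wait loop would poll the drive if it never becomes ready.
#   $1 = loop limit      (50 in the first version, 1000 in the second)
#   $2 = increment per iteration (20 in the first version, 30 in the second)
count_polls() {
  limit=$1
  step=$2
  i=0
  polls=0
  while [ "$i" -le "$limit" ]; do
    polls=$((polls + 1))   # one "mt status" check would happen here
    i=$((i + step))
  done
  echo "$polls"
}

count_polls 50 20     # first version: limit 50, increment 20
count_polls 1000 30   # second version: limit 1000, increment 30
```

The first call prints 3, so with "sleep 1" the old loop gave up after about 3
seconds. The second prints 34, so with "sleep 30" the new loop waits up to
1020 seconds before returning.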



> > I've no idea what the drive was doing during the 15 minutes this night...
> 
> I haven't, either. Just my observation above. Sorry.


Well, let's see if it happens again and if the longer wait_time prevents the sd
from dying.

Ralf
