Re: [Bacula-users] sd died after Device /dev/... - not ready, retrying
2008-10-07 04:50:13
Arno Lehmann wrote:
> Perhaps a hardware-related problem? Have you had a look into the
> system's log files?
Didn't find anything related in the logs.
> > Then the sd died:
>
> Now that's bad... even in case of a serious problem the SD shouldn't die.
>
> > 07-Okt 00:19 VU0EA003-sd: ABORTING due to ERROR in dev.c:724
> > dev.c:723 Bad call to rewind. Device "ULTRIUM-TD4-D3"
> > (/dev/ULTRIUM-TD4-D3) not open
> > Kaboom! bacula-sd, VU0EA003-sd got signal 11 - Segmentation violation.
> > Attempting traceback.
> > Kaboom! exepath=/usr/sbin/
> > Calling: /usr/sbin/btraceback /usr/sbin/bacula-sd 15802
> >
> >
> > http://www.bacula.org/en/dev-manual/What_Do_When_Bacula.html
> >
> > gdb is installed, but bacula-sd is not running as root; maybe that is
> > why I got no traceback by mail.
>
> Possible... I believe gdb needs to run as root in some circumstances,
> but that's definitely not my field of expertise :-)
I've made the btraceback script setuid root. Not the best approach, but this is
not a multi-user machine.
> >
> > Anyway, I've seen this 'not ready, retrying...' problem only once before,
> > five months ago. There is nothing in the system logs or the changer
> > logfile when it happens.
> >
> > Any ideas what I have to do to prevent Bacula from crashing at that point?
>
> No, but a suggestion.
>
> > I've changed the mtx-changer script to wait a bit longer:
> >
> > wait_for_drive() {
> > i=0
> > while [ $i -le 50 ]; do # Wait max 1000 seconds
> > if mt -f $1 status | grep "${ready}" >/dev/null 2>&1; then
> > break
> > fi
> > debug "Device $1 - not ready, retrying..."
> > sleep 1
> > i=`expr $i + 20`
>
> That should be $i + 1. As written, i takes the values 0, 20, 40, 60, so
> the fourth check already exceeds 50 and the loop gives up after only three
> passes.
>
> So the retries shown in the log excerpt above would come from Bacula's
> repeated attempts to run the script, not from the loop inside the script.
>
> > done
> > }
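Arno's counting can be checked without a tape drive by simulating only the loop counter. This is a minimal sketch: `simulate_wait` is a hypothetical helper, not part of mtx-changer, and it assumes the drive never reports ready and ignores the time `mt` itself takes to run.

```shell
#!/bin/sh
# Hypothetical helper: simulate the counter of a wait_for_drive-style loop
# without touching a drive (assumes the drive never becomes ready).
# Arguments: LIMIT (the -le bound), STEP (increment of i),
#            NAP (seconds slept on each "not ready" pass).
# Prints the number of passes and the worst-case total sleep time.
simulate_wait() {
  limit=$1
  step=$2
  nap=$3
  i=0
  passes=0
  while [ "$i" -le "$limit" ]; do
    passes=$((passes + 1))
    i=$((i + step))
  done
  echo "passes=$passes max_wait=$((passes * nap))s"
}

simulate_wait 50 20 1     # original loop: prints "passes=3 max_wait=3s"
simulate_wait 50 1 1      # increment fixed to 1: prints "passes=51 max_wait=51s"
simulate_wait 1000 30 30  # 1000 s bound, 30 s steps: prints "passes=34 max_wait=1020s"
```

So the original loop sleeps at most about 3 seconds in total, not 1000, which fits Arno's reading that the retries come from Bacula rather than from the script.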
Err, I think this is what I actually wanted:
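wait_for_drive() {
  # Poll the drive with "mt status" until its output contains ${ready}
  # (set elsewhere in mtx-changer), or give up after roughly 1000 seconds.
  i=0
  while [ $i -le 1000 ]; do
    if mt -f "$1" status | grep "${ready}" >/dev/null 2>&1; then
      break
    fi
    debug "Device $1 - not ready, retrying..."
    sleep 30
    i=`expr $i + 30`
  done
}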
This raises the maximum wait to roughly 1000 s, polling the drive every 30 s.
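One thing the function above still does not do is report a timeout: it returns exit status 0 whether the drive became ready or the loop simply ran out, so the caller cannot tell the two apart. A minimal sketch of a polling helper that returns a distinct status on timeout, assuming a POSIX sh; `wait_until` is a hypothetical name, not something mtx-changer provides.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds or the
# time budget is used up.
# Usage: wait_until MAX_SECONDS STEP_SECONDS command [args...]
# Returns 0 as soon as the command succeeds, 1 once MAX_SECONDS have been
# slept without success.
wait_until() {
  max=$1
  step=$2
  shift 2
  elapsed=0
  while :; do
    # Run the check quietly; any command or shell function can be passed.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Give up once the accumulated sleep time reaches the budget.
    if [ "$elapsed" -ge "$max" ]; then
      return 1
    fi
    sleep "$step"
    elapsed=$((elapsed + step))
  done
}
```

Inside mtx-changer, a hypothetical `drive_ready` function wrapping the existing `mt -f "$1" status | grep "${ready}"` check could be passed as the command, e.g. `wait_until 1000 30 drive_ready "$device" || debug "Device $device still not ready"`, where `$device` stands for whatever variable holds the drive path. Whether Bacula handles a non-zero exit from the changer script more gracefully than a drive that is still not ready is untested here.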
> > I have no idea what the drive was doing during those 15 minutes last night...
>
> I haven't, either. Just my observation above. Sorry.
Well, let's see whether it happens again and whether the longer wait time
prevents the SD from dying.
Ralf
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users