Bacula-users

Re: [Bacula-users] sd died after Device /dev/... - not ready, retrying

2008-10-07 04:34:02
Subject: Re: [Bacula-users] sd died after Device /dev/... - not ready, retrying
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 07 Oct 2008 10:29:45 +0200
Hi,

07.10.2008 09:54, Ralf Gross wrote:
> Hi,
> 
> Last night I was hit by an mtx/drive problem.
> 
> 20081007-00:02:22 Doing mtx -f /dev/Neo4100 load 96 2
> 20081007-00:02:22 Device /dev/ULTRIUM-TD4-D3 - not ready, retrying...
> 20081007-00:02:23 Device /dev/ULTRIUM-TD4-D3 - not ready, retrying...
> [...]
> 20081007-00:07:35 Parms: /dev/Neo4100 loaded 96 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:07:35 Doing mtx -f /dev/Neo4100 2 -- to find what is loaded
> 20081007-00:07:35 Parms: /dev/Neo4100 load 96 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:07:35 Doing mtx -f /dev/Neo4100 load 96 2
> 20081007-00:07:35 Device /dev/ULTRIUM-TD4-D3 - not ready, retrying...
> [...]
> 20081007-00:12:34 Device /dev/ULTRIUM-TD4-D3 - not ready, retrying...
> 20081007-00:12:36 Parms: /dev/Neo4100 loaded 96 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:12:36 Doing mtx -f /dev/Neo4100 2 -- to find what is loaded
> 20081007-00:12:36 Parms: /dev/Neo4100 load 96 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:12:36 Doing mtx -f /dev/Neo4100 load 96 2
> 20081007-00:12:37 Device /dev/ULTRIUM-TD4-D3 - not ready, retrying...
> [...]
> 20081007-00:17:35 Device /dev/ULTRIUM-TD4-D3 - not ready, retrying...
> 20081007-00:17:37 Parms: /dev/Neo4100 loaded 111 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:17:37 Doing mtx -f /dev/Neo4100 2 -- to find what is loaded
> 20081007-00:17:37 Parms: /dev/Neo4100 load 111 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:17:37 Doing mtx -f /dev/Neo4100 load 111 2
> 20081007-00:18:14 Parms: /dev/Neo4100 loaded 111 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:18:14 Doing mtx -f /dev/Neo4100 2 -- to find what is loaded
> 20081007-00:18:18 Parms: /dev/Neo4100 unload 111 /dev/ULTRIUM-TD4-D3 2
> 20081007-00:18:18 Doing mtx -f /dev/Neo4100 unload 111 2

Perhaps it's a hardware problem? Have you checked the system's log
files?

> Then the sd died:

Now that's bad... even in case of a serious problem, the SD shouldn't die.

> 07-Okt 00:19 VU0EA003-sd: ABORTING due to ERROR in dev.c:724
> dev.c:723 Bad call to rewind. Device "ULTRIUM-TD4-D3"
> (/dev/ULTRIUM-TD4-D3) not open
> Kaboom! bacula-sd, VU0EA003-sd got signal 11 - Segmentation violation.
> Attempting traceback.
> Kaboom! exepath=/usr/sbin/
> Calling: /usr/sbin/btraceback /usr/sbin/bacula-sd 15802
> 
> 
> http://www.bacula.org/en/dev-manual/What_Do_When_Bacula.html
> 
> gdb is installed, but bacula-sd is not running as root; maybe that
> is why I got no traceback by mail.

Possibly... I believe gdb needs to run as root in some circumstances,
but that's definitely not my field of expertise :-)

> 
> Anyway, I've seen this 'not ready, retrying...' problem only once
> before, 5 months ago. There is nothing in the system logs or the
> changer logfile when it happens.
> 
> Any ideas what I have to do to prevent Bacula from crashing at that point?

No, but a suggestion.

> I've changed the mtx-changer script to wait a bit longer:
> 
> wait_for_drive() {
>   i=0
>   while [ $i -le 50 ]; do  # Wait max 1000 seconds
>     if mt -f $1 status | grep "${ready}" >/dev/null 2>&1; then
>       break
>     fi
>     debug "Device $1 - not ready, retrying..."
>     sleep 1
>     i=`expr $i + 20`

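That should be $i + 1. As written, i takes the values 0, 20, 40 and 60,
and 60 is already more than 50, so the loop body runs only three times:
about three seconds of waiting, nowhere near the 1000 seconds in the
comment.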
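To make the arithmetic concrete, here is a minimal sketch that replays only the counter logic of the quoted loop, with the mt status check and the sleep left out. The variable visited exists only for this illustration.

```shell
#!/bin/sh
# Replays only the counter of the quoted wait_for_drive() loop:
# same start value, same "-le 50" test, same "+ 20" step.
# The mt status check and the sleep are deliberately left out.
i=0
visited=""
while [ $i -le 50 ]; do
  visited="$visited $i"   # remember each value the loop body sees
  i=`expr $i + 20`
done
echo "loop body ran for i =$visited"   # prints: loop body ran for i = 0 20 40
echo "loop left with i = $i"           # prints: loop left with i = 60
```

With a step of 1 instead, the same test runs the body 51 times (i = 0 ... 50), so with sleep 1 the wait is capped at roughly 51 seconds rather than three.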

So the retries shown in the log excerpt above would come from Bacula's
repeated attempts to run the script, not from the loop inside the script.
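A corrected wait_for_drive() might look roughly like the sketch below. It counts checks rather than seconds, so 50 checks spaced 20 seconds apart give about the 1000-second maximum the original comment asks for (with sleep 1, even the fixed $i + 1 loop would only wait about 51 seconds). MAX_TRIES and POLL_INTERVAL are hypothetical names introduced here; the stub debug() and the default for ${ready} only stand in for what the real mtx-changer script defines; and the return codes are an addition the calling script may simply ignore.

```shell
#!/bin/sh
# Hypothetical stand-ins so the sketch runs on its own: the real
# mtx-changer script defines debug() and sets ${ready} per OS itself.
ready="${ready:-ONLINE}"
debug() { echo "$*" >&2; }

# Hypothetical tuning knobs: 50 checks x 20 s ~= 1000 s maximum wait.
MAX_TRIES=${MAX_TRIES:-50}          # number of status checks
POLL_INTERVAL=${POLL_INTERVAL:-20}  # seconds to sleep between checks

# Poll "mt -f <device> status" until its output contains ${ready},
# giving up after MAX_TRIES unsuccessful checks.
wait_for_drive() {
  i=0
  while [ $i -lt $MAX_TRIES ]; do
    if mt -f "$1" status 2>/dev/null | grep "${ready}" >/dev/null 2>&1; then
      return 0                      # drive reports ready
    fi
    debug "Device $1 - not ready, retrying..."
    sleep $POLL_INTERVAL
    i=`expr $i + 1`                 # count checks, not seconds
  done
  return 1                          # gave up: drive never became ready
}
```

It is called the same way as the original function, with the drive device as its only argument (for example wait_for_drive /dev/ULTRIUM-TD4-D3).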

>   done
> }
> 
> I have no idea what the drive was doing during those 15 minutes last night...

Neither do I. Just my observation above, sorry.

Arno

> Ralf
> 

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
