Re: [Bacula-users] Missing nfs share blocks job [fd: 2.2.8]

2008-11-13 10:22:53
Subject: Re: [Bacula-users] Missing nfs share blocks job [fd: 2.2.8]
From: Ronald Buder <rbuder AT proficom-ag DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 13 Nov 2008 16:20:36 +0100
Kjetil Torgrim Homme wrote:
> Arno Lehmann <al AT its-lehmann DOT de> writes:
>   
>> 13.11.2008 12:05, Ronald Buder wrote:
>>     
>>> Due to a server failure the nfs shares are not available anymore. I 
>>> would like to see some sort of a timeout at least if that is at all 
>>> possible.
>>>       
>> That's not possible inside Bacula - the FD simply can't terminate
>> file system accesses that are stalled due to NFS problems.
>>     
>
> if the filesystem is mounted with "intr" (and you should always mount
> NFS with "intr"), the FD *can* set an alarm on itself and recover.
> not sure what the recovery behaviour should be -- simply aborting the
> whole job would be an improvement (the timeout value should be user
> configurable of course), but it could also stop reading more files in
> the current filesystem and go on with the job.  Bacula knows when it
> steps into a new filesystem, so it can be taught how to jump out of
> that recursion.
>
> slightly related, Bacula could try to send an NFS ping to the NFS
> server before recursing into the filesystem.  this means Bacula will
> not waste time if the server was down already, but I'm not sure it's
> worthwhile to complicate the code with NFS specific code.  if you're
> interested, you can look at the code in a utility on my web page:
>
>   http://heim.ifi.uio.no/kjetilho/hacks/#cknfs
>
>   
>> The best thing to do is often a restart of the NFS server.
>>     
>
> yes, life is better when the NFS server is up :-)
>   
In that specific case the server could not be restarted due to some 
misconfiguration I did not want to deal with :).

Anyway, I like the NFS ping idea mentioned above. However, talking to my 
colleague here, we wonder whether it is possible at all. Does the client 
really know beforehand that the next directory holds a mounted 
filesystem? The job logs kinda make me think it doesn't: the FD starts 
backing up right away and only notices as it goes along that it has just 
dropped into another filesystem. This could probably be resolved by 
checking /etc/mtab (on Linux, that is), but that still would not be the 
ultimate solution to this issue, since ZFS datasets on Solaris do not 
show up in the mtab at all, iirc.
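
To make the mtab-plus-ping idea concrete, here is a rough sketch in Python. It is not Bacula code and not how cknfs works; it only illustrates the approach. It assumes a Linux-style mount table (device, mount point, type as the first three fields) and a server that answers NFSv3 over UDP on port 2049, which is not guaranteed (NFSv4 servers are TCP only). All function names are made up for this example.

```python
import os
import socket
import struct

NFS_PROGRAM = 100003        # ONC RPC program number assigned to NFS
RPC_CALL, RPC_REPLY = 0, 1  # RPC message types

def nfs_mounts(mtab_text):
    """Return (server, mount_point) pairs for NFS entries in mtab-style text."""
    mounts = []
    for line in mtab_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4") and ":" in fields[0]:
            server = fields[0].rpartition(":")[0]
            mounts.append((server, fields[1]))
    return mounts

def build_null_call(xid, version=3):
    """Build an RPC call to procedure 0 (NULL) of the NFS program."""
    # xid, CALL, RPC version 2, program, program version, procedure,
    # then an AUTH_NONE credential and verifier (flavor 0, length 0 each).
    return struct.pack(">10I", xid, RPC_CALL, 2, NFS_PROGRAM, version, 0, 0, 0, 0, 0)

def is_null_reply(data, xid):
    """Check that data is an accepted, successful RPC reply matching xid."""
    if len(data) < 24:
        return False
    rxid, mtype, reply_stat, _flavor, verf_len = struct.unpack(">5I", data[:20])
    if (rxid, mtype, reply_stat) != (xid, RPC_REPLY, 0):  # 0 = MSG_ACCEPTED
        return False
    offset = 20 + ((verf_len + 3) & ~3)  # skip the padded verifier body
    if len(data) < offset + 4:
        return False
    (accept_stat,) = struct.unpack(">I", data[offset:offset + 4])
    return accept_stat == 0              # 0 = SUCCESS

def nfs_ping(server, timeout=3.0, port=2049):
    """Send one NULL call over UDP; True only if the server answers in time."""
    xid = int.from_bytes(os.urandom(4), "big")
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_null_call(xid), (server, port))
            data, _ = sock.recvfrom(1024)
    except OSError:  # timeout, name resolution failure, unreachable host
        return False
    return is_null_reply(data, xid)
```

A backup client could read the mount table once, call nfs_ping() for each server returned by nfs_mounts(), and skip the mount points whose server does not answer. As said above, this does nothing for platforms where the mounts are not listed in such a table.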

Still, a client-side timeout could be an easy way out of a lock like that.
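
A minimal sketch of what such a client-side timeout could look like, again in Python and purely illustrative (the FD has nothing like this as far as this thread goes). The idea is to probe a path in a helper thread and give up waiting after a configurable time. The function name and the "ok"/"error"/"timeout" return values are assumptions for the example. Note that the stuck thread itself is not cancelled; the caller merely stops waiting for it.

```python
import os
import threading

def stat_with_timeout(path, timeout=10.0, probe=os.stat):
    """Run probe(path) in a helper thread; report 'ok', 'error' or 'timeout'."""
    result = {}

    def worker():
        try:
            result["value"] = probe(path)
        except OSError as exc:
            result["error"] = exc

    # Daemon thread: a probe that never returns will not keep the
    # interpreter from shutting down later.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "timeout"  # probe is still blocked, e.g. on a dead NFS server
    return "error" if "error" in result else "ok"
```

A caller could run stat_with_timeout(mount_point, 30.0) before descending into a directory and either skip that filesystem or abort the job on "timeout", which are the two recovery options discussed earlier in the thread.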

Ronald

