Re: [Bacula-users] Missing nfs share blocks job [fd: 2.2.8]

2008-11-13 06:26:16
Subject: Re: [Bacula-users] Missing nfs share blocks job [fd: 2.2.8]
From: Arno Lehmann <al AT its-lehmann DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Thu, 13 Nov 2008 12:23:59 +0100

Hi,

13.11.2008 12:05, Ronald Buder wrote:
> Hi,
> 
> we have noticed a blocker which may be resolved in later versions of the 
> file daemon; if not, I will file it as a bug. If, for whatever reason, a 
> network share that is (implicitly) included in the fileset becomes 
> unavailable, the job will stall.

This is normal NFS behaviour - if an NFS server doesn't respond, the 
processes accessing it wait in an uninterruptible state. They are also 
not notified of the problem by a signal.

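That said, newer NFS client implementations allow that behaviour to 
be changed - under Linux, the NFS mount options "soft" (a request 
fails with an error once its retries are exhausted) and "intr" (a 
signal can interrupt a waiting request) can be used so that client 
processes are notified of unavailable NFS shares.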
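
To see which NFS mounts on a Linux client would still block in this way, something like the following Python sketch could be used. It only parses text in the /proc/mounts format and lists the NFS mount points that lack the "soft" option. The function name and the sample entries (server names, paths) are made up for illustration, and this is not part of Bacula.

```python
# Hypothetical sample in /proc/mounts format:
# device, mount point, file system type, options, dump, pass
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,relatime 0 0
filer1:/export/home /mnt/home nfs rw,hard,intr,addr=192.0.2.10 0 0
filer1:/export/data /mnt/data nfs rw,soft,timeo=100,retrans=3 0 0
filer2:/export/pub /mnt/pub nfs4 rw,hard 0 0
"""


def hard_nfs_mounts(mounts_text):
    """Return the mount points of NFS file systems mounted without "soft"."""
    hard = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Compare whole options so that e.g. "softreval" is not mistaken for "soft".
        if fs_type in ("nfs", "nfs4") and "soft" not in options.split(","):
            hard.append(mount_point)
    return hard


print(hard_nfs_mounts(SAMPLE_MOUNTS))
```

On a real Linux client the text would come from reading /proc/mounts instead of the sample string. Keep in mind that "soft" has its own risk: an application may see I/O errors where it would otherwise have waited.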

> At this very moment I am waiting for four backup 
> jobs. I have tried to cancel them without any success. The jobs have 
> been running for some 8 hours now; the cancellation attempt was about 3 
> hours ago. As the rest of the system is still up and running and doing 
> backups and migration I do not want to restart the director.

You will have to either restart the clients that mount the NFS shares, 
or make the NFS server responsive again.

> Running Jobs:
> Console connected at 13-Nov-08 10:16
>  JobId Level   Name                       Status
> ======================================================================
>  41637 Increme  PLATON-W0001_System.2008-11-13_04.00.21 has been canceled
>  41641 Increme  PLATON-W0003_System.2008-11-13_04.00.25 has been canceled
>  41643 Increme  PLATON-W0004_System.2008-11-13_04.00.27 has been canceled
>  41645 Increme  PLATON-W0005_System.2008-11-13_04.00.29 has been canceled
> 
> Due to a server failure the nfs shares are not available anymore. I 
> would like to see some sort of a timeout at least if that is at all 
> possible.

That's not possible inside Bacula - the FD simply can't terminate file 
system accesses that are stalled due to NFS problems.

The best thing to do is often a restart of the NFS server.
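
A timeout can at least be approximated outside Bacula, before the job touches the share. The Python sketch below probes a path from a child process and gives up after a few seconds, so only the child gets stuck if the NFS server is down. The names path_responds and DEFAULT_PROBE are hypothetical, the 10 second default is an arbitrary assumption, and wiring this into a job (for example through a run-before-job script) is left out here.

```python
import subprocess
import sys

# Default probe: a child Python process that stats the path passed as argv[1].
DEFAULT_PROBE = (sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])")


def path_responds(path, timeout=10.0, probe_cmd=DEFAULT_PROBE):
    """Return True if a stat of `path` succeeds within `timeout` seconds."""
    child = subprocess.Popen(
        [*probe_cmd, path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit status 0 means the stat call returned without an error.
        return child.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Request a kill, but do not wait for the child afterwards: a process
        # blocked on a hard NFS mount may not exit until the server is back.
        child.kill()
        return False
```

A wrapper script could call path_responds() for each NFS mount point and exit with a non-zero status when one does not answer, so that the job fails early instead of stalling. The stuck probe process remains until the NFS server responds again, which is the same limitation as described above.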

Arno

> The reason why I did not file the bug right away is because it may have 
> been resolved with a later client version already, I will try to 
> reproduce the steps with a more current version of the file daemon and 
> post the news here. Any experiences on that matter are of course welcome...
> 
> Client: Sparc Solaris 10 (SunOS 5.9 Generic_122300-15 sun4u sparc 
> SUNW,Sun-Fire-V490), FD-Version: 2.2.8
> Server: Debian Etch, Dir-Version: 2.4.3
> 
> Best regards,
> 
> Ronald
> 

-- 
Arno Lehmann
IT-Service Lehmann
Sandstr. 6, 49080 Osnabrück
www.its-lehmann.de
