Bacula-users

[Bacula-users] backups fail with "ERR=Stale NFS file handle", but NFS is not in use!

2011-11-02 11:10:15
Subject: [Bacula-users] backups fail with "ERR=Stale NFS file handle", but NFS is not in use!
From: mark.bergman AT uphs.upenn DOT edu
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 02 Nov 2011 11:07:14 -0400
Last night I had two long-running backup jobs (~1TB and ~1.2TB) fail
with the same error:

        Fatal error: Error writing data to spool file. ERR=Stale NFS file handle

The problem is that the spool file is NOT on an NFS mounted disk! The
spool file is on a GPFS partition. Spool files are limited to 200GB each,
and there were two jobs running, so the 1TB partition was not full.

It's difficult to deal with failures at the very end of jobs that take ~24hrs.

Any suggestions?


Here is an excerpt of the bacula log file for one of the jobs:
-------------------
31-Oct 17:43 sbia-infr-vbacula JobId 41350: Start Backup JobId 41350, 
Job=sbiaprj-1.2011-10-31_17.43.22_35
31-Oct 17:45 sbia-infr-vbacula JobId 41350: Volume "003209" previously written, 
moving to end of data.
31-Oct 17:47 sbia-infr-vbacula JobId 41350: Spooling data ...
31-Oct 20:56 sbia-infr-vbacula JobId 41350: User specified spool size reached.
31-Oct 20:56 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume.  
Despooling 214,748,386,134 bytes ...
31-Oct 21:46 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:49:21, 
Transfer rate = 72.52 M Bytes/second
31-Oct 21:46 sbia-infr-vbacula JobId 41350: Spooling data again ...
01-Nov 03:14 sbia-infr-vbacula JobId 41350: User specified spool size reached.
01-Nov 03:14 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume.  
Despooling 214,748,386,400 bytes ...
01-Nov 03:36 sbia-infr-vbacula JobId 41350: End of Volume "003209" at 135:32909 
on device "ml6000-drv0" (/dev/tape0-ml6000). Write of 64512 bytes got -1.
01-Nov 03:40 sbia-infr-vbacula JobId 41350: New volume "004069" mounted on 
device "ml6000-drv0" (/dev/tape0-ml6000) at 01-Nov-2011 03:40.
01-Nov 04:08 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:49:52, 
Transfer rate = 71.77 M Bytes/second
01-Nov 04:08 sbia-infr-vbacula JobId 41350: Spooling data again ...
01-Nov 07:04 sbia-infr-vbacula JobId 41350: User specified spool size reached.
01-Nov 07:04 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume.  
Despooling 214,748,386,354 bytes ...
01-Nov 08:02 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:56:51, 
Transfer rate = 62.95 M Bytes/second
01-Nov 08:02 sbia-infr-vbacula JobId 41350: Spooling data again ...
01-Nov 12:01 sbia-infr-vbacula JobId 41350: User specified spool size reached.
01-Nov 12:01 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume.  
Despooling 214,748,383,710 bytes ...
01-Nov 12:55 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:54:15, 
Transfer rate = 65.97 M Bytes/second
01-Nov 12:55 sbia-infr-vbacula JobId 41350: Spooling data again ...
01-Nov 18:06 sbia-infr-vbacula JobId 41350: User specified spool size reached.
01-Nov 18:06 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume.  
Despooling 214,748,385,576 bytes ...
01-Nov 18:49 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:43:17, 
Transfer rate = 82.69 M Bytes/second
01-Nov 18:50 sbia-infr-vbacula JobId 41350: Spooling data again ...
01-Nov 21:33 sbia-infr-vbacula JobId 41350: Fatal error: Error writing data to 
spool file. ERR=Stale NFS file handle
01-Nov 21:33 sbia-infr-vbacula JobId 41350: Writing spooled data to Volume.  
Despooling 95,632,760,393 bytes ...
01-Nov 21:34 sbia-infr-vbacula JobId 41350: Despooling elapsed time = 00:00:01, 
Transfer rate = 95.63 G Bytes/second
01-Nov 21:34 sbia-infr-vbacula JobId 41350: Fatal error: Fatal despooling 
error.01-Nov 21:34 sbia-infr-vbacula JobId 41350: Job write elapsed time = 
27:43:21, Transfer rate = 11.70 M Bytes/second
01-Nov 21:34 sbia-infr3 JobId 41350: Fatal error: backup.c:1019 Network send 
error to SD. ERR=Connection reset by peer
01-Nov 21:34 sbia-infr-vbacula JobId 41350: Error: Bacula sbia-infr-vbacula 
5.0.2 (28Apr10): 01-Nov-2011 21:34:27
  Build OS:               x86_64-redhat-linux-gnu redhat 
  JobId:                  41350
  Job:                    sbiaprj-1.2011-10-31_17.43.22_35
  Backup Level:           Full
  Client:                 "sbia-infr-vnfs1" 5.0.2 (28Apr10) 
x86_64-redhat-linux-gnu,redhat,
  FileSet:                "sbiaprj-1" 2011-10-12 23:15:00
  Pool:                   "Full" (From Job FullPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "ml6000" (From Job resource)
  Scheduled time:         31-Oct-2011 17:43:13
  Start time:             31-Oct-2011 17:45:06
  End time:               01-Nov-2011 21:34:27
  Elapsed time:           1 day 3 hours 49 mins 21 secs
  Priority:               10
  FD Files Written:       3,257,746
  SD Files Written:       3,257,744
  FD Bytes Written:       1,167,407,287,207 (1.167 TB)
  SD Bytes Written:       1,168,242,527,818 (1.168 TB)
  Rate:                   11655.3 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         003209|003037|004069
  Volume Session Id:      142
  Volume Session Time:    1319573487
  Last Volume Bytes:      755,423,843,328 (755.4 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***
--------------------------------------


Thanks,

Mark


------------------------------------------------------------------------------
RSA® Conference 2012
Save $700 by Nov 18
Register now!
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

<Prev in Thread] Current Thread [Next in Thread>