[Bacula-users] number of files mismatch question

2009-08-05 12:37:39
Subject: [Bacula-users] number of files mismatch question
From: uhog-v9e4 AT spamex DOT com
To: Bacula-users AT lists.sourceforge DOT net
Date: Wed, 05 Aug 2009 12:05:32 -0400
Hi all,

I am a new Bacula user running 3.0.2 on Ubuntu 8.04 LTS. I am backing up about
7 clients (a mix of Windows and Linux) to a DDS3 tape autochanger. After
testing and configuring for a few days, I went live on Sunday with a full
backup of all clients, followed by incrementals Mon-Fri.

Now the problem: yesterday I had a power outage. Bacula was idle, but there
was a volume mounted in the drive. Today, when the incrementals fired, I got
this error:
05-Aug 10:25 pendual-dir JobId 39: Start Backup JobId 39, Job=mrc-vm1-backup.2009-08-05_10.25.39_04
05-Aug 10:25 pendual-dir JobId 39: Using Device "Drive-1"
05-Aug 10:25 mrc-vm1-fd JobId 39: DIR and FD clocks differ by 12 seconds, FD automatically compensating.
05-Aug 10:25 pendual-sd JobId 39: Volume "90m_1_1" previously written, moving to end of data.
05-Aug 10:26 pendual-sd JobId 39: Error: Bacula cannot write on tape Volume "90m_1_1" because:
The number of files mismatch! Volume=1 Catalog=14
05-Aug 10:26 pendual-sd JobId 39: Marking Volume "90m_1_1" in Error in Catalog.

It correctly loaded the next volume in the autochanger and ran the backups, but
I'm not sure why I am getting this error. A bls on the volume shows 14 files,
but it does have an interesting entry at the end.
# bls -j -V 90m_1_1 -c /etc/opt/bacula/conf/bacula-sd.conf Drive-1 | tee 90m_1_1.out

# grep JobId= 90m_1_1.out |grep -v "Volume Record"|wc -l
28
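
The count of 28 lumps together the Begin and End Job Session Records, since
both carry a JobId= field. A minimal sketch for counting the two kinds
separately, assuming the saved output has the same shape as the excerpt in
this message (count_job_records is a hypothetical helper name, not part of
Bacula):

```shell
# Count Begin and End Job Session Records separately in saved `bls -j` output.
# count_job_records is a hypothetical helper, not a Bacula tool.
count_job_records() {
    # $1: path to a file holding the saved output, e.g. 90m_1_1.out
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    begin=$(grep -c 'Begin Job Session Record:' "$1" || true)
    end=$(grep -c 'End Job Session Record:' "$1" || true)
    echo "begin=$begin end=$end"
}
```

Running `count_job_records 90m_1_1.out` prints both counts on one line. If the
two numbers differ, some job session on the volume has no matching end record.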

90m_1_1.out <excerpt>
...
Begin Job Session Record: File:blk=13:1 SessId=3 SessTime=1249401525 JobId=31
   Job=mrc6320-backup.2009-08-04_17.00.00_06 Date=04-Aug-2009 17:19:06 Level=I Type=B
End Job Session Record: File:blk=13:579 SessId=3 SessTime=1249401525 JobId=31
   Date=04-Aug-2009 17:24:18 Level=I Type=B Files=417 Bytes=37,303,402 Errors=0 Status=T
05-Aug 10:16 bls JobId 0: Error: block.c:1010 Read error on fd=3 at file:blk 14:0 on device "Drive-1" (/dev/nst0). ERR=Input/output error.
05-Aug 10:16 bls JobId 0: End of Volume at file 14 on device "Drive-1" (/dev/nst0), Volume "90m_1_1"
05-Aug 10:16 bls JobId 0: End of all volumes.
05-Aug 10:16 bls JobId 0: Alert: smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
05-Aug 10:16 bls JobId 0: Alert: Home page is http://smartmontools.sourceforge.net/
05-Aug 10:16 bls JobId 0: Alert:
05-Aug 10:16 bls JobId 0: Alert: TapeAlert Errors (C=Critical, W=Warning, I=Informational):
05-Aug 10:16 bls JobId 0: Alert: [0x14] C: The tape drive needs cleaning:
05-Aug 10:16 bls JobId 0: Alert:   1. If the operation has stopped, eject the tape and clean the drive.
05-Aug 10:16 bls JobId 0: Alert:   2. If the operation has not stopped, wait for it to finish and then
05-Aug 10:16 bls JobId 0: Alert:   clean the drive.
05-Aug 10:16 bls JobId 0: Alert:   Check the tape drive users manual for device specific cleaning instructions.
05-Aug 10:16 bls JobId 0: Alert: [0x03] W: The operation has stopped because an error has occurred while reading
05-Aug 10:16 bls JobId 0: Alert:   or writing data that the drive cannot correct.
05-Aug 10:16 bls JobId 0: Alert:
05-Aug 10:16 bls JobId 0: Alert: Error counter log:
05-Aug 10:16 bls JobId 0: Alert:            Errors Corrected by           Total   Correction     Gigabytes    Total
05-Aug 10:16 bls JobId 0: Alert:                ECC          rereads/    errors   algorithm      processed    uncorrected
05-Aug 10:16 bls JobId 0: Alert:            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
05-Aug 10:16 bls JobId 0: Alert: read:          0        0         0         1          0          0.000           0
05-Aug 10:16 bls JobId 0: Alert: write:         0        0         0         0          0          0.000           0
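
The File:blk=N:M fields in the session records give the tape file index of
each record, which is the number to compare against the Volume= and Catalog=
values in the error. A rough sketch for pulling the highest index out of the
saved output, assuming the record format matches the excerpt above
(highest_tape_file is a hypothetical helper name; it only looks at the
"File:blk=" fields, not at the lowercase "file:blk 14:0" in the read error
line):

```shell
# Print the largest N found in "File:blk=N:M" fields of saved `bls -j` output.
# highest_tape_file is a hypothetical helper, not a Bacula tool.
highest_tape_file() {
    # $1: path to a file holding the saved output, e.g. 90m_1_1.out
    grep -o 'File:blk=[0-9]*:[0-9]*' "$1" \
        | sed 's/^File:blk=\([0-9]*\):.*/\1/' \
        | sort -n \
        | tail -n 1
}
```

Tape file indexes start at 0, so a highest index of 13 would correspond to 14
files on the volume.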

I cleaned the drive and tried again, but I think the cleaning alert is bogus:
it appears to be triggered purely by the read error, so it is more a symptom
than a cause.
I think I could fix this fairly easily by purging the jobs and setting the
volume status to Recycle, but I am trying to understand why it is happening,
since everything was idle when the power failed. I'm assuming it may have
something to do with EOF handling, but nothing was writing at the time, so I
am a bit confused.
This is a great backup utility so far, and I am enjoying working with it.
Thanks for any ideas/opinions.

Mike
