Bacula-users

[Bacula-users] critical error -- tape labels get corrupted, previous backups unreadable

2012-01-23 18:49:17
Subject: [Bacula-users] critical error -- tape labels get corrupted, previous backups unreadable
From: mark.bergman AT uphs.upenn DOT edu
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 23 Jan 2012 18:47:31 -0500
I'm experiencing a critical problem where tape labels on volumes with data
get corrupted, leaving all data on the tape inaccessible to bacula.

I'm running bacula 5.2.2 built from source, under Linux (CentOS 5.7
x86_64).

This problem has happened with approximately 15 tapes over approximately 6
months, mostly new LTO-4 media, but some LTO-3 media that's being reused.
The problem is sporadic, appearing in approximately 1 out of 60 tapes
per week.

I do not think the issue is related to the physical media or the tape
drives. One tape was last written successfully when in drive 0, then appears
corrupt when a later job tries to use is in drive 1. Another tape was last
written successfully when in drive 1, then appears corrupt when a later job
tries to use it in drive 0.

I'm not sure what combination of circumstances trigger the problem,
but it seems to show up after:

        a tape is used for backups
        the tape is unloaded
        the tape is later reloaded for use in another job

Data from uncorrupted tapes can be read and restored without problems.

Purging the corrupt volume from the bacula database and relabeling media
allows me to reuse the tape successfully (at the loss of TB of backups!).

A tape will work successfully for many jobs, then later bacula will mount the
tape and be unable to read the label.

Here are the log records for a particular volume. It was labeled about
Dec 22, 2011. First used on Jan 4 2012. Used successfully for 10 jobs
(350.49GB), then the label was corrupted.

------------------------------
04-Jan 06:24 sbia-infr-vbacula JobId 42676: Using Volume "004090" from 
'Scratch' pool.
04-Jan 06:25 sbia-infr-vbacula JobId 42676: Wrote label to prelabeled Volume 
"004090" on device "ml6000-drv1" (/dev/tape1-ml6000)
04-Jan 06:25 sbia-infr-vbacula JobId 42676: New volume "004090" mounted on 
device "ml6000-drv1" (/dev/tape1-ml6000) at 04-Jan-2012 06:25.
04-Jan 08:23 sbia-infr-vbacula JobId 42676: Committing spooled data to Volume 
"004090". Despooling 37,003,975,390 bytes ...
05-Jan 06:47 sbia-infr-vbacula JobId 42724: Volume "004090" previously written, 
moving to end of data.
05-Jan 06:49 sbia-infr-vbacula JobId 42724: Ready to append to end of Volume 
"004090" at file=69.
05-Jan 11:05 sbia-infr-vbacula JobId 42724: Committing spooled data to Volume 
"004090". Despooling 495 bytes ...
06-Jan 06:51 sbia-infr-vbacula JobId 42746: Volume "004090" previously written, 
moving to end of data.
06-Jan 06:52 sbia-infr-vbacula JobId 42746: Ready to append to end of Volume 
"004090" at file=70.
06-Jan 12:08 sbia-infr-vbacula JobId 42746: Committing spooled data to Volume 
"004090". Despooling 495 bytes ...
07-Jan 06:48 sbia-infr-vbacula JobId 42768: Volume "004090" previously written, 
moving to end of data.
07-Jan 06:50 sbia-infr-vbacula JobId 42768: Ready to append to end of Volume 
"004090" at file=71.
07-Jan 12:07 sbia-infr-vbacula JobId 42768: Committing spooled data to Volume 
"004090". Despooling 495 bytes ...
08-Jan 06:48 sbia-infr-vbacula JobId 42790: Volume "004090" previously written, 
moving to end of data.
08-Jan 06:50 sbia-infr-vbacula JobId 42790: Ready to append to end of Volume 
"004090" at file=72.
08-Jan 11:40 sbia-infr-vbacula JobId 42790: Committing spooled data to Volume 
"004090". Despooling 495 bytes ...
09-Jan 11:12 sbia-infr-vbacula JobId 42812: Committing spooled data to Volume 
"004090". Despooling 495 bytes ...
10-Jan 05:47 sbia-infr-vbacula JobId 42831: Volume "004090" previously written, 
moving to end of data.
10-Jan 05:49 sbia-infr-vbacula JobId 42831: Ready to append to end of Volume 
"004090" at file=74.
10-Jan 23:52 sbia-infr-vbacula JobId 42831: Committing spooled data to Volume 
"004090". Despooling 35,276,606,182 bytes ...
11-Jan 12:58 sbia-infr-vbacula JobId 42856: Committing spooled data to Volume 
"004090". Despooling 495 bytes ...
12-Jan 06:45 sbia-infr-vbacula JobId 42879: Volume "004090" previously written, 
moving to end of data.
12-Jan 06:45 sbia-infr-vbacula JobId 42879: Ready to append to end of Volume 
"004090" at file=83.
12-Jan 12:56 sbia-infr-vbacula JobId 42879: Committing spooled data to Volume 
"004090". Despooling 5,914 bytes ...
13-Jan 06:45 sbia-infr-vbacula JobId 42901: Committing spooled data to Volume 
"004090". Despooling 404 bytes ...
15-Jan 16:54 sbia-infr-vbacula JobId 42924: Please mount Volume "004090" or 
label a new one for:
------------------------------




At this point, the volume 004090 is unusable.  Running 'btape' on that volume 
reports 
----------------------------
[root@sbia-infr1 working]# ../bin/btape -v ml6000-drv0
Tape block granularity is 1024 bytes.
btape: butil.c:290 Using device: "ml6000-drv0" for writing.
23-Jan 18:14 btape JobId 0: 3301 Issuing autochanger "loaded? drive 0"
command.
23-Jan 18:14 btape JobId 0: 3302 Autochanger "loaded? drive 0", result is Slot
9.
btape: btape.c:477 open device "ml6000-drv0" (/dev/tape0-ml6000): OK
*readlabel
btape: btape.c:526 Volume has no label.

Volume Label:
Id                : **error**VerNo             : 0
VolName           : 
PrevVolName       : 
VolFile           : 0
LabelType         : Unknown 0
LabelSize         : 0
PoolName          : 
MediaType         : 
PoolType          : 
HostName          : 
Date label written: -4712-01-01 at 00:00
----------------------------




However, there _is_ data on the tape. I'm able to read the tape via dd
(ibs=64k). The ASCII data at the beginning of the tape shows fragments of the
Bacula label and data that corresponds to some of the backups:

-----------------------------------------
strings /tmp/vol4090.header |head -20
BB02
Bacula 1.0 immortal
ge$o
Archive
Backup
archive
sbia-infr-vnfs2
archive.2012-01-13_06.45.00_01
archive
ItnIpRhN5JVFXhw/+0/sAsD
Bacula 1.0 immortal
Archive
Backup
archive
sbia-infr-vnfs2
archive.2012-01-13_06.45.00_01
archive
ItnIpRhN5JVFXhw/+0/sAsD
-----------------------------------------



I am happy to provide additional debugging information, etc.

Any assistance is welcome.

Thanks,

Mark

------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users