Re: [Bacula-users] critical error -- tape labels get corrupted, previous backups unreadable

2012-01-24 14:10:51
From: Martin Simmons <martin AT lispworks DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 24 Jan 2012 19:09:15 GMT
>>>>> On Mon, 23 Jan 2012 18:47:31 -0500, mark bergman said:
> 
> I'm experiencing a critical problem where tape labels on volumes with data
> get corrupted, leaving all data on the tape inaccessible to bacula.
> 
> I'm running bacula 5.2.2 built from source, under Linux (CentOS 5.7
> x86_64).
> 
> This problem has happened with approximately 15 tapes over approximately 6
> months, mostly new LTO-4 media, but some LTO-3 media that's being reused.
> The problem is sporadic, appearing in approximately 1 out of 60 tapes
> per week.
> 
> I do not think the issue is related to the physical media or the tape
> drives. One tape was last written successfully when in drive 0, then appears
> corrupt when a later job tries to use it in drive 1. Another tape was last
> written successfully when in drive 1, then appears corrupt when a later job
> tries to use it in drive 0.

Why do you think it isn't a hardware problem?

Bacula only reads the label when a volume is mounted, so a faulty write to the
tape could go unnoticed until the next time that volume is mounted.


> Here are the log records for a particular volume. It was labeled about
> Dec 22, 2011. First used on Jan 4 2012. Used successfully for 10 jobs
> (350.49GB), then the label was corrupted.
> 
> ------------------------------
> 04-Jan 06:24 sbia-infr-vbacula JobId 42676: Using Volume "004090" from 
> 'Scratch' pool.
> 04-Jan 06:25 sbia-infr-vbacula JobId 42676: Wrote label to prelabeled Volume 
> "004090" on device "ml6000-drv1" (/dev/tape1-ml6000)
> 04-Jan 06:25 sbia-infr-vbacula JobId 42676: New volume "004090" mounted on 
> device "ml6000-drv1" (/dev/tape1-ml6000) at 04-Jan-2012 06:25.

Is /dev/tape1-ml6000 a non-rewinding device (like /dev/nst0)?
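
One way to check is sketched below. It assumes the Linux "st" SCSI tape driver
(character major 9), where bit 7 (0x80) of the minor number selects
no-rewind-on-close, and GNU stat/readlink. The function names
is_norewind_minor and check_tape_node are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether a tape device node is the non-rewinding variant.
# Assumption: Linux "st" driver, char major 9, minor bit 7 = no rewind.

# is_norewind_minor HEXMINOR
# Succeeds (exit 0) if bit 7 of the hexadecimal minor number is set.
is_norewind_minor() {
    [ $(( 0x$1 & 0x80 )) -ne 0 ]
}

# check_tape_node PATH
# Prints a one-line verdict. Returns 0 for non-rewinding, 1 for rewinding,
# 2 if PATH is not an st character device.
check_tape_node() {
    node=$(readlink -f "$1") || return 2      # follow udev-style symlinks
    if [ ! -c "$node" ]; then
        echo "$1: not a character device"
        return 2
    fi
    major=$(stat -c '%t' "$node")             # major number, in hex
    minor=$(stat -c '%T' "$node")             # minor number, in hex
    if [ "$major" != "9" ]; then
        echo "$1: major 0x$major is not the st driver"
        return 2
    fi
    if is_norewind_minor "$minor"; then
        echo "$1 -> $node: non-rewinding"
        return 0
    fi
    echo "$1 -> $node: REWINDING on close"
    return 1
}
```

For example, "check_tape_node /dev/tape1-ml6000" would show which real node
the name resolves to and whether it rewinds on close.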


> At this point, the volume 004090 is unusable.  Running 'btape' on that volume 
> reports 
> ----------------------------
> [root@sbia-infr1 working]# ../bin/btape -v ml6000-drv0
> Tape block granularity is 1024 bytes.
> btape: butil.c:290 Using device: "ml6000-drv0" for writing.
> 23-Jan 18:14 btape JobId 0: 3301 Issuing autochanger "loaded? drive 0"
> command.
> 23-Jan 18:14 btape JobId 0: 3302 Autochanger "loaded? drive 0", result is Slot
> 9.
> btape: btape.c:477 open device "ml6000-drv0" (/dev/tape0-ml6000): OK
> *readlabel
> btape: btape.c:526 Volume has no label.
> 
> Volume Label:
> Id                : **error**VerNo             : 0
> VolName           : 
> PrevVolName       : 
> VolFile           : 0
> LabelType         : Unknown 0
> LabelSize         : 0
> PoolName          : 
> MediaType         : 
> PoolType          : 
> HostName          : 
> Date label written: -4712-01-01 at 00:00
> ----------------------------
> 
> 
> 
> 
> However, there _is_ data on the tape. I'm able to read the tape via dd
> (ibs=64k). The ASCII data at the beginning of the tape shows fragments of the
> Bacula label and data that corresponds to some of the backups:

The output of

od -tx1 /tmp/vol4090.header | head -n 40

might be useful, to see why Bacula rejects it.
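
As a quick first test before reading the dump by eye, the sketch below checks
whether the file starts with something that looks like a Bacula block header.
It assumes the version 2 block format, where the 4-byte ID "BB02" follows the
checksum, block size and block number (so it sits at byte offset 12), and it
assumes /tmp/vol4090.header holds the first block read from the tape. The
function name has_bb02_id is made up for illustration.

```shell
#!/bin/sh
# Sketch: does FILE carry the "BB02" block ID at byte offset 12?
# Assumption: Bacula block header v2 layout
#   (checksum, block size, block number, then the 4-byte ID "BB02").

# has_bb02_id FILE
# Succeeds (exit 0) if bytes 12..15 of FILE are the ASCII string "BB02".
has_bb02_id() {
    # Compare in hex so that binary bytes never pass through a shell variable.
    id=$(dd if="$1" bs=1 skip=12 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$id" = "42423032" ]                    # 'B' 'B' '0' '2'
}
```

Running "has_bb02_id /tmp/vol4090.header && echo found" would print "found"
only if the ID is where that layout puts it. A miss does not say what is wrong
with the block, only that the od output is worth a closer look.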

__Martin
