Bacula-users

Re: [Bacula-users] [Bacula-devel] bconsole restore bug - option 12.

2010-01-18 05:56:28
From: Kern Sibbald <kern AT sibbald DOT com>
To: bacula-devel AT lists.sourceforge DOT net
Date: Mon, 18 Jan 2010 11:53:24 +0100
Hello Graham,

I have spent some time reviewing this bug.  Thanks for the database and conf 
file; with them I was able to easily reproduce what you saw.

It seems to me that there are three problems here -- unfortunately all 
possibly quite serious:

1. The Volume is corrupt.
2. When Bacula reads the corrupt data, it smashes its stack.
3. The FileIndex records are not sequential.

1. Do you have any idea how the Volume got corrupted?  

It looks like the bad data is associated with the JobSession records (when a 
job starts, Bacula writes a label to the tape indicating the beginning of a 
job).  

Have you used a modified SD in producing this Volume?  

If so, then there is a bug in the code.  If not, I would like to learn more 
about the history of this Volume.

2. Bacula trashes its stack when it encounters the bad Session records.  
Unfortunately, the serial code used to write and extract labels is very old 
and didn't properly protect itself from bad data on the Volume.  I have now 
modified the current source code to fix this problem.  With the fix, Bacula 
reads through the whole Volume and does not crash.  I have committed this to 
the master branch on SourceForge.

3. The FileIndex records in the Attribute records do not correspond to the 
record sequence numbers.  This is what caused Bacula to fail the bls (the -p 
option allows it to continue).  I haven't looked at this yet, but will start 
looking at the VirtualFull code to see if it was an oversight on my part.  If 
you know more about how the Volume was created and what kind of Jobs are on 
it, please let me know as it may help get to the bottom of the problem.
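
As an illustration of the kind of protection described in point 2 (not the 
actual Bacula fix, which lives in the C serial code), here is a minimal Python 
sketch of reading a length-prefixed string from a label record while refusing 
lengths that would run past the data actually read.  The record layout (a 
4-byte big-endian length followed by the bytes), the names read_string and 
CorruptRecord, and the max_len limit are all assumptions made for this example.

```python
import struct
from typing import Tuple


class CorruptRecord(ValueError):
    """Raised when a declared size does not fit the data actually read."""


def read_string(buf: bytes, offset: int, max_len: int) -> Tuple[str, int]:
    """Read one length-prefixed string and return (text, next_offset).

    Hypothetical layout: 4-byte big-endian length, then that many bytes.
    Every size taken from the buffer is validated before it is used.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise CorruptRecord("truncated length field")
    (length,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    # Reject lengths that exceed the caller's limit or overrun the buffer.
    if length > max_len or offset + length > len(buf):
        raise CorruptRecord(f"implausible string length {length}")
    text = buf[offset:offset + length].decode("utf-8", errors="replace")
    return text, offset + length
```

With a check like this, a corrupt length field produces an error that the 
caller can report and skip, instead of a copy that overruns a fixed-size 
buffer.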

The problem with FileIndex records being out of order is that restore by file 
will not work correctly, and even a full restore may not get all the records.  
The records *can* be extracted, but getting them all might require editing the 
bsr file or using bextract without any bsr ...  This is not good.  
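
The sequence problem can be sketched as a simple check.  The Python below is 
only an illustration: it assumes the FileIndex values of one job have already 
been collected into a list in the order they appear on the Volume, and that 
they are expected to start at 1 and increase by 1.  The function name 
find_file_index_gaps is hypothetical and is not part of bls or any other 
Bacula tool.

```python
from typing import Iterable, List, Tuple


def find_file_index_gaps(file_indexes: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (position, expected, found) for each out-of-sequence FileIndex.

    Assumes the values should start at 1 and increase by exactly 1.
    """
    problems = []
    expected = 1
    for position, found in enumerate(file_indexes):
        if found != expected:
            problems.append((position, expected, found))
        # Resynchronise on the value actually seen so that one jump is
        # reported once rather than on every following record.
        expected = found + 1
    return problems


print(find_file_index_gaps([1, 2, 5, 6, 3]))
```

An empty result means the records are in the expected order; each reported 
tuple marks a place where a restore driven by FileIndex ranges could skip or 
misplace records.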

There is a bug report open on restore problems related to VirtualFull jobs, so 
possibly this is related.  I will look into that.

Could you give me the exact commands that caused this problem in the 
beginning?  That is, you refer to "restore bug -- option 12", and I would like 
to see what you wanted to do with option 12.  The more info I have, the easier 
it will be to find and fix the problem.  Many thanks.

I will report back when I have more info ...

Best regards,

Kern


On Wednesday 18 November 2009 10:09:19 Graham Keeling wrote:
> Somebody on the bacula-users list is having a similar problem whilst trying
> to run a verify job on a VirtualFull.
> Here is his message:
>
> On Fri, Nov 13, 2009 at 09:02:11AM +0100, Fahrer, Julian wrote:
> > I am currently trying to implement verify jobs at a customer's site.
> > At that site I am running fulls and incrementals to disk and virtual
> > fulls to tape. I want to verify that the data on tape is ok. So I tried a
> > verify job after the virtual full has finished. But instead of using the
> > virtual full (which is the last backup for that job) the last backup to
> > disk is chosen.
> >
> >
> > Here is an Example:
> > 13-Nov 08:23 backup01_dir JobId 3039: Verifying against JobId=3017 Job=server2_KHK.2009-11-12_21.00.00_06
> > 13-Nov 08:23 backup01_dir JobId 3039: Bootstrap records written to /var/bacula/working/backup01_dir.restore.1.bsr
> > 13-Nov 08:23 backup01_dir JobId 3039: Start Verify JobId=3039 Level=VolumeToCatalog Job=server2_KHK_verify.2009-11-13_08.23.43_03
> > 13-Nov 08:23 backup01_dir JobId 3039: Using Device "LTO2"
> > 13-Nov 08:23 backup01_sd JobId 3039: acquire.c:116 Changing read device. Want Media Type="File" have="LTO2" device="LTO2" (/dev/nst0)
> > 13-Nov 08:23 backup01_sd JobId 3039: Media Type change.  New read device "FileStorage_data2" (/data2/b2d_2) chosen.
> > 13-Nov 08:23 backup01_sd JobId 3039: Ready to read from volume "KHK_0030" on device "FileStorage_data2" (/data2/b2d_2).
> >
> > Also there is another Job:
> > | 3,017 | server2_KHK | 2009-11-12 21:00:01 | B | F | 19,243 | 14,399,178,034 | T |
> > | 3,033 | server2_KHK | 2009-11-12 21:00:01 | B | F | 19,243 | 14,402,307,154 | T |
> >
> > Jobid 3033 is the Virtual Full. Can the same value in the date column
> > cause this problem?
>
> And, since it has been a couple of months since this thread died, here is a
> reminder:
>
> On Mon, Sep 14, 2009 at 05:19:44PM +0100, Graham Keeling wrote:
> > I think I have found a bug that is very easy to reproduce.
> >
> > I do a full backup (JobId 1).
> > I do a virtualfull backup (JobId 2).
> >
> > I then go to bconsole, and type
> > 'restore'
> > I select option '12: Select full restore to a specified JobId'.
> >
> > I type '2', to get JobId '2', but bacula selects JobId '1' instead:
> > > Select item:  (1-13): 12
> > > Enter JobId to restore: 2
> > > You have selected the following JobId: 1
> >
> > When I do it again from the start, but type in JobId '1', it gets it
> > right.
>
> On Tue, Sep 15, 2009 at 02:59:22PM +0100, Graham Keeling wrote:
> > On Tue, Sep 15, 2009 at 10:03:57AM +0200, Eric Bollengier wrote:
> > > > On Tuesday 15 September 2009 09:57:48, Graham Keeling wrote:
> > > > On Tue, Sep 15, 2009 at 09:49:38AM +0200, Eric Bollengier wrote:
> > > > > > On Tuesday 15 September 2009 09:41:07, Graham Keeling wrote:
> > > > > > On Mon, Sep 14, 2009 at 06:44:25PM +0200, Eric Bollengier wrote:
> > > > > > > Perhaps we should avoid making a VirtualFull when we have only
> > > > > > > one Full backup in the job list, or mark the new job as a Copy.
> > > > > >
> > > > > > The same thing happens whenever you make the VirtualFull, after
> > > > > > Fulls, Differentials or Incrementals.
> > > > > > I was keeping my example as simple as possible.
> > > > >
> > > > > A bit strange, it works very well here :
> > > > >
> > > > > REGRESS_DEBUG=1 ./tests/virtual-backup-test
> > > > > $ ./bin/bacula start
> > > > > $ ./bin/bconsole
> > > > > * list jobs
> > > > > +---+---------+---------------------+------+-------+----------+----
> > > > >----+
> > > > >
> > > > > |id | name    | starttime           | type | level | jobfiles |
> > > > > | status |
> > > > >
> > > > > +---+---------+---------------------+------+-------+----------+----
> > > > >----+
> > > > >
> > > > > | 1 | Vbackup | 2009-09-15 09:44:59 | B    | F     |    1,570 | T  
> > > > > |    | 2 | Vbackup | 2009-09-15 09:45:03 | B    | I     |       43
> > > > > | | T      | 3 | Vbackup | 2009-09-15 09:45:07 | B    | D     |    
> > > > > |   86 | T      | 4 | Vbackup | 2009-09-15 09:45:11 | B    | I    
> > > > > | |       44 | T      | 6 | Vbackup | 2009-09-15 09:45:11 | B    |
> > > > > | F     |    1,570 | T      |
> > > > >
> > > > > * restore
> > > > > 12
> > > > > 6
> > > > >
> > > > > => Building directory tree for JobId(s) 6 ...
> > > >
> > > > What if you try selecting id 4 instead?
> > >
> > > Instead of selecting 1+3+4 it selects 6 (which is the sum of 1+3+4).
> > >
> > > Not a big deal for me: item 12 uses a JobId to determine a specific
> > > date (instead of entering it by hand), and this date is used to select
> > > the best backup. If you choose option 6 with 2009-09-15 09:45:11, it
> > > will do the same (select jobid 6).
> > >
> > > Now, we can change the item 12 label to reflect that, no problem; any
> > > ideas are welcome.
> >
> > Perhaps I don't understand the reason for needing to do it all by dates.
> >
> > Personally, I think that the database should have a single field in the
> > Job table that, when you run a Differential or Incremental, records the
> > immediately prior JobId that it depends upon.
> > As far as I can see it, that would solve many problems - it won't all go
> > wrong if the director's clock changes, option 12 would select the JobId
> > that you told it to select, any missing jobs in a sequence will not be
> > unknowingly missed out, and so on.
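
The two selection strategies discussed in the quoted thread can be compared 
with a small Python sketch.  It is a simplified model, not Bacula's catalog 
query: the job data is taken from the "list jobs" output quoted above, the 
date-based rule (latest Full at or before the chosen job's start time, then the 
latest Differential, then later Incrementals, with ties broken by highest 
JobId) is an assumption about how option 12 behaves, and prior_job_id is the 
hypothetical field from the proposal, which does not exist in the Job table.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    job_id: int
    start: str                          # "YYYY-MM-DD HH:MM:SS" sorts as text
    level: str                          # "F", "D" or "I"
    prior_job_id: Optional[int] = None  # hypothetical field from the proposal


DAY = "2009-09-15 "
JOBS = [
    Job(1, DAY + "09:44:59", "F"),
    Job(2, DAY + "09:45:03", "I", prior_job_id=1),
    Job(3, DAY + "09:45:07", "D", prior_job_id=1),
    Job(4, DAY + "09:45:11", "I", prior_job_id=3),
    Job(6, DAY + "09:45:11", "F"),      # VirtualFull, same start time as job 4
]


def select_by_date(jobs: List[Job], job_id: int) -> List[int]:
    """Date-based selection: the JobId is only used to obtain a cutoff time."""
    by_id = {j.job_id: j for j in jobs}
    cutoff = by_id[job_id].start
    upto = [j for j in jobs if j.start <= cutoff]
    # Assumed tie-break: among equal start times the highest JobId wins.
    full = max((j for j in upto if j.level == "F"),
               key=lambda j: (j.start, j.job_id))
    diffs = [j for j in upto if j.level == "D" and j.start > full.start]
    base = max(diffs, key=lambda j: j.start) if diffs else full
    incs = sorted((j for j in upto if j.level == "I" and j.start > base.start),
                  key=lambda j: j.start)
    chain = [full] + ([base] if base is not full else []) + incs
    return [j.job_id for j in chain]


def select_by_chain(jobs: List[Job], job_id: int) -> List[int]:
    """Proposed selection: follow prior_job_id links back to the Full."""
    by_id = {j.job_id: j for j in jobs}
    chain = []
    current: Optional[int] = job_id
    while current is not None:
        chain.append(current)
        current = by_id[current].prior_job_id
    return chain[::-1]


print(select_by_date(JOBS, 4))   # date-based: picks the VirtualFull
print(select_by_chain(JOBS, 4))  # chain-based: picks the jobs asked for
```

In this model, asking for JobId 4 by date yields only JobId 6, because the 
VirtualFull shares job 4's start time, while following the explicit links 
yields 1, 3 and 4.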
>



