Re: [Bacula-users] Questions regarding migration job failure

2011-05-13 03:19:21
Subject: Re: [Bacula-users] Questions regarding migration job failure
From: Graham Keeling <graham AT equiinet DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Fri, 13 May 2011 08:15:38 +0100
On Thu, May 12, 2011 at 09:58:14AM -0700, Jerry Lowry wrote:
> Thanks for the help. Looks like I have some digging to do to figure out
> what is actually happening. I know that at one time I had some problems
> with the RAID controller. I have since gotten that resolved.
>
> If the volume has been recycled, will the corruption remain with the
> volume, or will it go by the wayside once the volume recycles? Just
> curious as to whether I should drop the corrupt volumes (files) and
> create new ones.

The corruption will definitely remain with the volume if you don't recycle it.

Bacula truncates the volumes when it recycles them, which means that the area
of the disk on which the problem occurred is free to be used by anything.

So if the problem is caused by bad areas of the disk, truncating the volumes
releases those areas for reuse, and the problem could hit you again at any
time. Not truncating them could avoid this, since the bad space stays inside
a volume that you are not going to use again.

But if the problem is Bacula itself corrupting the volume, it could happen
again at any time anyway, so truncating them isn't going to make any
difference.
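One way to help tell the two cases apart is to check whether the volume file can still be read end to end at the OS level. The following is a minimal Python sketch, not a Bacula tool: it reads a file in 64 KiB chunks (close to the 64512-byte block size in the bls error) and records the offsets where a read raises an I/O error. The function name and script name are hypothetical. Read errors point at the disk or controller; a clean pass does not clear the hardware, since a faulty RAID controller can return wrong data without reporting any error, and recently written data may be served from the page cache rather than from the disk.

```python
import os
import sys

CHUNK = 64 * 1024  # read size; close to the 64512-byte blocks in the bls error


def unreadable_regions(path: str, chunk: int = CHUNK) -> list[int]:
    """Return the byte offsets of chunks whose read raised OSError (e.g. EIO)."""
    bad = []
    size = os.path.getsize(path)
    with open(path, "rb", buffering=0) as f:
        for offset in range(0, size, chunk):
            try:
                f.seek(offset)
                f.read(chunk)
            except OSError:
                bad.append(offset)  # keep going so every bad region is reported
    return bad


if __name__ == "__main__":
    # Usage (hypothetical script name): python scan_volume.py /Home/home-0006 ...
    for vol in sys.argv[1:]:
        bad = unreadable_regions(vol)
        status = "readable" if not bad else f"read errors at offsets {bad}"
        print(f"{vol}: {status}")
```

If bls reports a checksum mismatch on a volume that this scan reads cleanly, the bad data was most likely written that way, which shifts suspicion toward the controller or the software rather than an unreadable sector.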

> On 5/12/2011 12:31 AM, Graham Keeling wrote:
>> On Wed, May 11, 2011 at 02:06:44PM -0700, Jerry Lowry wrote:
>>> Another mistake on my part. You have to give bls the correct spelling
>>> of the volume (sometimes I wonder).
>>>
>>> Once I corrected the volume name, these are the results I get:
>>>
>>> Volume Record: File:blk=0:206 Sessid=16 SessTime=1303843290 Jobid=3 DataLen=171
>>> 11-May 13:42 bls JobId 0: Error: block.c:318 Volume data error at 0:206!
>>> Block checksum mismatch in block=6010112 len=64512: calc=c6a6912d blk=50a7d773
>> Well, that's the problem right there.
>> Your migration fails because the volumes it is reading are corrupted.
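For readers unfamiliar with this error: each block on the volume carries a checksum written when the block was stored, and on reading the block the checksum is computed again; in the message, calc= appears to be the recomputed value and blk= the one stored in the block. The Python sketch that follows illustrates that general technique with CRC-32 from zlib. The block layout (a 4-byte big-endian checksum in front of the payload) and the function names are hypothetical simplifications, not Bacula's actual on-disk format.

```python
import struct
import zlib

# Hypothetical layout: 4-byte big-endian CRC-32 followed by the payload it covers.
HEADER = struct.Struct(">I")


def make_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32, as a writer would."""
    return HEADER.pack(zlib.crc32(payload)) + payload


def check_block(block: bytes) -> tuple[bool, int, int]:
    """Recompute the CRC over the payload and compare it with the stored one.

    Returns (ok, calc, stored), mirroring the calc=/blk= pair in the bls error.
    """
    (stored,) = HEADER.unpack_from(block, 0)
    calc = zlib.crc32(block[HEADER.size:])
    return calc == stored, calc, stored


if __name__ == "__main__":
    block = bytearray(make_block(b"file data " * 100))
    print(check_block(bytes(block))[0])  # intact block passes
    block[50] ^= 0x01                    # flip one payload bit, as bad media might
    ok, calc, stored = check_block(bytes(block))
    print(ok, f"calc={calc:08x} blk={stored:08x}")
```

A CRC-32 is guaranteed to catch any single flipped bit, so a mismatch like the one above means the bytes read back are not the bytes that were checksummed when the block was written; it says nothing about where the damage happened.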
>>
>> As to how your volumes got corrupted, that's a much harder question.
>>
>> If it were me, I would start everything from scratch and, after every backup,
>> run your 'bls' command on any volume that changed. This will let you catch
>> the problem just after it happened, and you might be able to spot anything
>> strange that happened before that.
>>
>> (assuming that it is a bacula bug, rather than you having a disk or a file
>> system problem)
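The "run bls after every backup" suggestion can be automated. Below is a minimal Python sketch, assuming the file volumes live in /Home and the SD configuration is /etc/bacula/bacula-sd.conf (both taken from Jerry's run), that each volume file is named after its volume, and that it runs on the SD host when no job is writing to those volumes. The state file path, the helper names, and the rule for deciding a volume is bad (nonzero exit status or any "Error:" line) are hypothetical; only the bls options Jerry already used (-j, -V, -c) are passed.

```python
import json
import shutil
import subprocess
from pathlib import Path

# Assumed values, taken from Jerry's bls run; adjust for your SD.
VOLUME_DIR = Path("/Home")           # archive device of the "Home" Device resource
SD_CONF = "/etc/bacula/bacula-sd.conf"
BLS = "bls"                          # full path if bls is not on $PATH
# Hypothetical bookkeeping file recording the mtime of each volume last checked.
STATE_FILE = Path("/var/tmp/bls-check-state.json")


def changed_volumes(vol_dir: Path, seen: dict[str, float]) -> list[Path]:
    """Return files in vol_dir whose mtime differs from the recorded one."""
    return sorted(
        p for p in vol_dir.iterdir()
        if p.is_file() and seen.get(p.name) != p.stat().st_mtime
    )


def bls_command(volume: Path) -> list[str]:
    """Jerry's bls invocation without the -d/-v debugging flags."""
    return [BLS, "-j", "-V", volume.name, "-c", SD_CONF, str(volume.parent)]


def looks_corrupt(output: str, returncode: int) -> bool:
    # Assumption: a nonzero exit status or any "Error:" message (such as the
    # "Volume data error ... Block checksum mismatch" report) means trouble.
    return returncode != 0 or "Error:" in output


def main() -> None:
    if shutil.which(BLS) is None or not VOLUME_DIR.is_dir():
        print("bls or the volume directory was not found; nothing to check")
        return
    seen = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    for vol in changed_volumes(VOLUME_DIR, seen):
        result = subprocess.run(bls_command(vol), capture_output=True, text=True)
        output = result.stdout + result.stderr
        if looks_corrupt(output, result.returncode):
            print(f"{vol.name}: bls reported a problem\n{output}")
        else:
            # Only clean volumes are recorded, so bad ones are re-checked next run.
            seen[vol.name] = vol.stat().st_mtime
    STATE_FILE.write_text(json.dumps(seen))


if __name__ == "__main__":
    main()
```

Run from cron on the SD host after the backup window, this would flag a newly corrupted volume the morning after it happened, which is the point of Graham's suggestion: the logs from that night are still fresh.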
>>
>>> I ran this again with debug at level 200. I have attached the file with
>>> the output.
>>>
>>> Thanks for all your help!
>>>
>>> On 5/11/2011 12:11 PM, Jerry Lowry wrote:
>>>> Hi,
>>>>
>>>> No, the migration job is running on a single storage daemon. This
>>>> storage daemon has 6 RAID devices set up as JBOD: 3 are for daily use
>>>> and 3 are set up as hot-swap devices for off-site backups. The problem
>>>> is that when I run bls on the storage daemon where the disks are located,
>>>> I get a message asking me to mount the disk, which is already mounted
>>>> according to the director as well as by the OS.
>>>>
>>>>
>>>>
>>>> On 5/11/2011 11:26 AM, Phil Stracchino wrote:
>>>>> On 05/11/11 13:48, Jerry Lowry wrote:
>>>>>> OK, I am trying to run bls on the volume file that is associated
>>>>>> with this job, but bls fails when it tries to stat the device.
>>>>>>
>>>>>> I have one director and two storage daemons. The volume I am trying
>>>>>> to run against is on the second SD. Do I run bls on the system where
>>>>>> the 'director' is, or on the system that's running the standalone 'sd'
>>>>>> where the volume is located?
>>>>> Jerry,
>>>>> If I'm understanding you correctly, you have two storage daemons, and
>>>>> you're trying to do a migration from a device on one SD to a device on
>>>>> the other.  Is this correct?
>>>>>
>>>>> If this understanding is correct, sorry, it won't work.  Copy and
>>>>> migration can currently only be done between devices controlled by the
>>>>> same SD.  (This is in large part a result of there being no current
>>>>> capability for direct communication between one storage daemon and 
>>>>> another.)
>>>>>
>>>>>
>>>> -- 
>>>>
>>>> ---------------------------------------------------------------------------
>>>> Jerold Lowry
>>>> IT Manager / Software Engineer
>>>> Engineering Design Team (EDT), Inc. a HEICO company
>>>> 1400 NW Compton Drive, Suite 315
>>>> Beaverton, Oregon 97006 (U.S.A.)
>>>> Phone: 503-690-1234 / 800-435-4320
>>>> Fax: 503-690-1243
>>>> Web: www.edt.com
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Achieve unprecedented app performance and reliability
>>>> What every C/C++ and Fortran developer should know.
>>>> Learn how Intel has extended the reach of its next-generation tools
>>>> to help boost performance applications - inlcuding clusters.
>>>> http://p.sf.net/sfu/intel-dev2devmay
>>>>
>>>>
>>>> _______________________________________________
>>>> Bacula-users mailing list
>>>> Bacula-users AT lists.sourceforge DOT net
>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>>>
>>> [jlowry@distress-sd bin]$ ./bls -d 200 -j -v -v -V home-0006 -c /etc/bacula/bacula-sd.conf /Home
>>> bls: stored_conf.c:698-0 Inserting director res: distress-mon
>>> bls: stored_conf.c:698-0 Inserting device res: DBB
>>> bls: stored_conf.c:698-0 Inserting device res: Hardware
>>> bls: stored_conf.c:698-0 Inserting device res: Swift
>>> bls: stored_conf.c:698-0 Inserting device res: Home
>>> bls: stored_conf.c:698-0 Inserting device res: Workstations
>>> bls: stored_conf.c:698-0 Inserting device res: TopSwap
>>> bls: stored_conf.c:698-0 Inserting device res: MidSwap
>>> bls: stored_conf.c:698-0 Inserting device res: BottomSwap
>>> bls: stored_conf.c:698-0 Inserting device res: FileStorage
>>> bls: stored_conf.c:698-0 Inserting device res: FileStorage1
>>> bls: stored_conf.c:698-0 Inserting device res: Drive-1
>>> bls: match.c:250-0 add_fname_to_include prefix=0 gzip=0 fname=/
>>> bls: butil.c:281 Using device: "/Home" for reading.
>>> bls: dev.c:284-0 init_dev: tape=0 dev_name=/Home
>>> bls: vol_mgr.c:162-0 add read_vol=home-0006 JobId=0
>>> bls: butil.c:186-0 Acquire device for read
>>> bls: acquire.c:95-0 Want Vol=home-0006 Slot=0
>>> bls: acquire.c:109-0 MediaType dcr= dev=File
>>> bls: acquire.c:189-0 dir_get_volume_info vol=home-0006
>>> bls: bls.c:486-0 Fake dir_get_volume_info
>>> bls: mount.c:546-0 Must load "Home" (/Home)
>>> bls: autochanger.c:120-0 Device "Home" (/Home) is not an autochanger
>>> bls: acquire.c:220-0 bstored: open vol=home-0006
>>> bls: dev.c:360-0 open dev: type=1 dev_name="Home" (/Home) vol=home-0006 mode=OPEN_READ_ONLY
>>> bls: dev.c:369-0 call open_file_device mode=OPEN_READ_ONLY
>>> bls: dev.c:2089-0 Enter mount
>>> bls: dev.c:542-0 open disk: mode=OPEN_READ_ONLY open(/Home/home-0006, 0x0, 0640)
>>> bls: dev.c:557-0 open dev: disk fd=3 opened, part=0/0, part_size=0
>>> bls: dev.c:373-0 preserve=0x0 fd=3
>>> bls: acquire.c:228-0 opened dev "Home" (/Home) OK
>>> bls: acquire.c:231-0 calling read-vol-label
>>> bls: label.c:81-0 Enter read_volume_label res=0 device="Home" (/Home) vol=home-0006 dev_Vol=*NULL*
>>> bls: label.c:130-0 Big if statement in read_volume_label
>>> bls: label.c:820-0 unser_vol_label
>>>
>>> Volume Label:
>>> Id                : Bacula 1.0 immortal
>>> VerNo             : 11
>>> VolName           : home-0006
>>> PrevVolName       :
>>> VolFile           : 0
>>> LabelType         : VOL_LABEL
>>> LabelSize         : 171
>>> PoolName          : HomePool
>>> MediaType         : File
>>> PoolType          : Backup
>>> HostName          : distress-sd
>>> Date label written: 01-May-2011 14:50
>>> bls: label.c:202-0 Compare Vol names: VolName=home-0006 hdr=home-0006
>>>
>>> Volume Label:
>>> Id                : Bacula 1.0 immortal
>>> VerNo             : 11
>>> VolName           : home-0006
>>> PrevVolName       :
>>> VolFile           : 0
>>> LabelType         : VOL_LABEL
>>> LabelSize         : 171
>>> PoolName          : HomePool
>>> MediaType         : File
>>> PoolType          : Backup
>>> HostName          : distress-sd
>>> Date label written: 01-May-2011 14:50
>>> bls: label.c:223-0 Leave read_volume_label() VOL_OK
>>> bls: label.c:236-0 Call reserve_volume=home-0006
>>> bls: vol_mgr.c:352-0 enter reserve_volume=home-0006 drive="Home" (/Home)
>>> bls: vol_mgr.c:268-0 new Vol=home-0006 at ae0bc8 dev="Home" (/Home)
>>> bls: vol_mgr.c:470-0 === set in_use. vol=home-0006 dev="Home" (/Home)
>>> bls: vol_mgr.c:211-0 List end new volume: home-0006 in_use=1 on device "Home" (/Home)
>>> bls: acquire.c:235-0 Got correct volume.
>>> 11-May 13:54 bls JobId 0: Ready to read from volume "home-0006" on device "Home" (/Home).
>>> bls: label.c:820-0 unser_vol_label
>>>
>>> Volume Label:
>>> Id                : Bacula 1.0 immortal
>>> VerNo             : 11
>>> VolName           : home-0006
>>> PrevVolName       :
>>> VolFile           : 0
>>> LabelType         : VOL_LABEL
>>> LabelSize         : 171
>>> PoolName          : HomePool
>>> MediaType         : File
>>> PoolType          : Backup
>>> HostName          : distress-sd
>>> Date label written: 01-May-2011 14:50
>>>
>>> Volume Label:
>>> Id                : Bacula 1.0 immortal
>>> VerNo             : 11
>>> VolName           : home-0006
>>> PrevVolName       :
>>> VolFile           : 0
>>> LabelType         : VOL_LABEL
>>> LabelSize         : 171
>>> PoolName          : HomePool
>>> MediaType         : File
>>> PoolType          : Backup
>>> HostName          : distress-sd
>>> Date label written: 01-May-2011 14:50
>>> 11-May 13:54 bls JobId 0: Error: block.c:318 Volume data error at 0:206!
>>> Block checksum mismatch in block=6010112 len=64512: calc=c6a6912d blk=50a7d773
>>> bls: butil.c:298-0 Device status: 84
>>> bls: acquire.c:457-0 release_device device "Home" (/Home) is disk
>>> bls: acquire.c:466-0 dir_update_vol_info. label=64 Vol=home-0006
>>> bls: vol_mgr.c:179-0 remove_read_vol=home-0006 JobId=0 found=1
>>> bls: vol_mgr.c:211-0 List remove_read_volume: home-0006 in_use=1 on device "Home" (/Home)
>>> bls: vol_mgr.c:594-0 === set not reserved vol=home-0006 num_writers=0 dev_reserved=0 dev="Home" (/Home)
>>> bls: vol_mgr.c:595-0 === clear in_use vol=home-0006
>>> bls: vol_mgr.c:623-0 === clear in_use vol=home-0006
>>> bls: vol_mgr.c:626-0 === remove volume home-0006 dev="Home" (/Home)
>>> bls: acquire.c:514-0 0 writers, 0 reserve, dev="Home" (/Home)
>>> bls: dev.c:1924-0 close_dev "Home" (/Home)
>>> bls: dev.c:2123-0 Enter unmount
>>> bls: dev.c:1913-0 Clear volhdr vol=home-0006
>>> bls: vol_mgr.c:616-0 No vol on dev "Home" (/Home)
>>> bls: acquire.c:551-0 JobId=0 broadcast wait_device_release at 11-May-2011 13:54:55
>>> bls: acquire.c:561-0 ===== Device "Home" (/Home) released by JobId=0
>>> bls: mem_pool.c:370-0 garbage collect memory pool
>>> bls: dev.c:1924-0 close_dev "Home" (/Home)
>>> bls: dev.c:1931-0 device "Home" (/Home) already closed vol=