Re: [Bacula-users] verify job with differences doesn't finish and blocks storage

2009-02-18 03:55:53
From: "Ralf Gross" <Ralf-Lists AT ralfgross DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 18 Feb 2009 09:36:23 +0100 (CET)
Ralf Gross said:
> Lately I've seen that verify jobs that have differences just don't
> finish.
>
> bacula 2.4.4-b1, psql
>
> *st dir
>
> [...]
>
> Running Jobs:
>  JobId Level   Name                       Status
> ======================================================================
>   9602 VolumeT  VerifyVU0EF005-Absicherung-MPC-Volume2.2009-02-15_11.05.43.05 is running
>   9644 VolumeT  VerifyVU0EM003.2009-02-17_07.06.00.27 has verify differences
>   9652 Full    VU0EM003-FBR.2009-02-17_13.15.54.51 is running
> ====
>
> [...]
>
> *st client=VU0EM003
>
> VU0EM003 Version: 2.2.8 (26 January 2008)  x86_64-pc-linux-gnu debian 4.0
> Daemon started 03-Feb-09 11:39, 33 Jobs run since started.
>  Heap: heap=1,679,360 smbytes=311,531 max_bytes=464,196 bufs=193 max_bufs=362
>  Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
>
> Running Jobs:
> JobId 9644 Job VerifyVU0EM003.2009-02-17_07.06.00.27 is running.
>     Verify Job started: 17-Feb-09 07:06
>     Files=105,275 Bytes=0 Bytes/sec=0 Errors=0
>     Files Examined=105,275
>     Processing file: /......long path.....
>     SDReadSeqNo=2844194 fd=7
>
> [...]
>
> The job status doesn't change (Files Examined).
>
> * st stor
>
> [...]
>
> Running Jobs:
> Reading: Verify Volume to Catalog Restore job VerifyVU0EM003.2009-02-17_07 JobId=9644 Volume="vu0em003-inc-0470"
>     pool="VU0EM003-Disk-Incremental" device="VU0EM003-DISK" (/data/bacula-storage/vu0em003)
>
> [...]
>
> Used Volume status:
> 06D142L3 on device "LTO3" (/dev/ULTRIUM-TD3)
>     Reader=0 writers=0 devres=0 volinuse=0
> vu0em003-inc-0470 on device "VU0EM003-DISK" (/data/bacula-storage/vu0em003)
>     Reader=1 writers=0 devres=0 volinuse=1
> ====
>
> [...]
>
>
> The last thing I see in the log file is
>
> 17-Feb 07:23 VUMEM004-dir JobId 9644: New file: .....long path....
>
>
> So, no activity for 7 hours.
>
> This is starting to be annoying because the volumes are then locked until
> I cancel the job.


24 hours later I tried to cancel the verify job, which was still running and
still blocking the storage.

Running Jobs:
 JobId Level   Name                       Status
======================================================================
  9644 VolumeT  VerifyVU0EM003.2009-02-17_07.06.00.27 has been canceled
  9652 Full    VU0EM003-FBR.2009-02-17_13.15.54.51 is running
  9661 Increme  VU0EM003.2009-02-18_00.06.00.32 is waiting on max Client jobs
  9671 VolumeT  VerifyVU0EM003.2009-02-18_07.06.00.58 is waiting on max Client jobs
====


*cancel jobid=9644
2001 Job VerifyVU0EM003.2009-02-17_07.06.00.27 marked to be canceled.



And then the console hangs and just doesn't return to the prompt.

It seems that the only way to get things going again is to restart the dir
or maybe the fd where the verify job is running. But there is another job
with several TB to back up running right now.

Any idea how to resolve this deadlock?

Ralf


