Re: [Bacula-users] Long running restore canceled by tape error? Any way to continue?

2011-06-13 16:33:39
From: Bob Hetzel <beh AT case DOT edu>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 13 Jun 2011 16:30:09 -0400
The problem is that your setup, as you've configured it, is pushing your 
hardware well beyond its limits.

In most cases I would say that you should configure your file sets such 
that backups and restores take less than one day each.  If that means you 
have to break up a 50TB filesystem into 25 or even 50 jobs, then so be it. 
With as much work as you had it doing, it's very possible the tape drive 
needed cleaning in the middle of the job.
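
As a rough sanity check on the "one day per job" target, the figures in the 
JobId 731 report quoted below can be turned into a minimum job count.  This 
is only a sketch: it assumes "~50TB" means 50 * 10**12 bytes, that the drive 
sustains the same average rate that job did, and that the data splits evenly 
across jobs.  The function names are made up for illustration.

```python
import math
from datetime import datetime

# Figures taken from the JobId 731 report quoted below.
START = datetime(2011, 6, 5, 21, 26, 10)
END = datetime(2011, 6, 11, 8, 2, 25)
BYTES_RESTORED = 32_321_658_846_063

# Assumption: "~50TB" means 50 * 10**12 bytes.
TOTAL_BYTES = 50 * 10**12
SECONDS_PER_DAY = 86_400


def average_rate(bytes_moved, start, end):
    """Average throughput in bytes per second over the job's wall time."""
    return bytes_moved / (end - start).total_seconds()


def jobs_needed(total_bytes, rate, max_seconds=SECONDS_PER_DAY):
    """Smallest number of equal-sized jobs that each finish within max_seconds."""
    return math.ceil(total_bytes / (rate * max_seconds))


rate = average_rate(BYTES_RESTORED, START, END)
rate_kb_s = rate / 1000  # the report's KB/s figure matches bytes / 1000
jobs = jobs_needed(TOTAL_BYTES, rate)

print(f"average rate: {rate_kb_s:.1f} KB/s")
print(f"jobs needed for <= 1 day each: {jobs}")
```

At that rate a single drive moves a little under 6TB a day, so nine evenly 
sized jobs is the bare minimum.  Going to 25 or 50 leaves headroom for 
uneven file sets and slower tapes.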

https://www.maxell.co.jp/e/products/industrial/care_comtape/proper/proper3.html

For most environments I suspect cleaning every 50 hrs is overkill, but Bacula 
does not handle tape cleaning properly yet.  With a job that long you're just 
asking for trouble.  I'm not saying with any certainty that the cleaning 
issue is what caused your problem, but if you break up that data area into 
multiple file sets you may find it to be far more manageable.  I presume 
you've got an auto-changer with more than one drive too, so if you've got 
the IO available on your storage array you may even be able to back it up 
two parts at a time... (and likewise for restore).

Just curious... is an 8 day restore acceptable to management?  You've 
probably already spent a ton of money on hardware... time to start 
optimizing it.

> From: "Steve Costaras" <stevecs AT chaven DOT com>
>
> I'm running bacula 5.0.3 under ubuntu 10.04 & lto4 tapes. Have a restore 
> going that is ~8 days long (~50TB). On the second to last day, got an I/O 
> error on one of the tapes/one of the files; instead of continuing the 
> restore/skipping that file, it canceled the entire restore process.
>
> two issues:
>
> 1) is there a way to continue the restore from where it left off?
>
> 2) what can I do to make sure that restores in the future do not cancel the 
> job when an i/o error happens but instead just log the file(s) that are in 
> error?
>
>
>
> ------------
> 2011-06-11 07loki-sd JobId 731: Ready to read from volume "AA0011" on device 
> "LTO4" (/dev/nst0).
> 2011-06-11 07loki-sd JobId 731: Forward spacing Volume "AA0011" to file:block 
> 0:1.
> 2011-06-11 08loki-sd JobId 731: Error: block.c:1002 Read error on fd=6 at 
> file:blk 3:1185 on device "LTO4" (/dev/nst0). ERR=Input/output error.
> 2011-06-11 08loki-fd JobId 731: Error: attribs.c:423 File size of restored 
> file /var/ftp/pub/Multimedia/DVD/Television/Stargate/Stargate SG1/Stargate 
> SG1 (2000)-S04D5.iso not correct. Original 7763525632, restored 28570208.
> 2011-06-11 08loki-sd JobId 731: Alert: cannot open SCSI device '*None*' - No 
> such file or directory
> 2011-06-11 08loki-sd JobId 731: Fatal error: fd_cmds.c:167 Command error with 
> FD, hanging up. Wrong Volume mounted on device "LTO4" (/dev/nst0): Wanted 
> AA0011 have AA0010
>
> 2011-06-11 08loki-dir JobId 731: Error: Bacula loki-dir 5.0.3 (04Aug10): 
> 11-Jun-2011 08:02:25
>  Build OS: x86_64-unknown-linux-gnu ubuntu 10.04
>  JobId: 731
>  Job: RestoreFiles.2011-06-05_21.26.08_04
>  Restore Client: loki-fd
>  Start time: 05-Jun-2011 21:26:10
>  End time: 11-Jun-2011 08:02:25
>  Files Expected: 3,146,656
>  Files Restored: 48,369
>  Bytes Restored: 32,321,658,846,063
>  Rate: 68743.9 KB/s
>  FD Errors: 1
>  FD termination status: Error
>  SD termination status: Error
>  Termination: *** Restore Error ***
>
> 2011-06-11 08loki-dir JobId 731: Begin pruning Jobs older than 1 year 25 days 
> .
> 2011-06-11 08loki-dir JobId 731: No Jobs found to prune.
> 2011-06-11 08loki-dir JobId 731: Begin pruning Jobs.
> 2011-06-11 08loki-dir JobId 731: No Files found to prune.
> 2011-06-11 08loki-dir JobId 731: End auto prune.
>
> 2011-06-11 08loki-dir JobId 732: shell command: run BeforeJob 
> "/opt/bacula/etc/make_catalog_backup bacula bacula"
> 2011-06-11 08loki-dir JobId 732: Start Backup JobId 732, 
> Job=loki-BackupCatalog.2011-06-05_23.10.00_05
> 2011-06-11 08loki-dir JobId 732: Using Device "LTO4"
> 2011-06-11 08loki-sd JobId 732: Error: block.c:1002 Read error on fd=6 at 
> file:blk 0:0 on device "LTO4" (/dev/nst0). ERR=Input/output error.
> 2011-06-11 08loki-sd JobId 732: Please mount Volume "DD0004" or label a new 
> one for:
>  Job: loki-BackupCatalog.2011-06-05_23.10.00_05
>  Storage: "LTO4" (/dev/nst0)
>  Pool: BackupSetDD
>  Media type: LTO4
> 2011-06-11 08loki-sd JobId 732: Error: block.c:1002 Read error on fd=6 at 
> file:blk 0:0 on device "LTO4" (/dev/nst0). ERR=Input/output error.
> -------

