The problem is that, as configured, your setup is pushing your hardware well
beyond its limits.
In most cases I would say that you should configure your file sets such
that backups and restores take less than one day each. If that means you
have to break up a 50TB filesystem into 25 or even 50 jobs, then so be it.
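As a rough sanity check on how many pieces that works out to, here is a small Python sketch using the Rate line from the job report quoted below (68743.9 KB/s, which matches Bytes Restored divided by elapsed time if KB means 1000 bytes). Taking "~50TB" as 50 * 10**12 bytes is my assumption, and the function names are made up for illustration.

```python
import math

# Taken from the quoted job report: "Rate: 68743.9 KB/s" (KB = 1000 bytes).
OBSERVED_RATE_BYTES_PER_S = 68_743.9 * 1000
# Assumption: "~50TB" read as decimal terabytes.
TOTAL_BYTES = 50 * 10**12
SECONDS_PER_DAY = 86_400


def min_jobs(total_bytes, rate, max_seconds=SECONDS_PER_DAY):
    """Smallest number of equal-sized jobs that each finish within max_seconds."""
    bytes_per_window = rate * max_seconds
    return math.ceil(total_bytes / bytes_per_window)


def hours_per_job(total_bytes, rate, jobs):
    """Run time of one job, in hours, if the data is split evenly across jobs."""
    return total_bytes / jobs / rate / 3600


if __name__ == "__main__":
    print("minimum jobs:", min_jobs(TOTAL_BYTES, OBSERVED_RATE_BYTES_PER_S))
    for n in (25, 50):
        hours = hours_per_job(TOTAL_BYTES, OBSERVED_RATE_BYTES_PER_S, n)
        print(f"{n} jobs: {hours:.1f} h each")
```

At the observed rate you need at least 9 jobs to stay under a day each, and 25 jobs would run roughly 8 hours apiece, so 25 to 50 leaves plenty of headroom for slower tapes or uneven splits.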
Given how much work that job put the drive through, it's quite possible the
tape drive needed cleaning partway through.
https://www.maxell.co.jp/e/products/industrial/care_comtape/proper/proper3.html
For most environments I suspect cleaning every 50 hours is overkill, but
Bacula does not handle tape cleaning properly yet, so with a job that long
you're just asking for trouble. I'm not saying with any certainty that
cleaning is what caused your problem, but if you break that data area up into
multiple file sets you may find it far more manageable. I presume you've got
an autochanger with more than one drive too, so if your storage array has the
IO to spare you may even be able to back it up two parts at a time (and
likewise for restore).
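To put numbers on that, here is a minimal Python sketch of how many 50-hour cleaning intervals pass inside a single job, and what running two jobs at once does to total elapsed time. It assumes the storage array can keep every drive streaming at full rate, which may not hold on your hardware; the function names are hypothetical.

```python
import math

# The "every 50 hrs" cleaning interval discussed above.
CLEANING_INTERVAL_H = 50.0


def cleanings_due(job_hours, interval_h=CLEANING_INTERVAL_H):
    """Number of full cleaning intervals that elapse during one job."""
    return math.floor(job_hours / interval_h)


def wall_clock_hours(job_hours_each, jobs, drives):
    """Total elapsed hours when `drives` equal-length jobs run at a time.

    Assumption: every drive sustains the same rate in parallel.
    """
    return math.ceil(jobs / drives) * job_hours_each


if __name__ == "__main__":
    print("8-day job:", cleanings_due(8 * 24), "cleaning intervals pass mid-job")
    print("8-hour job:", cleanings_due(8.0), "cleaning intervals pass mid-job")
    print("25 jobs of 8 h on 2 drives:", wall_clock_hours(8.0, 25, 2), "h total")
```

An 8-day job runs through three cleaning intervals without a break, while an 8-hour job never reaches one, so cleaning can happen between jobs instead of being skipped.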
Just curious... is an 8-day restore acceptable to management? You've
probably already spent a ton of money on hardware, so it's time to start
optimizing it.
> From: "Steve Costaras" <stevecs AT chaven DOT com>
>
> I'm running bacula 5.0.3 under ubuntu 10.04 & lto4 tapes. Have a restore
> going that is ~8 days long (~50TB); on the second to last day, got an I/O
> error on one of the tapes/one of the files. Instead of continuing the
> restore/skipping that file, it canceled the entire restore process.
>
> two issues:
>
> 1) is there a way to continue the restore from where it left off?
>
> 2) what can I do to make sure that restores in the future do not cancel the
> job when an i/o error happens but instead just log the file(s) that are in
> error?
>
>
>
> ------------
> 2011-06-11 07loki-sd JobId 731: Ready to read from volume "AA0011" on device
> "LTO4" (/dev/nst0).
> 2011-06-11 07loki-sd JobId 731: Forward spacing Volume "AA0011" to file:block
> 0:1.
> 2011-06-11 08loki-sd JobId 731: Error: block.c:1002 Read error on fd=6 at
> file:blk 3:1185 on device "LTO4" (/dev/nst0). ERR=Input/output error.
> 2011-06-11 08loki-fd JobId 731: Error: attribs.c:423 File size of restored
> file /var/ftp/pub/Multimedia/DVD/Television/Stargate/Stargate SG1/Stargate
> SG1 (2000)-S04D5.iso not correct. Original 7763525632, restored 28570208.
> 2011-06-11 08loki-sd JobId 731: Alert: cannot open SCSI device '*None*' - No
> such file or directory
> 2011-06-11 08loki-sd JobId 731: Fatal error: fd_cmds.c:167 Command error with
> FD, hanging up. Wrong Volume mounted on device "LTO4" (/dev/nst0): Wanted
> AA0011 have AA0010
>
> 2011-06-11 08loki-dir JobId 731: Error: Bacula loki-dir 5.0.3 (04Aug10):
> 11-Jun-2011 08:02:25
> Build OS: x86_64-unknown-linux-gnu ubuntu 10.04
> JobId: 731
> Job: RestoreFiles.2011-06-05_21.26.08_04
> Restore Client: loki-fd
> Start time: 05-Jun-2011 21:26:10
> End time: 11-Jun-2011 08:02:25
> Files Expected: 3,146,656
> Files Restored: 48,369
> Bytes Restored: 32,321,658,846,063
> Rate: 68743.9 KB/s
> FD Errors: 1
> FD termination status: Error
> SD termination status: Error
> Termination: *** Restore Error ***
>
> 2011-06-11 08loki-dir JobId 731: Begin pruning Jobs older than 1 year 25 days
> .
> 2011-06-11 08loki-dir JobId 731: No Jobs found to prune.
> 2011-06-11 08loki-dir JobId 731: Begin pruning Jobs.
> 2011-06-11 08loki-dir JobId 731: No Files found to prune.
> 2011-06-11 08loki-dir JobId 731: End auto prune.
>
> 2011-06-11 08loki-dir JobId 732: shell command: run BeforeJob
> "/opt/bacula/etc/make_catalog_backup bacula bacula"
> 2011-06-11 08loki-dir JobId 732: Start Backup JobId 732,
> Job=loki-BackupCatalog.2011-06-05_23.10.00_05
> 2011-06-11 08loki-dir JobId 732: Using Device "LTO4"
> 2011-06-11 08loki-sd JobId 732: Error: block.c:1002 Read error on fd=6 at
> file:blk 0:0 on device "LTO4" (/dev/nst0). ERR=Input/output error.
> 2011-06-11 08loki-sd JobId 732: Please mount Volume "DD0004" or label a new
> one for:
> Job: loki-BackupCatalog.2011-06-05_23.10.00_05
> Storage: "LTO4" (/dev/nst0)
> Pool: BackupSetDD
> Media type: LTO4
> 2011-06-11 08loki-sd JobId 732: Error: block.c:1002 Read error on fd=6 at
> file:blk 0:0 on device "LTO4" (/dev/nst0). ERR=Input/output error.
> -------