Bacula-users

[Bacula-users] Strange error with Files on Volume / in Catalog after upgrade from 2.4.4 to 3.0.0

2009-05-04 07:15:47
Subject: [Bacula-users] Strange error with Files on Volume / in Catalog after upgrade from 2.4.4 to 3.0.0
From: Christian Gaul <christian.gaul AT otop DOT de>
To: bacula-users AT lists.sourceforge DOT net
Date: Mon, 04 May 2009 12:51:55 +0200
Hello List,

since upgrading one of my Directors and the connected SD to 3.0.0 last
week, i've noticed that every couple of writes to tape, the volume will
get marked in error because of "Number of Files mismatch".

The same director writing to the same SD on a disk device works without
problems, so i'm suspecting a change to one of the default values of the
device resource maybe? To me it looks like the File count in the MySQL
does not get incremented after completing the write job.

I'm a little bit stumped because this configuration has been running
well for about a year now, over the weekend i used bscan on all of my
disk and tape volumes to build a new catalog, since i suspected the
upgrade script (even though very simple) might have broken something so
the increment didnt work anymore. Since i now have a new Catalog and it
is still doing the same thing, here i am asking if any of you know this
behavior.

Next i will be trying the btape testsuite again, seeing if something
changed which i didnt see in the changelog that will force me to use
some kind of twoeof or somethingorother option.

Bacula 3.0.0 running on Gentoo, Mysql running on a different host (Xeon
2.4ghz x2, hardware raid5 on 3x 10k disks), autochanger is a TL2000 with
1 Drive, 24 Slots via SAS.

Relevant sections of my SD config:

Storage {
  Name = bacula-sd
  SDPort = 9103         
  WorkingDirectory = "/var/lib/bacula"
  Pid Directory = "/var/run"
  Maximum Concurrent Jobs = 20
}

Autochanger {
        Name = "TapeStorage"
        Device = "LTO3"
        Changer Device = "/dev/Changer"
        Changer Command = "/usr/libexec/bacula/mtx-changer %c %o %S %a %d"
}

Device {
        Name = "LTO3"
        Drive Index = 0
        Autochanger = yes
        Media Type = "Tape"
        Archive Device = "/dev/LTO3"
        AutomaticMount = yes
        Always Open = yes
        Removable Media = yes
        Random Access = no
        LabelMedia = yes;
}


Dir config:

Director {                            # define myself
  Name = bacula-dir
  DIRport = 9101                # where we listen for UA connections
  QueryFile = "/usr/libexec/bacula/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run"
  Maximum Concurrent Jobs = 20
  Password = "foo"         # Console password
  Messages = Daemon
}



Storage {
  Name = "TapeStorage"
  Address = bacula.foo.example.com
  SDPort = 9103
  Password = "foo"
  Device = "TapeStorage"
  Media Type = Tape
  Maximum Concurrent Jobs = 4
  Autochanger = yes
}

Snips from logs:

04-May 11:31 bacula-sd JobId 16310: Job
TorHomeLong.2009-05-04_11.20.00_35 waiting. Cannot find any appendable
volumes.
Please use the "label"  command to create a new Volume for:
    Storage:      "LTO3" (/dev/LTO3)
    Pool:         Monthly
    Media type:   Tape
*update
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
Update choice:
     1: Volume parameters
     2: Pool from resource
     3: Slots from autochanger
     4: Long term statistics
Choose catalog item to update (1-4): 1
Parameters to modify:
     1: Volume Status
     2: Volume Retention Period
     3: Volume Use Duration
     4: Maximum Volume Jobs
     5: Maximum Volume Files
     6: Maximum Volume Bytes
     7: Recycle Flag
     8: Slot
     9: InChanger Flag
    10: Volume Files
    11: Pool
    12: Volume from Pool
    13: All Volumes from Pool
    14: All Volumes from all Pools
    15: Enabled
    16: RecyclePool
    17: Done
Select parameter to modify (1-17): 10
Defined Pools:
     1: FileStorage
     2: FileStorageFulls
     3: Daily
     4: Weekly
     5: Monthly
     6: Foo
     7: Oneshot
     8: LTO3
     9: OffsiteTape
Select the Pool (1-9): 5
+---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
| MediaId | VolumeName | VolStatus | Enabled | VolBytes        |
VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType |
LastWritten         |
+---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
|      70 | 000055LT   | Error     |       1 | 163,995,890,386 |   
2,901 |  315,360,000 |       0 |   20 |         1 | Tape      |
2009-05-04 11:27:04 |
|      71 | 000056LT   | Error     |       1 |       6,580,224 |       
0 |  315,360,000 |       0 |   16 |         1 | Tape      | 2009-05-04
11:30:20 |
+---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
Enter *MediaId or Volume name: 000055LT
Updating Volume "000055LT"
Warning changing Volume Files can result
in loss of data on your Volume

Current Volume Files is: 2901
Enter new number of Files for Volume: 2902
New Volume Files is: 2902
Parameters to modify:
     1: Volume Status
     2: Volume Retention Period
     3: Volume Use Duration
     4: Maximum Volume Jobs
     5: Maximum Volume Files
     6: Maximum Volume Bytes
     7: Recycle Flag
     8: Slot
     9: InChanger Flag
    10: Volume Files
    11: Pool
    12: Volume from Pool
    13: All Volumes from Pool
    14: All Volumes from all Pools
    15: Enabled
    16: RecyclePool
    17: Done
Select parameter to modify (1-17): 1
Defined Pools:
     1: FileStorage
     2: FileStorageFulls
     3: Daily
     4: Weekly
     5: Monthly
     6: Foo
     7: Oneshot
     8: LTO3
     9: OffsiteTape
Select the Pool (1-9): 5
+---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
| MediaId | VolumeName | VolStatus | Enabled | VolBytes        |
VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType |
LastWritten         |
+---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
|      70 | 000055LT   | Error     |       1 | 163,995,890,386 |   
2,902 |  315,360,000 |       0 |   20 |         1 | Tape      |
2009-05-04 11:27:04 |
|      71 | 000056LT   | Error     |       1 |       6,580,224 |       
0 |  315,360,000 |       0 |   16 |         1 | Tape      | 2009-05-04
11:30:20 |
+---------+------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
Enter *MediaId or Volume name: 000055LT
Updating Volume "000055LT"
Current Volume status is: Error
Possible Values are:
     1: Append
     2: Archive
     3: Disabled
     4: Full
     5: Used
     6: Cleaning
     7: Read-Only
Choose new Volume Status (1-7): 1
New Volume status is: Append
Parameters to modify:
     1: Volume Status
     2: Volume Retention Period
     3: Volume Use Duration
     4: Maximum Volume Jobs
     5: Maximum Volume Files
     6: Maximum Volume Bytes
     7: Recycle Flag
     8: Slot
     9: InChanger Flag
    10: Volume Files
    11: Pool
    12: Volume from Pool
    13: All Volumes from Pool
    14: All Volumes from all Pools
    15: Enabled
    16: RecyclePool
    17: Done
Selection terminated.

You have messages.
*m
04-May 11:35 bacula-sd JobId 16310: Ready to append to end of Volume
"000055LT" at file=2902.
04-May 11:35 bacula-sd JobId 16310: Spooling data ...
*
04-May 11:35 bacula-sd JobId 16307: Volume "000055LT" previously
written, moving to end of data.
*
04-May 11:36 bacula-sd JobId 16310: Job write elapsed time = 00:00:53,
Transfer rate = 3.166 M bytes/second
04-May 11:36 bacula-sd JobId 16310: Committing spooled data to Volume
"000055LT". Despooling 168,029,416 bytes ...
04-May 11:36 bacula-sd JobId 16307: Ready to append to end of Volume
"000055LT" at file=2902.
04-May 11:35 bacula-sd JobId 16310: Spooling data ...
*
04-May 11:35 bacula-sd JobId 16307: Volume "000055LT" previously
written, moving to end of data.
*
04-May 11:36 bacula-sd JobId 16310: Job write elapsed time = 00:00:53,
Transfer rate = 3.166 M bytes/second
04-May 11:36 bacula-sd JobId 16310: Committing spooled data to Volume
"000055LT". Despooling 168,029,416 bytes ...
04-May 11:36 bacula-sd JobId 16307: Ready to append to end of Volume
"000055LT" at file=2902.
04-May 11:36 bacula-dir JobId 16309: Using Device "LTO3"
04-May 11:36 bacula-sd JobId 16307: Spooling data ...
04-May 11:36 bacula-sd JobId 16310: Despooling elapsed time = 00:00:09,
Transfer rate = 18.66 M bytes/second
04-May 11:36 bacula-sd JobId 16310: Sending spooled attrs to the
Director. Despooling 126,399 bytes ...
04-May 11:36 bacula-dir JobId 16310: Bacula bacula-dir 3.0.0 (06Apr09):
04-May-2009 11:36:34
  Build OS:               x86_64-pc-linux-gnu gentoo
  JobId:                  16310
  Job:                    TorHomeLong.2009-05-04_11.20.00_35
  Backup Level:           Incremental, since=2009-04-29 18:35:48
  Client:                 "tor.example.com" 2.4.1 (07Jul08)
i686-pc-linux-gnu,gentoo,
  FileSet:                "LinuxHomeLong" 2009-04-30 12:48:10
  Pool:                   "Monthly" (From Run pool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "TapeStorage" (From Pool resource)
  Scheduled time:         04-May-2009 11:20:00
  Start time:             04-May-2009 11:30:20
  End time:               04-May-2009 11:36:34
  Elapsed time:           6 mins 14 secs
  Priority:               10
  FD Files Written:       397
  SD Files Written:       397
  FD Bytes Written:       167,777,131 (167.7 MB)
  SD Bytes Written:       167,838,420 (167.8 MB)
  Rate:                   448.6 KB/s
  Software Compression:   43.1 %
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         000055LT
  Volume Session Id:      10
  Volume Session Time:    1241426887
  Last Volume Bytes:      164,163,944,146 (164.1 GB)
  Non-fatal FD errors:    0
  SD Errors:              1
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK -- with warnings

04-May 11:36 bacula-dir JobId 16310: Begin pruning Jobs.
04-May 11:36 bacula-dir JobId 16310: Pruned 1 Job for client
tor.example.com from catalog.
04-May 11:36 bacula-dir JobId 16310: Begin pruning Files.
04-May 11:36 bacula-dir JobId 16310: No Files found to prune.
04-May 11:36 bacula-dir JobId 16310: End auto prune.

*
04-May 11:37 bacula-sd JobId 16309: Volume "000055LT" previously
written, moving to end of data.
*
04-May 11:37 bacula-sd JobId 16307: Job write elapsed time = 00:00:53,
Transfer rate = 958.5 K bytes/second
04-May 11:37 bacula-sd JobId 16307: Committing spooled data to Volume
"000055LT". Despooling 50,881,309 bytes ...
04-May 11:37 bacula-sd JobId 16309: Error: Bacula cannot write on tape
Volume "000055LT" because:
The number of files mismatch! Volume=2903 Catalog=2902
04-May 11:37 bacula-sd JobId 16309: Marking Volume "000055LT" in Error
in Catalog.
04-May 11:37 bacula-sd JobId 16309: 3307 Issuing autochanger "unload
slot 20, drive 0" command.
*
04-May 11:37 bacula-sd JobId 16302: Job write elapsed time = 00:07:39,
Transfer rate = 1.024 M bytes/second
04-May 11:37 bacula-sd JobId 16302: Committing spooled data to Volume
"000056LT". Despooling 471,040,760 bytes ...
*
04-May 11:38 bacula-sd JobId 16309: Job
OTS-MySQLLong.2009-05-04_11.20.00_34 waiting. Cannot find any appendable
volumes.
Please use the "label"  command to create a new Volume for:
    Storage:      "LTO3" (/dev/LTO3)
    Pool:         Monthly
    Media type:   Tape
*

--------------------------------------------------------------------------------------

So in this example output, the tape 55LT was put back to a usable state
manually, used ok, marked in error, the next tape (new, empty and
labeled with 3.0.0) was then used ok, and the next job marked it in
error too...

I'm off to start the btape testsuite on another empty tape, but i hope
someone has an idea. I would really like accurate backups and the diff /
full intervals of the scheduler.. I've been looking forward to this
release since it was first announced some time last year, thanks for the
hard work.. 2 more features and it will be the perfect backup solution
for me (resuming on connection interruption and client initiated backups...)

Thanks for your time




P.S.:

Portage 2.1.6.7 (default/linux/amd64/2008.0, gcc-4.1.2,
glibc-2.8_p20080602-r1, 2.6.27-gentoo-r8 x86_64)
=================================================================
System uname:
Linux-2.6.27-gentoo-r8-x86_64-Intel-R-_Xeon-R-_CPU_E3110_@_3.00GHz-with-glibc2.2.5
Timestamp of tree: Mon, 04 May 2009 00:15:01 +0000
app-shells/bash:     3.2_p39
dev-lang/python:     2.4.4-r14, 2.5.2-r7
dev-python/pycrypto: 2.0.1-r8
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.63
sys-devel/automake:  1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.27-r2
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d
/etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo
/etc/udev/rules.d"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks fixpackages nostrip parallel-fetch protect-owned
sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="foo"
LDFLAGS="-Wl,-O1"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times
--compress --force --whole-file --delete --stats --timeout=180
--exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync/gentoo-portage"
USE="acl acpi amd64 bacula-console berkdb bzip2 cracklib crypt gdbm gpm
iconv logrotate mmx mudflap multilib mysql ncurses nls nptl nptlonly pam
pam_timestamp pcre perl python readline sasl sse sse2 ssl sysfs tcpd
unicode vim xfs xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp
atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968
fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx
via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare
dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter
mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon
authn_dbm authn_default authn_file authz_dbm authz_default
authz_groupfile authz_host authz_owner authz_user autoindex cache dav
dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache
filter headers include info log_config logio mem_cache mime mime_magic
negotiation rewrite setenvif speling status unique_id userdir usertrack
vhost_alias" ELIBC="glibc" INPUT_DEVICES="void" KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb
ncurses text" USERLAND="GNU" VIDEO_CARDS="dummy"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK,
LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS,
PORTAGE_RSYNC_EXTRA_OPTS



ldd /usr/sbin/bacula-sd
        linux-vdso.so.1 =>  (0x00007fff9c5fe000)
        libacl.so.1 => /lib/libacl.so.1 (0x00007f8994167000)
        libattr.so.1 => /lib/libattr.so.1 (0x00007f8993f61000)
        libz.so.1 => /lib/libz.so.1 (0x00007f8993d4c000)
        libbacpy.so.1 => /usr/lib/libbacpy.so.1 (0x00007f8993a23000)
        libbaccfg.so.1 => /usr/lib/libbaccfg.so.1 (0x00007f899381b000)
        libbac.so.1 => /usr/lib/libbac.so.1 (0x00007f89935c9000)
        libutil.so.1 => /lib/libutil.so.1 (0x00007f89933c6000)
        librt.so.1 => /lib/librt.so.1 (0x00007f89931bd000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00007f8992fa2000)
        libdl.so.2 => /lib/libdl.so.2 (0x00007f8992d9e000)
        libwrap.so.0 => /lib/libwrap.so.0 (0x00007f8992b95000)
        libssl.so.0.9.8 => /usr/lib/libssl.so.0.9.8 (0x00007f8992946000)
        libcrypto.so.0.9.8 => /usr/lib/libcrypto.so.0.9.8
(0x00007f89925c0000)
        libstdc++.so.6 =>
/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.2/libstdc++.so.6 (0x00007f89922bd000)
        libm.so.6 => /lib/libm.so.6 (0x00007f899203c000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f8991e2e000)
        libc.so.6 => /lib/libc.so.6 (0x00007f8991ae6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f899436f000)



------------------------------------------------------------------------------
Register Now & Save for Velocity, the Web Performance & Operations 
Conference from O'Reilly Media. Velocity features a full day of 
expert-led, hands-on workshops and two days of sessions from industry 
leaders in dedicated Performance & Operations tracks. Use code vel09scf 
and Save an extra 15% before 5/3. http://p.sf.net/sfu/velocityconf
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users