Bacula-users

[Bacula-users] Dell TL4000 labeling timeout

2015-06-16 16:05:21
Subject: [Bacula-users] Dell TL4000 labeling timeout
From: Andrew Noonan <anoonan AT gmail DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 16 Jun 2015 15:02:09 -0500
Hi all,

I'm almost completely new to tape.  We've been doing disk-based
backups for years, but we now have a project where we want to offsite
hundreds of TB permanently, and have a Dell TL4000 (a rebranded IBM
3573-TL from the looks of it) with 2 ULT3580 LTO-6 drives.  We're
running bacula 5.2.  The server is a Dell 1950 running Centos 5 (sorry
for the old OS).

The btape tests run on both units without a problem, including the
autochanger tests, and manually executing load/unload/list commands
with mtx-changer seem to run fine.  The one exception to this is that
the mtx-changer load command seems to take about 10 minutes to
complete, which seems unreasonably long.  These are brand new tapes
and I haven't written anything to them other then whatever btape does
with testing.  I put a 5 minute sleep on the load for mtx-changer, but
other then that haven't customized the script, as I'm not sure what
I'd customize.

The "update slots" command from the director works OK, but when I go
to do a "label barcode", the resulting "load slot" gets killed by
Bacula:

3992 Bad autochanger "load slot 20, drive 1": ERR=Child died from
signal 15: Termination.
Results=Program killed by Bacula (timeout)

I've seen that in some of these posts to the list, this ends up being
permissions problems against the devices, but that doesn't seem to be
the case as far as I can see:

bacula-sd is running as the bacula user/group.  The bacula user is in
the "disk" group, and the *st* devices are in the disk group with "rw"
permissions:

crw-rw---- 1 root disk 9, 128 Jun  4 12:02 /dev/nst0
crw-rw---- 1 root disk 9, 224 Jun  4 12:02 /dev/nst0a
crw-rw---- 1 root disk 9, 160 Jun  4 12:02 /dev/nst0l
crw-rw---- 1 root disk 9, 192 Jun  4 12:02 /dev/nst0m
crw-rw---- 1 root disk 9, 129 Jun 10 17:06 /dev/nst1
crw-rw---- 1 root disk 9, 225 Jun 10 17:06 /dev/nst1a
crw-rw---- 1 root disk 9, 161 Jun 10 17:06 /dev/nst1l
crw-rw---- 1 root disk 9, 193 Jun 10 17:06 /dev/nst1m
crw-rw---- 1 root disk 9,   0 Jun  4 12:02 /dev/st0
crw-rw---- 1 root disk 9,  96 Jun  4 12:02 /dev/st0a
crw-rw---- 1 root disk 9,  32 Jun  4 12:02 /dev/st0l
crw-rw---- 1 root disk 9,  64 Jun  4 12:02 /dev/st0m
crw-rw---- 1 root disk 9,   1 Jun 10 17:06 /dev/st1
crw-rw---- 1 root disk 9,  97 Jun 10 17:06 /dev/st1a
crw-rw---- 1 root disk 9,  33 Jun 10 17:06 /dev/st1l
crw-rw---- 1 root disk 9,  65 Jun 10 17:06 /dev/st1m

Here's a block of debug from the SD during a label attempt for one of the slots:

odin-sd: autochanger.c:434-0 Wiffle through devices looking for slot
odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer
/dev/changer loaded 14 /dev/nst0 0
odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer
/dev/changer loaded 14 /dev/nst0 0
odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
odin-sd: bpipe.c:220-0 Wait for 28962 opt=1
odin-sd: bpipe.c:228-0 Got break wpid=28962 status=0 ERR=none
odin-sd: bpipe.c:249-0 child status=0
odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888
odin-sd: bpipe.c:264-0 returning stat=0,0
odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer
/dev/changer loaded 14 /dev/nst0 0 stat=0 result=0
odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer
/dev/changer loaded 14 /dev/nst1 1
odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer
/dev/changer loaded 14 /dev/nst1 1
odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
odin-sd: bpipe.c:220-0 Wait for 28976 opt=1
odin-sd: bpipe.c:228-0 Got break wpid=28976 status=0 ERR=none
odin-sd: bpipe.c:249-0 child status=0
odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888
odin-sd: bpipe.c:264-0 returning stat=0,0
odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer
/dev/changer loaded 14 /dev/nst1 1 stat=0 result=0
odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
odin-sd: autochanger.c:453-0 Slot=14 not found in another device
odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
odin-sd: autochanger.c:183-0 Doing changer load slot 14 "LRADrive-2" (/dev/nst1)
odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer
/dev/changer load 14 /dev/nst1 1
odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1)
odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol=
odin-sd: autochanger.c:190-0 Run program=/usr/lib64/bacula/mtx-changer
/dev/changer load 14 /dev/nst1 1
odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
odin-sd: bpipe.c:443-0 Run program fgets killed=1
odin-sd: bpipe.c:220-0 Wait for 28990 opt=1
odin-sd: bpipe.c:228-0 Got break wpid=28990 status=15 ERR=none
odin-sd: bpipe.c:256-0 Child died from signal 15
odin-sd: watchdog.c:235-0 Unregistered inactive watchdog 636b888
odin-sd: bpipe.c:264-0 returning stat=15,134217743
odin-sd: autochanger.c:205-0 load slot 14, drive 1, bad stats=Child
died from signal 15: Termination.
odin-sd: autochanger.c:212-0 load slot 14 status=134217743
odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
odin-sd: autochanger.c:218-0 After changer, status=134217743
odin-sd: dev.c:1735-0 Clear volhdr vol=
odin-sd: vol_mgr.c:544-0 vol_unused: no vol on "LRADrive-2" (/dev/nst1)
odin-sd: lock.c:302-0 return lock. old=BST_WRITING_LABEL from dircmd.c:554
odin-sd: lock.c:307-0 return lock. new=BST_NOT_BLOCKED
odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1)
odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol=
odin-sd: acquire.c:731-0 Enter detach_dcr_from_dev
odin-sd: dircmd.c:220-0 <dird: label LogRepoAutochanger
VolumeName=000030L6 PoolName=LogrepoArchive MediaType=LTO-6 Slot=15
drive=1
odin-sd: dircmd.c:234-0 Do command: label
odin-sd: dircmd.c:627-0 Try changer device LRADrive-1
odin-sd: dircmd.c:648-0 Device LogRepoAutochanger drive wrong: want=1
got=0 skipping
odin-sd: dircmd.c:627-0 Try changer device LRADrive-2
odin-sd: dircmd.c:643-0 Found changer device LRADrive-2
odin-sd: dircmd.c:656-0 Found device LRADrive-2
odin-sd: block.c:144-0 Returning new block=636b800
odin-sd: acquire.c:713-0 JobId=0 enter attach_dcr_to_dev
odin-sd: dircmd.c:421-0 Can label. Device is not open
odin-sd: lock.c:285-0 steal lock. old=BST_NOT_BLOCKED from dircmd.c:470
odin-sd: lock.c:290-0 steal lock. new=BST_WRITING_LABEL
odin-sd: dircmd.c:471-0 Stole device "LRADrive-2" (/dev/nst1) lock,
writing label.

The config I've got for these is:

Device {
  Name = LRADrive-1
  Alert Command = "sh -c 'smartctl -H -l error %c'"
  AlwaysOpen = yes
  ArchiveDevice = /dev/nst0
  AutoChanger = yes
  AutomaticMount = yes
  DeviceType = Tape
  DriveIndex = 0
  LabelMedia = no
  MediaType = LTO-6
  RandomAccess = no
  RemovableMedia = yes
}

Device {
  Name = LRADrive-2
  Alert Command = "sh -c 'smartctl -H -l error %c'"
  AlwaysOpen = yes
  ArchiveDevice = /dev/nst1
  AutoChanger = yes
  AutomaticMount = yes
  DeviceType = Tape
  DriveIndex = 1
  LabelMedia = no
  MediaType = LTO-6
  RandomAccess = no
  RemovableMedia = yes
}

Autochanger {
  Name = LogRepoAutochanger
  ChangerCommand = "/usr/lib64/bacula/mtx-changer %c %o %S %a %d"
  ChangerDevice = /dev/changer
  Device = LRADrive-1
  Device = LRADrive-2
}


I know there are some things that could be optimized here for
performance, and I'm certainly interested in them, but right now I
can't even label my tapes :)

I suspect it's the long load delay, and I wasn't sure if maybe the
drive is searching for some mark or something.  On that note, I tried
to do a "rewind" and "weof" using the /dev/st0 device (wasn't sure if
nst0 would complain about issuing a rewind), but I would get
"Input/Output error" messages from mt on both the rewind and weof
commands.

Any advice I could get would be helpful.

Thanks!
Andrew

------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users