Bacula-users

Re: [Bacula-users] Dell TL4000 labeling timeout

2015-06-19 15:02:13
Subject: Re: [Bacula-users] Dell TL4000 labeling timeout
From: Andrew Noonan <anoonan AT gmail DOT com>
To: "Bacula-users AT lists.sourceforge DOT net" <bacula-users AT lists.sourceforge DOT net>
Date: Fri, 19 Jun 2015 13:56:25 -0500
Hi Ana,

     It looks like that was it.  I swapped which devices the two drive
entries point to, and I was then able to attempt to label the tapes.
They had old labels on them from some previous attempt, though, so now
I'm going through and erasing the labels, and that process has also
completed without error so far.  I've got some additional questions on
tuning and sizing for this job, but I'll ask those in a separate thread.
I'll probably give a final reply on this thread once the labeling is
complete, but all indications now are that it will be successful once
I finish erasing the tapes.
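
For anyone finding this thread later, the mismatch can be spotted with a check along these lines. This is only a sketch: `drive_ready` and `report` are made-up helper names, the status text is pasted from the mt output quoted below rather than read from a live drive, and it assumes ONLINE is the ready flag, as it is on this Linux box.

```shell
#!/bin/sh
# Hypothetical helper: reads `mt status` output on stdin and succeeds
# only if the ONLINE flag is present (the $ready value discussed below).
drive_ready() {
    grep -q "ONLINE"
}

# Hypothetical helper: prints one line per device saying whether the
# supplied status text shows the drive as ready.
report() {
    if printf '%s\n' "$2" | drive_ready; then
        echo "$1: ONLINE"
    else
        echo "$1: not ready"
    fi
}

# Sample status text copied from this thread (not live output).
nst0_status='General status bits on (50000):
 DR_OPEN IM_REP_EN'
nst1_status='General status bits on (41010000):
 BOT ONLINE IM_REP_EN'

report /dev/nst0 "$nst0_status"
report /dev/nst1 "$nst1_status"
```

On a real system the same test would be `mt -f /dev/nst0 status | drive_ready`. If mtx shows a tape in drive 0 but only /dev/nst1 reports ONLINE, the ArchiveDevice paths for the two DriveIndex values are probably swapped, which is what happened here.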

Thanks,
Andrew
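
P.S. On the "list" change for the I/O slots suggested below, here is a minimal sketch of what the modified pipeline does. The sample lines are an assumption modeled on typical `mtx status` output (barcodes borrowed from the listall output further down), and `sample_status` and `list_slots` are illustrative names, not part of mtx-changer.

```shell
#!/bin/sh
# Sample `mtx status` lines; the layout is assumed, not captured from
# the TL4000 itself.
sample_status() {
    cat <<'EOF'
      Storage Element 43:Full :VolumeTag=000003L6
      Storage Element 44:Empty
      Storage Element 45 IMPORT/EXPORT:Full :VolumeTag=000047L6
      Storage Element 46 IMPORT/EXPORT:Full :VolumeTag=000041L6
EOF
}

# Same filter chain as the modified "list" branch: strip the
# IMPORT/EXPORT marker first so the slot-number pattern also matches
# the I/O slots, then reduce each full slot to "slot:barcode".
list_slots() {
    sed "s/ IMPORT\/EXPORT//" |
        grep " Storage Element [0-9]*:.*Full" |
        awk '{print $3 $4}' |
        sed "s/Full *\(:VolumeTag=\)*//"
}

sample_status | list_slots
```

With the sample input this prints 43:000003L6, 45:000047L6 and 46:000041L6, one per line. Without the first sed, the grep pattern expects a colon directly after the slot number, so the IMPORT/EXPORT lines never match, which would explain why the "I" slots were invisible to a normal "list".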

On Fri, Jun 19, 2015 at 9:21 AM, Ana Emília M. Arruda
<emiliaarruda AT gmail DOT com> wrote:
> Hello Andrew,
>
> I'm afraid your /dev/nst1 is your first drive (index 0) and /dev/nst0 is
> your second drive (index 1). From your mt status output, /dev/nst1 is
> online with a tape loaded (I suspect this is the 44:000002L6 tape). Could
> you try changing this in your bacula-sd.conf (/dev/nst1 -> drive index 0 and
> /dev/nst0 -> drive index 1)?
>
> Also, for your I/O slots to be recognized by the mtx-changer script, you
> need to modify the section for the "list" command:
>
>       if test ${vxa_packetloader} -ne 0 ; then
>         cat ${TMPFILE} | grep " *Storage Element [0-9]*:.*Full" | sed "s/ Storage Element //" | sed "s/Full :VolumeTag=//"
>       else
>         # cat ${TMPFILE} | grep " Storage Element [0-9]*:.*Full" | awk "{print \$3 \$4}" | sed "s/Full *\(:VolumeTag=\)*//"
>         cat ${TMPFILE} | sed "s/ IMPORT\/EXPORT//" | grep " Storage Element [0-9]*:.*Full" | awk "{print \$3 \$4}" | sed "s/Full *\(:VolumeTag=\)*//"
>       fi
>
> Best regards,
> Ana
>
> On Fri, Jun 19, 2015 at 12:34 AM, Andrew Noonan <anoonan AT gmail DOT com> 
> wrote:
>>
>> cat /proc/scsi/scsi:
>>
>> Attached devices:
>> Host: scsi0 Channel: 00 Id: 00 Lun: 00
>>   Vendor: SEAGATE  Model: ST373455SS       Rev: S52C
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi0 Channel: 00 Id: 01 Lun: 00
>>   Vendor: SEAGATE  Model: ST373455SS       Rev: S52C
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi0 Channel: 01 Id: 00 Lun: 00
>>   Vendor: Dell     Model: VIRTUAL DISK     Rev: 1028
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 00
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 01
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 02
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 03
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 31
>>   Vendor: DELL     Model: Universal Xport  Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 00
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 01
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 02
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 03
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 31
>>   Vendor: DELL     Model: Universal Xport  Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 04
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 00 Lun: 05
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 04
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi1 Channel: 00 Id: 01 Lun: 05
>>   Vendor: DELL     Model: MD32xx           Rev: 0784
>>   Type:   Direct-Access                    ANSI SCSI revision: 05
>> Host: scsi2 Channel: 00 Id: 08 Lun: 00
>>   Vendor: IBM      Model: ULT3580-HH6      Rev: D8E5
>>   Type:   Sequential-Access                ANSI SCSI revision: 06
>> Host: scsi2 Channel: 00 Id: 10 Lun: 00
>>   Vendor: IBM      Model: ULT3580-HH6      Rev: D8E5
>>   Type:   Sequential-Access                ANSI SCSI revision: 06
>> Host: scsi2 Channel: 00 Id: 10 Lun: 01
>>   Vendor: IBM      Model: 3573-TL          Rev: C.30
>>   Type:   Medium Changer                   ANSI SCSI revision: 05
>>
>> sg_map -x
>>
>> /dev/sg0  0 0 0 0  0
>> /dev/sg1  0 0 1 0  0
>> /dev/sg2  0 1 0 0  0  /dev/sda
>> /dev/sg3  1 0 0 0  0  /dev/sdb
>> /dev/sg4  1 0 0 1  0  /dev/sdc
>> /dev/sg5  1 0 0 2  0  /dev/sdd
>> /dev/sg6  1 0 0 3  0  /dev/sde
>> /dev/sg7  1 0 0 31  0  /dev/sdf
>> /dev/sg8  1 0 1 0  0  /dev/sdg
>> /dev/sg9  1 0 1 1  0  /dev/sdh
>> /dev/sg10  1 0 1 2  0  /dev/sdi
>> /dev/sg11  1 0 1 3  0  /dev/sdj
>> /dev/sg12  1 0 1 31  0  /dev/sdk
>> /dev/sg13  1 0 0 4  0  /dev/sdl
>> /dev/sg14  1 0 0 5  0  /dev/sdm
>> /dev/sg15  1 0 1 4  0  /dev/sdn
>> /dev/sg16  1 0 1 5  0  /dev/sdo
>> /dev/sg17  2 0 8 0  1  /dev/nst0
>> /dev/sg18  2 0 10 0  1  /dev/nst1
>> /dev/sg19  2 0 10 1  8
>>
>> As an example of the status of things, here's the output of:
>>
>> mtx-changer /dev/changer listall 0 /dev/nst0 0
>>
>> D:0:F:44:000002L6
>> D:1:E
>> S:1:F:000045L6
>> S:2:F:000001L6
>> S:3:F:000046L6
>> S:4:F:000042L6
>> S:5:F:000043L6
>> S:6:F:000044L6
>> S:7:F:000038L6
>> S:8:F:000039L6
>> S:9:F:000040L6
>> S:10:F:000033L6
>> S:11:F:000034L6
>> S:12:F:000035L6
>> S:13:F:000036L6
>> S:14:F:000029L6
>> S:15:F:000030L6
>> S:16:F:000031L6
>> S:17:F:000032L6
>> S:18:F:000025L6
>> S:19:F:000026L6
>> S:20:F:000027L6
>> S:21:F:000028L6
>> S:22:F:000024L6
>> S:23:F:000023L6
>> S:24:F:000022L6
>> S:25:F:000021L6
>> S:26:F:000020L6
>> S:27:F:000019L6
>> S:28:F:000018L6
>> S:29:F:000017L6
>> S:30:F:000016L6
>> S:31:F:000015L6
>> S:32:F:000014L6
>> S:33:F:000013L6
>> S:34:F:000012L6
>> S:35:F:000011L6
>> S:36:F:000010L6
>> S:37:F:000009L6
>> S:38:F:000008L6
>> S:39:F:000007L6
>> S:40:F:000006L6
>> S:41:F:000005L6
>> S:42:F:000004L6
>> S:43:F:000003L6
>> S:44:E
>> I:45:F:000047L6
>> I:46:F:000041L6
>> I:47:F:000037L6
>>
>> so you can see that drive 0 has the tape from slot 44 in it, and drive
>> 1 is empty.  44 is the last slot that the "label barcode" seems to
>> know about... the "I" slots seem to be invisible to a normal "list"
>> command.  So that was a tape loaded yesterday when I attempted to do a
>> "label barcode" with the 900 sec timeouts.  I have not attempted to do
>> a load or unload for 24 hours.  Here are the current mt status results
>> from these drives:
>>
>> mt -f /dev/nst0 status
>> SCSI 2 tape drive:
>> File number=-1, block number=-1, partition=0.
>> Tape block size 0 bytes. Density code 0x0 (default).
>> Soft error count since last status=0
>> General status bits on (50000):
>>  DR_OPEN IM_REP_EN
>>
>>  mt -f /dev/nst1 status
>> SCSI 2 tape drive:
>> File number=0, block number=0, partition=0.
>> Tape block size 0 bytes. Density code 0x5a (no translation).
>> Soft error count since last status=0
>> General status bits on (41010000):
>>  BOT ONLINE IM_REP_EN
>>
>> So after having the tape in the drive for more than 24 hours, the
>> state is still DR_OPEN on drive 0. Since the $ready variable is
>> ONLINE, that will prevent mtx-changer from ever returning before it
>> times out.
>>
>> On Thu, Jun 18, 2015 at 8:45 PM, Ana Emília M. Arruda
>> <emiliaarruda AT gmail DOT com> wrote:
>> > Hi Andrew,
>> >
>> > Also, can you send us "cat /proc/scsi/scsi" and "sg_map -x"?
>> >
>> > Best regards,
>> > Ana
>> >
>> > On Thu, Jun 18, 2015 at 9:17 AM, Josh Fisher <jfisher AT pvct DOT com> 
>> > wrote:
>> >>
>> >>
>> >>
>> >> On 6/17/2015 10:35 PM, Andrew Noonan wrote:
>> >> > @Marcin - dmesg is clean
>> >> >
>> >> > @Ana - I ~think~ /dev/changer is created by udev, I'm not 100%.  It's
>> >> > a symlink to sg19 in this case:
>> >>
>> >> Most likely, udev got it right. Try mtx -f /dev/changer noattach status.
>> >> If that works, then try the load with the 'noattach' prefix. This drive
>> >> could be reporting that it supports the _ATTACHED API but does not exactly
>> >> comply with the expected API commands. The noattach prefix will
>> >> force mtx to use the regular media changer API.
>> >>
>> >>
>> >>
>> >> >
>> >> > lrwxrwxrwx 1 root root 4 Jun 10 17:06 /dev/changer -> sg19
>> >> > crw-rw---- 1 root disk 21, 19 Jun 10 17:06 /dev/sg19
>> >> >
>> >> > Here are the tape drive parts of "lsscsi -l".  I removed a bunch of
>> >> > HDDs attached to the system:
>> >> >
>> >> > [2:0:8:0]    tape    IBM      ULT3580-HH6      D8E5  /dev/st0
>> >> >    state=running queue_depth=254 scsi_level=7 type=1 device_blocked=0 timeout=900
>> >> > [2:0:10:0]   tape    IBM      ULT3580-HH6      D8E5  /dev/st1
>> >> >    state=running queue_depth=254 scsi_level=7 type=1 device_blocked=0 timeout=900
>> >> > [2:0:10:1]   mediumx IBM      3573-TL          C.30  -
>> >> >    state=running queue_depth=254 scsi_level=6 type=8 device_blocked=0 timeout=0
>> >> >
>> >> > At some point we had the Tape unit powered down and then powered it
>> >> > back up, but the system remained on, so the dmesg mentions the drives
>> >> > multiple times:
>> >> >
>> >> > [anoonan@odin ~]$ dmesg | grep Attached
>> >> > sd 0:1:0:0: Attached scsi disk sda
>> >> > scsi 0:0:0:0: Attached scsi generic sg0 type 0
>> >> > scsi 0:0:1:0: Attached scsi generic sg1 type 0
>> >> > sd 0:1:0:0: Attached scsi generic sg2 type 0
>> >> > sd 1:0:0:0: Attached scsi disk sdb
>> >> > sd 1:0:0:0: Attached scsi generic sg3 type 0
>> >> > sd 1:0:0:1: Attached scsi disk sdc
>> >> > sd 1:0:0:1: Attached scsi generic sg4 type 0
>> >> > sd 1:0:0:2: Attached scsi disk sdd
>> >> > sd 1:0:0:2: Attached scsi generic sg5 type 0
>> >> > sd 1:0:0:3: Attached scsi disk sde
>> >> > sd 1:0:0:3: Attached scsi generic sg6 type 0
>> >> > sd 1:0:0:31: Attached scsi disk sdf
>> >> > sd 1:0:0:31: Attached scsi generic sg7 type 0
>> >> > sd 1:0:1:0: Attached scsi disk sdg
>> >> > sd 1:0:1:0: Attached scsi generic sg8 type 0
>> >> > sd 1:0:1:1: Attached scsi disk sdh
>> >> > sd 1:0:1:1: Attached scsi generic sg9 type 0
>> >> > sd 1:0:1:2: Attached scsi disk sdi
>> >> > sd 1:0:1:2: Attached scsi generic sg10 type 0
>> >> > sd 1:0:1:3: Attached scsi disk sdj
>> >> > sd 1:0:1:3: Attached scsi generic sg11 type 0
>> >> > sd 1:0:1:31: Attached scsi disk sdk
>> >> > sd 1:0:1:31: Attached scsi generic sg12 type 0
>> >> > scsi 2:0:0:0: Attached scsi generic sg13 type 1
>> >> > scsi 2:0:1:0: Attached scsi generic sg14 type 1
>> >> > st 2:0:0:0: Attached scsi tape st0
>> >> > st 2:0:1:0: Attached scsi tape st1
>> >> > st 2:0:2:0: Attached scsi tape st1
>> >> > st 2:0:2:0: Attached scsi generic sg14 type 1
>> >> > st 2:0:3:0: Attached scsi tape st1
>> >> > st 2:0:3:0: Attached scsi generic sg14 type 1
>> >> > st 2:0:4:0: Attached scsi tape st0
>> >> > st 2:0:4:0: Attached scsi generic sg13 type 1
>> >> > st 2:0:5:0: Attached scsi tape st1
>> >> > st 2:0:5:0: Attached scsi generic sg14 type 1
>> >> > scsi 2:0:5:1: Attached scsi generic sg15 type 8
>> >> > st 2:0:6:0: Attached scsi tape st0
>> >> > st 2:0:6:0: Attached scsi generic sg13 type 1
>> >> > scsi 2:0:6:1: Attached scsi generic sg14 type 8
>> >> > st 2:0:7:0: Attached scsi tape st1
>> >> > st 2:0:7:0: Attached scsi generic sg15 type 1
>> >> > sd 1:0:0:4: Attached scsi disk sdl
>> >> > sd 1:0:0:4: Attached scsi generic sg13 type 0
>> >> > sd 1:0:0:5: Attached scsi disk sdm
>> >> > sd 1:0:0:5: Attached scsi generic sg14 type 0
>> >> > sd 1:0:1:4: Attached scsi disk sdn
>> >> > sd 1:0:1:4: Attached scsi generic sg15 type 0
>> >> > sd 1:0:1:5: Attached scsi disk sdo
>> >> > sd 1:0:1:5: Attached scsi generic sg16 type 0
>> >> > st 2:0:8:0: Attached scsi tape st0
>> >> > st 2:0:8:0: Attached scsi generic sg17 type 1
>> >> > st 2:0:9:0: Attached scsi tape st1
>> >> > st 2:0:9:0: Attached scsi generic sg18 type 1
>> >> > scsi 2:0:9:1: Attached scsi generic sg19 type 8
>> >> > st 2:0:10:0: Attached scsi tape st1
>> >> > st 2:0:10:0: Attached scsi generic sg18 type 1
>> >> > scsi 2:0:10:1: Attached scsi generic sg19 type 8
>> >> >
>> >> > I did change out the device names for the drives to be /dev/tape/by-id
>> >> > names instead to make sure the naming stays stable after reboots, but
>> >> > I haven't tried changing /dev/changer to anything else.  The btape
>> >> > tests were successful, and I haven't had any problems with mtx or even
>> >> > mt commands, though as mentioned previously, I've gotten Input/Output
>> >> > errors from mt when doing rewind/weof commands to the drives.  That
>> >> > being said, I'm suspicious of the amount of time that mt reports
>> >> > DR_OPEN on loads.  I can issue mtx-changer commands OK, as well,
>> >> > though I'm not sure if the "load" command is actually returning
>> >> > correctly, or is just timing out internally.  unloads/list/listall
>> >> > from mtx-changer have always returned successfully.
>> >> >
>> >> > This is a Centos 5 machine, so not very new at all.  mtx package is:
>> >> > mtx-1.2.18-9, mt-st is mt-st-0.9b-4.el5
>> >> >
>> >> > It may be a little challenge to upgrade this system (probably to
>> >> > Centos 6), but not impossible if it needs to happen.
>> >> >
>> >> > Thanks,
>> >> > Andrew
>> >> >
>> >> > On Wed, Jun 17, 2015 at 8:32 PM, Ana Emília M. Arruda
>> >> > <emiliaarruda AT gmail DOT com> wrote:
>> >> >> Hello Andrew,
>> >> >>
>> >> >> Is /dev/changer created by udev rules? Have you tried /dev/sgX instead?
>> >> >> Can you send us the output of the "lsscsi -l" command and "dmesg | grep
>> >> >> Attached"? Have you checked your drives/autochanger using just mtx/mt
>> >> >> commands to see if they are working? Which mtx version do you have?
>> >> >>
>> >> >> Best regards,
>> >> >> Ana
>> >> >>
>> >> >> On Wed, Jun 17, 2015 at 7:29 PM, Marcin Haba
>> >> >> <ganiuszka AT gmail DOT com> wrote:
>> >> >>> Hello,
>> >> >>>
>> >> >>> Do you have any errors in dmesg (hardware errors, bus reset, SCSI
>> >> >>> errors ... etc.) ?
>> >> >>>
>> >> >>> Best regards,
>> >> >>> Marcin Haba (gani)
>> >> >>>
>> >> >>> 2015-06-17 21:56 GMT+02:00 Andrew Noonan <anoonan AT gmail DOT com>:
>> >> >>>> Hi all,
>> >> >>>>
>> >> >>>>       It's taking a lot longer because of the higher timeouts, but the
>> >> >>>> label is still failing with a termination.  If I understand it
>> >> >>>> correctly, the mtx-changer script is polling with 'mt' looking for the
>> >> >>>> $ready state, defined in the config file as ONLINE (for Linux).  I'm
>> >> >>>> not seeing drive 0 go into that state... I just see:
>> >> >>>>
>> >> >>>> SCSI 2 tape drive:
>> >> >>>> File number=-1, block number=-1, partition=0.
>> >> >>>> Tape block size 0 bytes. Density code 0x0 (default).
>> >> >>>> Soft error count since last status=0
>> >> >>>> General status bits on (50000):
>> >> >>>>   DR_OPEN IM_REP_EN
>> >> >>>>
>> >> >>>> the other device looks like:
>> >> >>>>
>> >> >>>> SCSI 2 tape drive:
>> >> >>>> File number=0, block number=0, partition=0.
>> >> >>>> Tape block size 0 bytes. Density code 0x5a (no translation).
>> >> >>>> Soft error count since last status=0
>> >> >>>> General status bits on (41010000):
>> >> >>>>   BOT ONLINE IM_REP_EN
>> >> >>>>
>> >> >>>> So I see that it's ~possible~ to see the ONLINE state, but it doesn't
>> >> >>>> seem like it ever gets to that state during load.
>> >> >>>>
>> >> >>>> Any thoughts?
>> >> >>>>
>> >> >>>> Thanks,
>> >> >>>> Andrew
>> >> >>>>
>> >> >>>> On Wed, Jun 17, 2015 at 11:44 AM, Andrew Noonan
>> >> >>>> <anoonan AT gmail DOT com> wrote:
>> >> >>>>> Hi Ana,
>> >> >>>>>
>> >> >>>>>       Thanks for the reply.  I'm adding those into the drives.  BTW,
>> >> >>>>> 900 is the value.  Having no real experience with these, is it
>> >> >>>>> abnormal for a load to take 10+ minutes, or is that reasonable?
>> >> >>>>> My next step is to add those settings in, restart the SD, and attempt
>> >> >>>>> to do a "label barcode" again.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Andrew
>> >> >>>>>
>> >> >>>>> On Tue, Jun 16, 2015 at 9:10 PM, Ana Emília M. Arruda
>> >> >>>>> <emiliaarruda AT gmail DOT com> wrote:
>> >> >>>>>> Hello Andrew,
>> >> >>>>>>
>> >> >>>>>> You can find the timeout for your drives in the output of the
>> >> >>>>>> "lsscsi -l" command. Then you can configure three timeout directives
>> >> >>>>>> for each of your two drives (LRADrive-1 and LRADrive-2):
>> >> >>>>>>
>> >> >>>>>> Maximum Changer Wait = X
>> >> >>>>>> Maximum Rewind Wait = X
>> >> >>>>>> Maximum Open Wait = X
>> >> >>>>>>
>> >> >>>>>> where X is the timeout value for your drives.
>> >> >>>>>>
>> >> >>>>>> You can also adjust this timeout in your mtx-changer script by
>> >> >>>>>> changing the 300-second value below:
>> >> >>>>>>
>> >> >>>>>> wait_for_drive() {
>> >> >>>>>>    i=0
>> >> >>>>>>    while [ $i -le 300 ]; do  # Wait max 300 seconds
>> >> >>>>>>
>> >> >>>>>> Best regards,
>> >> >>>>>> Ana
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Tue, Jun 16, 2015 at 5:02 PM, Andrew Noonan
>> >> >>>>>> <anoonan AT gmail DOT com> wrote:
>> >> >>>>>>> Hi all,
>> >> >>>>>>>
>> >> >>>>>>> I'm almost completely new to tape.  We've been doing disk-based
>> >> >>>>>>> backups for years, but we now have a project where we want to offsite
>> >> >>>>>>> hundreds of TB permanently, and have a Dell TL4000 (a rebranded IBM
>> >> >>>>>>> 3573-TL from the looks of it) with 2 ULT3580 LTO-6 drives.  We're
>> >> >>>>>>> running Bacula 5.2.  The server is a Dell 1950 running CentOS 5 (sorry
>> >> >>>>>>> for the old OS).
>> >> >>>>>>>
>> >> >>>>>>> The btape tests run on both units without a problem, including the
>> >> >>>>>>> autochanger tests, and manually executing load/unload/list commands
>> >> >>>>>>> with mtx-changer seems to run fine.  The one exception to this is that
>> >> >>>>>>> the mtx-changer load command seems to take about 10 minutes to
>> >> >>>>>>> complete, which seems unreasonably long.  These are brand new tapes
>> >> >>>>>>> and I haven't written anything to them other than whatever btape does
>> >> >>>>>>> with testing.  I put a 5 minute sleep on the load for mtx-changer, but
>> >> >>>>>>> other than that haven't customized the script, as I'm not sure what
>> >> >>>>>>> I'd customize.
>> >> >>>>>>>
>> >> >>>>>>> The "update slots" command from the director works OK, but when I go
>> >> >>>>>>> to do a "label barcode", the resulting "load slot" gets killed by
>> >> >>>>>>> Bacula:
>> >> >>>>>>>
>> >> >>>>>>> 3992 Bad autochanger "load slot 20, drive 1": ERR=Child died from signal 15: Termination.
>> >> >>>>>>> Results=Program killed by Bacula (timeout)
>> >> >>>>>>>
>> >> >>>>>>> I've seen in some posts to the list that this ends up being a
>> >> >>>>>>> permissions problem on the devices, but that doesn't seem to be
>> >> >>>>>>> the case as far as I can see:
>> >> >>>>>>>
>> >> >>>>>>> bacula-sd is running as the bacula user/group.  The bacula user is in
>> >> >>>>>>> the "disk" group, and the *st* devices are in the disk group with "rw"
>> >> >>>>>>> permissions:
>> >> >>>>>>>
>> >> >>>>>>> crw-rw---- 1 root disk 9, 128 Jun  4 12:02 /dev/nst0
>> >> >>>>>>> crw-rw---- 1 root disk 9, 224 Jun  4 12:02 /dev/nst0a
>> >> >>>>>>> crw-rw---- 1 root disk 9, 160 Jun  4 12:02 /dev/nst0l
>> >> >>>>>>> crw-rw---- 1 root disk 9, 192 Jun  4 12:02 /dev/nst0m
>> >> >>>>>>> crw-rw---- 1 root disk 9, 129 Jun 10 17:06 /dev/nst1
>> >> >>>>>>> crw-rw---- 1 root disk 9, 225 Jun 10 17:06 /dev/nst1a
>> >> >>>>>>> crw-rw---- 1 root disk 9, 161 Jun 10 17:06 /dev/nst1l
>> >> >>>>>>> crw-rw---- 1 root disk 9, 193 Jun 10 17:06 /dev/nst1m
>> >> >>>>>>> crw-rw---- 1 root disk 9,   0 Jun  4 12:02 /dev/st0
>> >> >>>>>>> crw-rw---- 1 root disk 9,  96 Jun  4 12:02 /dev/st0a
>> >> >>>>>>> crw-rw---- 1 root disk 9,  32 Jun  4 12:02 /dev/st0l
>> >> >>>>>>> crw-rw---- 1 root disk 9,  64 Jun  4 12:02 /dev/st0m
>> >> >>>>>>> crw-rw---- 1 root disk 9,   1 Jun 10 17:06 /dev/st1
>> >> >>>>>>> crw-rw---- 1 root disk 9,  97 Jun 10 17:06 /dev/st1a
>> >> >>>>>>> crw-rw---- 1 root disk 9,  33 Jun 10 17:06 /dev/st1l
>> >> >>>>>>> crw-rw---- 1 root disk 9,  65 Jun 10 17:06 /dev/st1m
>> >> >>>>>>>
>> >> >>>>>>> Here's a block of debug from the SD during a label attempt for one of
>> >> >>>>>>> the slots:
>> >> >>>>>>>
>> >> >>>>>>> odin-sd: autochanger.c:434-0 Wiffle through devices looking for slot
>> >> >>>>>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
>> >> >>>>>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0
>> >> >>>>>>> odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0
>> >> >>>>>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
>> >> >>>>>>> odin-sd: bpipe.c:220-0 Wait for 28962 opt=1
>> >> >>>>>>> odin-sd: bpipe.c:228-0 Got break wpid=28962 status=0 ERR=none
>> >> >>>>>>> odin-sd: bpipe.c:249-0 child status=0
>> >> >>>>>>> odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888
>> >> >>>>>>> odin-sd: bpipe.c:264-0 returning stat=0,0
>> >> >>>>>>> odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst0 0 stat=0 result=0
>> >> >>>>>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
>> >> >>>>>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
>> >> >>>>>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1
>> >> >>>>>>> odin-sd: autochanger.c:272-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1
>> >> >>>>>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
>> >> >>>>>>> odin-sd: bpipe.c:220-0 Wait for 28976 opt=1
>> >> >>>>>>> odin-sd: bpipe.c:228-0 Got break wpid=28976 status=0 ERR=none
>> >> >>>>>>> odin-sd: bpipe.c:249-0 child status=0
>> >> >>>>>>> odin-sd: watchdog.c:226-0 Unregistered watchdog 636b888
>> >> >>>>>>> odin-sd: bpipe.c:264-0 returning stat=0,0
>> >> >>>>>>> odin-sd: autochanger.c:274-0 run_prog: /usr/lib64/bacula/mtx-changer /dev/changer loaded 14 /dev/nst1 1 stat=0 result=0
>> >> >>>>>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
>> >> >>>>>>> odin-sd: autochanger.c:453-0 Slot=14 not found in another device
>> >> >>>>>>> odin-sd: autochanger.c:313-0 Locking changer LogRepoAutochanger
>> >> >>>>>>> odin-sd: autochanger.c:183-0 Doing changer load slot 14 "LRADrive-2" (/dev/nst1)
>> >> >>>>>>> odin-sd: autochanger.c:740-0 omsg=/usr/lib64/bacula/mtx-changer /dev/changer load 14 /dev/nst1 1
>> >> >>>>>>> odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1)
>> >> >>>>>>> odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol=
>> >> >>>>>>> odin-sd: autochanger.c:190-0 Run program=/usr/lib64/bacula/mtx-changer /dev/changer load 14 /dev/nst1 1
>> >> >>>>>>> odin-sd: watchdog.c:206-0 Registered watchdog 636b888, interval 300
>> >> >>>>>>> odin-sd: bpipe.c:443-0 Run program fgets killed=1
>> >> >>>>>>> odin-sd: bpipe.c:220-0 Wait for 28990 opt=1
>> >> >>>>>>> odin-sd: bpipe.c:228-0 Got break wpid=28990 status=15 ERR=none
>> >> >>>>>>> odin-sd: bpipe.c:256-0 Child died from signal 15
>> >> >>>>>>> odin-sd: watchdog.c:235-0 Unregistered inactive watchdog 636b888
>> >> >>>>>>> odin-sd: bpipe.c:264-0 returning stat=15,134217743
>> >> >>>>>>> odin-sd: autochanger.c:205-0 load slot 14, drive 1, bad stats=Child died from signal 15: Termination.
>> >> >>>>>>> odin-sd: autochanger.c:212-0 load slot 14 status=134217743
>> >> >>>>>>> odin-sd: autochanger.c:327-0 Unlocking changer LogRepoAutochanger
>> >> >>>>>>> odin-sd: autochanger.c:218-0 After changer, status=134217743
>> >> >>>>>>> odin-sd: dev.c:1735-0 Clear volhdr vol=
>> >> >>>>>>> odin-sd: vol_mgr.c:544-0 vol_unused: no vol on "LRADrive-2" (/dev/nst1)
>> >> >>>>>>> odin-sd: lock.c:302-0 return lock. old=BST_WRITING_LABEL from dircmd.c:554
>> >> >>>>>>> odin-sd: lock.c:307-0 return lock. new=BST_NOT_BLOCKED
>> >> >>>>>>> odin-sd: dev.c:1746-0 close_dev "LRADrive-2" (/dev/nst1)
>> >> >>>>>>> odin-sd: dev.c:1751-0 device "LRADrive-2" (/dev/nst1) already closed vol=
>> >> >>>>>>> odin-sd: acquire.c:731-0 Enter detach_dcr_from_dev
>> >> >>>>>>> odin-sd: dircmd.c:220-0 <dird: label LogRepoAutochanger VolumeName=000030L6 PoolName=LogrepoArchive MediaType=LTO-6 Slot=15 drive=1
>> >> >>>>>>> odin-sd: dircmd.c:234-0 Do command: label
>> >> >>>>>>> odin-sd: dircmd.c:627-0 Try changer device LRADrive-1
>> >> >>>>>>> odin-sd: dircmd.c:648-0 Device LogRepoAutochanger drive wrong: want=1 got=0 skipping
>> >> >>>>>>> odin-sd: dircmd.c:627-0 Try changer device LRADrive-2
>> >> >>>>>>> odin-sd: dircmd.c:643-0 Found changer device LRADrive-2
>> >> >>>>>>> odin-sd: dircmd.c:656-0 Found device LRADrive-2
>> >> >>>>>>> odin-sd: block.c:144-0 Returning new block=636b800
>> >> >>>>>>> odin-sd: acquire.c:713-0 JobId=0 enter attach_dcr_to_dev
>> >> >>>>>>> odin-sd: dircmd.c:421-0 Can label. Device is not open
>> >> >>>>>>> odin-sd: lock.c:285-0 steal lock. old=BST_NOT_BLOCKED from dircmd.c:470
>> >> >>>>>>> odin-sd: lock.c:290-0 steal lock. new=BST_WRITING_LABEL
>> >> >>>>>>> odin-sd: dircmd.c:471-0 Stole device "LRADrive-2" (/dev/nst1) lock, writing label.
>> >> >>>>>>>
>> >> >>>>>>> The config I've got for these is:
>> >> >>>>>>>
>> >> >>>>>>> Device {
>> >> >>>>>>>    Name = LRADrive-1
>> >> >>>>>>>    Alert Command = "sh -c 'smartctl -H -l error %c'"
>> >> >>>>>>>    AlwaysOpen = yes
>> >> >>>>>>>    ArchiveDevice = /dev/nst0
>> >> >>>>>>>    AutoChanger = yes
>> >> >>>>>>>    AutomaticMount = yes
>> >> >>>>>>>    DeviceType = Tape
>> >> >>>>>>>    DriveIndex = 0
>> >> >>>>>>>    LabelMedia = no
>> >> >>>>>>>    MediaType = LTO-6
>> >> >>>>>>>    RandomAccess = no
>> >> >>>>>>>    RemovableMedia = yes
>> >> >>>>>>> }
>> >> >>>>>>>
>> >> >>>>>>> Device {
>> >> >>>>>>>    Name = LRADrive-2
>> >> >>>>>>>    Alert Command = "sh -c 'smartctl -H -l error %c'"
>> >> >>>>>>>    AlwaysOpen = yes
>> >> >>>>>>>    ArchiveDevice = /dev/nst1
>> >> >>>>>>>    AutoChanger = yes
>> >> >>>>>>>    AutomaticMount = yes
>> >> >>>>>>>    DeviceType = Tape
>> >> >>>>>>>    DriveIndex = 1
>> >> >>>>>>>    LabelMedia = no
>> >> >>>>>>>    MediaType = LTO-6
>> >> >>>>>>>    RandomAccess = no
>> >> >>>>>>>    RemovableMedia = yes
>> >> >>>>>>> }
>> >> >>>>>>>
>> >> >>>>>>> Autochanger {
>> >> >>>>>>>    Name = LogRepoAutochanger
>> >> >>>>>>>    ChangerCommand = "/usr/lib64/bacula/mtx-changer %c %o %S %a %d"
>> >> >>>>>>>    ChangerDevice = /dev/changer
>> >> >>>>>>>    Device = LRADrive-1
>> >> >>>>>>>    Device = LRADrive-2
>> >> >>>>>>> }
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> I know there are some things that could be optimized here for
>> >> >>>>>>> performance, and I'm certainly interested in them, but right now I
>> >> >>>>>>> can't even label my tapes :)
>> >> >>>>>>>
>> >> >>>>>>> I suspect it's the long load delay, and I wasn't sure if maybe the
>> >> >>>>>>> drive is searching for some mark or something.  On that note, I tried
>> >> >>>>>>> to do a "rewind" and "weof" using the /dev/st0 device (wasn't sure if
>> >> >>>>>>> nst0 would complain about issuing a rewind), but I would get
>> >> >>>>>>> "Input/Output error" messages from mt on both the rewind and weof
>> >> >>>>>>> commands.
>> >> >>>>>>>
>> >> >>>>>>> Any advice I could get would be helpful.
>> >> >>>>>>>
>> >> >>>>>>> Thanks!
>> >> >>>>>>> Andrew
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> ------------------------------------------------------------------------------
>> >> >>>>>>> _______________________________________________
>> >> >>>>>>> Bacula-users mailing list
>> >> >>>>>>> Bacula-users AT lists.sourceforge DOT net
>> >> >>>>>>> https://lists.sourceforge.net/lists/listinfo/bacula-users
>> >> >>>>>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> "Greater love has no one than this, that someone lay down his life
>> >> >>> for his friends." Jesus Christ
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>>
>
>
