Re: [Bacula-users] Error talking to remote storage daemon

2017-07-27 13:42:30
Subject: Re: [Bacula-users] Error talking to remote storage daemon
From: Steve Garcia <sgarcia AT bak.rr DOT com>
To: Ana Emília M. Arruda <emiliaarruda AT gmail DOT com>
Date: Thu, 27 Jul 2017 17:41:04 +0000
Steve Garcia
Ignorance killed the cat, curiosity was framed.

---- "Ana Emília M. Arruda" <emiliaarruda AT gmail DOT com> wrote: 
> Hi Steve,
> 
> It seems we have a configuration issue. It is not a good idea to have
> symlinks to sg devices. They can change after a server reboot.
> 
> **************
> From the /dev directory on odin:
> lrwxrwxrwx 1 root root 3     Jun  5 17:42 /dev/autochanger1 -> sg3
> crw-rw---- 1 root tape 21, 3 Jun  1 15:01 /dev/sg3
> **************
> 
> This is very probably what is causing the issue with the label command, but
> I can't confirm that because I have never tried this.
> 
> It is better to use the /dev/tape/by-id names, and even better to create
> udev rules based on the tape library's specific characteristics, such as
> serial number.

Well, that symlink *is* generated by a udev rule:

SUBSYSTEM=="scsi_generic",ATTRS{vendor}=="IBM",ATTRS{model}=="3573-TL",SYMLINK+="autochanger1",GROUP="tape",MODE="0660"

I suppose I could put in a serial number to be more specific, but that would 
probably only matter if I were to have more than one changer.  

I *am* showing both the changer and the tape drive itself twice, but there is 
only one of each.  I notice that by-id they only show up once, but by path they 
come through twice:

root@odin:/etc/bacula# lsscsi -g
[0:2:0:0]    disk    DELL     PERC H730 Mini   4.27  /dev/sda   /dev/sg0 
[1:0:0:0]    tape    IBM      ULT3580-HH6      G9P1  /dev/st0   /dev/sg2 
[1:0:0:1]    mediumx IBM      3573-TL          E.30  /dev/sch0  /dev/sg3 
[1:0:1:0]    tape    IBM      ULT3580-HH6      G9P1  /dev/st1   /dev/sg4 
[1:0:1:1]    mediumx IBM      3573-TL          E.30  /dev/sch1  /dev/sg5 
[6:2:0:0]    disk    DELL     PERC H830 Adp    4.27  /dev/sdb   /dev/sg1 
[12:0:0:0]   cd/dvd  PLDS     DVD-ROM DS-8DBSH MD52  /dev/sr0   /dev/sg6 

root@odin:/etc/bacula# ls -lR /dev/tape
/dev/tape:
total 0
drwxr-xr-x 2 root root 100 Jun  1 15:01 by-id
drwxr-xr-x 2 root root 120 Jun  1 15:01 by-path

/dev/tape/by-id:
total 0
lrwxrwxrwx 1 root root  9 Jun  1 15:01 scsi-1IBM_3573-TL_00X2U78BZ022_LL0 -> ../../sg3
lrwxrwxrwx 1 root root  9 Jun  1 15:01 scsi-35000e11164c42001 -> ../../st0
lrwxrwxrwx 1 root root 10 Jun  1 15:01 scsi-35000e11164c42001-nst -> ../../nst0

/dev/tape/by-path:
total 0
lrwxrwxrwx 1 root root  9 Jun  1 15:01 pci-0000:05:00.0-sas-phy2-lun-0 -> ../../st0
lrwxrwxrwx 1 root root 10 Jun  1 15:01 pci-0000:05:00.0-sas-phy2-lun-0-nst -> ../../nst0
lrwxrwxrwx 1 root root  9 Jun  1 15:01 pci-0000:05:00.0-sas-phy6-lun-0 -> ../../st1
lrwxrwxrwx 1 root root 10 Jun  1 15:01 pci-0000:05:00.0-sas-phy6-lun-0-nst -> ../../nst1
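
The duplication in the lsscsi listing above can be picked out mechanically. What follows is only a rough sketch: the function names parse_lsscsi and find_duplicates are made up for illustration, and the parser assumes the vendor field is a single word and that every line ends with the block/tape device followed by the sg device, which holds for the output pasted above but may not hold everywhere.

```python
from collections import defaultdict

# Output of "lsscsi -g" copied from the listing above.
LSSCSI_OUTPUT = """\
[0:2:0:0]    disk    DELL     PERC H730 Mini   4.27  /dev/sda   /dev/sg0
[1:0:0:0]    tape    IBM      ULT3580-HH6      G9P1  /dev/st0   /dev/sg2
[1:0:0:1]    mediumx IBM      3573-TL          E.30  /dev/sch0  /dev/sg3
[1:0:1:0]    tape    IBM      ULT3580-HH6      G9P1  /dev/st1   /dev/sg4
[1:0:1:1]    mediumx IBM      3573-TL          E.30  /dev/sch1  /dev/sg5
[6:2:0:0]    disk    DELL     PERC H830 Adp    4.27  /dev/sdb   /dev/sg1
[12:0:0:0]   cd/dvd  PLDS     DVD-ROM DS-8DBSH MD52  /dev/sr0   /dev/sg6
"""


def parse_lsscsi(text):
    """Split each lsscsi -g line into its fields (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Need at least: address, type, vendor, model, revision, device, sg.
        if len(tokens) < 7 or not tokens[0].startswith("["):
            continue
        rows.append({
            "hctl": tokens[0].strip("[]"),
            "type": tokens[1],
            "vendor": tokens[2],                # assumed to be one word
            "model": " ".join(tokens[3:-3]),    # model may contain spaces
            "rev": tokens[-3],
            "dev": tokens[-2],
            "sg": tokens[-1],
        })
    return rows


def find_duplicates(rows):
    """Return (type, vendor, model) -> sg nodes for entries seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["type"], row["vendor"], row["model"])].append(row["sg"])
    return {key: sgs for key, sgs in groups.items() if len(sgs) > 1}


if __name__ == "__main__":
    dups = find_duplicates(parse_lsscsi(LSSCSI_OUTPUT))
    for (kind, vendor, model), sgs in dups.items():
        print(kind, vendor, model, "->", ", ".join(sgs))
```

This only groups entries by type, vendor and model; it cannot tell two physical units of the same model from one unit seen over two paths, so a serial number would still be needed for that.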


> 
> A "ls -lR /dev/tape" and "lsscsi -g" will help you.
> 
> Then you should change:
> 
> Changer Device = /dev/autochanger1 to the by-id name for mediumx
> 
> And
> 
> Archive Device = /dev/nst0 to the by-id name for the tape drive (remember
> to use the one that ends with -nst).

OK, I've changed both to their by-id equivalents, but I get exactly the same 
error.  I can't say that's a big surprise, since the by-id mediumx points to 
/dev/sg3, just as /dev/autochanger1 did, and the by-id tape drive points to 
/dev/nst0.
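
That the old and new names end up at the same device node can be confirmed by resolving the symlinks. A minimal sketch, using only os.path.realpath from the standard library; group_by_target is a made-up helper name, and the two paths in the example are the ones from this thread, so they only resolve to anything meaningful on odin itself.

```python
import os
from collections import defaultdict


def group_by_target(paths):
    """Map each fully resolved path to the list of names that point at it."""
    groups = defaultdict(list)
    for path in paths:
        # realpath follows every symlink in the chain and normalizes "..".
        groups[os.path.realpath(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    names = [
        "/dev/autochanger1",
        "/dev/tape/by-id/scsi-1IBM_3573-TL_00X2U78BZ022_LL0",
    ]
    for target, aliases in group_by_target(names).items():
        print(target, "<-", ", ".join(aliases))
```

If both names land in the same group, swapping one for the other in bacula-sd.conf changes nothing about which device is opened, which matches the unchanged error.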

I have a feeling this is in some way related to the fact that the director is 
connecting to the storage daemon remotely, but I can't see how that would make 
a difference.  It *is* able to retrieve an accurate listing of the cartridges 
in the library, it's only after I say "yes, start labeling" that it decides it 
can't connect to AutochangerOdin.  Note that it's not complaining about 
connecting to the physical device, but to the logical Autochanger device as 
defined in bacula-sd.conf.  Also note that the 3999 failures happen instantly 
-- there's no delay as if it were trying to connect to a hardware device and 
failing.

But wouldn't the listing have to come from a successful connection to 
AutochangerOdin?
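
Since the complaint is about the logical device rather than the hardware, one thing that can be checked mechanically is whether the names and Media Type strings line up between the director's Storage resource and the storage daemon's resources. The sketch below is not a real Bacula parser: parse_resources and check_storage are hypothetical helpers, the config text is abridged from the excerpts quoted further down, and the comparison is a plain exact string match, which is an assumption about how strictly the values need to agree.

```python
# Abridged from the bacula-sd.conf excerpt quoted below.
SD_CONF = """
Autochanger {
  Name = AutochangerOdin
  Device = Drive-1
  Changer Device = /dev/autochanger1
}
Device {
  Name = Drive-1
  Media Type = LT06
  Archive Device = /dev/nst0
  AutoChanger = yes
}
"""

# Abridged from the bacula-dir.conf excerpt quoted below.
DIR_CONF = """
Storage {
  Name = Library2
  Address = odin                # N.B. Use a fully qualified name here
  Device = AutochangerOdin
  Media Type = LTO6
  Autochanger = yes
}
"""


def parse_resources(text):
    """Very small parser: returns [(resource_type, {directive: value})]."""
    resources, kind, body = [], None, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: also cuts a '#' inside quotes
        if line.endswith("{"):
            kind, body = line[:-1].strip().lower(), {}
        elif line == "}" and body is not None:
            resources.append((kind, body))
            kind, body = None, None
        elif "=" in line and body is not None:
            key, value = line.split("=", 1)
            # Directive names are folded to lowercase with spaces removed.
            body[key.replace(" ", "").lower()] = value.strip().rstrip(";").strip().strip('"')
    return resources


def check_storage(dir_text, sd_text, storage_name):
    """List mismatches between one director Storage and the SD resources."""
    storage = next(body for kind, body in parse_resources(dir_text)
                   if kind == "storage" and body.get("name") == storage_name)
    sd = {body.get("name"): (kind, body) for kind, body in parse_resources(sd_text)}
    target = sd.get(storage.get("device"))
    if target is None:
        return [f"SD has no resource named {storage.get('device')!r}"]
    kind, body = target
    # An Autochanger resource lists its drives in its own Device directive.
    if kind == "autochanger":
        names = [n.strip() for n in body.get("device", "").split(",")]
    else:
        names = [body["name"]]
    problems = []
    for name in names:
        if name not in sd:
            problems.append(f"SD has no Device named {name!r}")
        elif sd[name][1].get("mediatype") != storage.get("mediatype"):
            problems.append(
                f"Media Type differs: director {storage.get('mediatype')!r}, "
                f"SD device {name!r} {sd[name][1].get('mediatype')!r}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_storage(DIR_CONF, SD_CONF, "Library2"):
        print(problem)
```

In the excerpts as posted, the names resolve (Library2 to AutochangerOdin to Drive-1), but the Media Type is written "LTO6" with a letter O on the director side and "LT06" with a zero on the storage daemon side. Whether that has anything to do with the 3999 errors is not established here; it is simply a difference that is easy to miss by eye.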

Any other thoughts?  

> 
> I noticed you have data spooling configured. With an LTO6 drive it is very
> probable that this will slow down backup performance, unless you have
> clients with poor network performance.

I hadn't thought of that.  I'm used to much slower drives.  :-)  I'll test it, 
but you're probably right, so the spool directory will probably go away.

> 
> Hope this helps you.
> 
> Best regards,
> Ana
> 
> El 26 jul. 2017 20:25, "Steve Garcia" <sgarcia AT bak.rr DOT com> escribió:
> 
> OK, I've got my tape drive working (thanks Ana!) but I'm having trouble
> connecting to the autochanger it's in using the director.  This is the
> first time I've tried having a storage daemon on a different machine than
> the director.  The director is a slightly lower version (7.4.3 on Debian
> Jessie using backports) than the storage daemon (7.4.4 on stretch) but I
> had understood that those versions were close enough to work.
> 
> So I'm hoping this is another configuration issue.
> 
> Right now what I'm trying to do is label all the tapes in the new library.
> 
> When I try to access the new storage from the director, it is able to get a
> listing of all the tapes, but it fails when it tries to actually do the
> labeling.  I get a "3999 Device not found or could not be opened" error.
> These errors show up quickly, there is no delay as it tries each slot, so
> it's obviously not getting far enough to try.  But it *is* obviously
> connecting to the remote storage, otherwise it wouldn't be able to obtain
> the slot list.
> 
> What am I missing?
> 
> root@sleipnir:/etc/bacula# bconsole
> Connecting to Director sleipnir:9101
> 1000 OK: 102 sleipnir-dir Version: 7.4.3 (18 June 2016)
> Enter a period to cancel a command.
> *label storage=Library2 barcodes
> Automatically selected Catalog: MyCatalog
> Using Catalog "MyCatalog"
> Connecting to Storage daemon Library2 at odin:9103 ...
> 3306 Issuing autochanger "slots" command.
> Device "AutochangerOdin" has 24 slots.
> Connecting to Storage daemon Library2 at odin:9103 ...
> 3306 Issuing autochanger "list" command.
> The following Volumes will be labeled:
> Slot  Volume
> ==============
>    1  000015L6
>    2  000018L6
>    3  000021L6
>    4  CLNU00L1
>    5  000014L6
>    6  000017L6
>    7  000020L6
>    8  CLN005L3
>    9  000013L6
>   10  000016L6
>   11  000019L6
>   12  000012L6
>   13  000009L6
>   14  000006L6
>   15  000003L6
>   16  000011L6
>   17  000008L6
>   18  000005L6
>   19  000002L6
>   20  000010L6
>   21  000007L6
>   22  000004L6
>   23  000001L6
> Do you want to label these Volumes? (yes|no):  yes
> Defined Pools:
>      1: Default
>      2: OdinPool
> Select the Pool (1-2): 2
> Connecting to Storage daemon Library2 at odin:9103 ...
> Sending label command for Volume "000015L6" Slot 1 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000015L6.
> Sending label command for Volume "000018L6" Slot 2 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000018L6.
> Sending label command for Volume "000021L6" Slot 3 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000021L6.
> Media record for Slot 4 Volume "CLNU00L1" already exists.
> Sending label command for Volume "000014L6" Slot 5 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000014L6.
> Sending label command for Volume "000017L6" Slot 6 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000017L6.
> Sending label command for Volume "000020L6" Slot 7 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000020L6.
> Media record for Slot 8 Volume "CLN005L3" already exists.
> Sending label command for Volume "000013L6" Slot 9 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000013L6.
> Sending label command for Volume "000016L6" Slot 10 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000016L6.
> Sending label command for Volume "000019L6" Slot 11 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000019L6.
> Sending label command for Volume "000012L6" Slot 12 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000012L6.
> Sending label command for Volume "000009L6" Slot 13 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000009L6.
> Sending label command for Volume "000006L6" Slot 14 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000006L6.
> Sending label command for Volume "000003L6" Slot 15 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000003L6.
> Sending label command for Volume "000011L6" Slot 16 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000011L6.
> Sending label command for Volume "000008L6" Slot 17 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000008L6.
> Sending label command for Volume "000005L6" Slot 18 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000005L6.
> Sending label command for Volume "000002L6" Slot 19 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000002L6.
> Sending label command for Volume "000010L6" Slot 20 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000010L6.
> Sending label command for Volume "000007L6" Slot 21 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000007L6.
> Sending label command for Volume "000004L6" Slot 22 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000004L6.
> Sending label command for Volume "000001L6" Slot 23 ...
> 3999 Device "AutochangerOdin" not found or could not be opened.
> Label command failed for Volume 000001L6.
> You have messages.
> *
> 
> From the bacula-sd.conf on odin (where the library is):
> Autochanger {
>   Name = AutochangerOdin
>   Device = Drive-1
>   Changer Command = "/etc/bacula/scripts/mtx-changer %c %o %S %a %d"
>   Changer Device = /dev/autochanger1
> }
> 
> Device {
>   Name = Drive-1                      #
>   Description = "LT06 inside Dell TL2000 Library"
>   Drive Index = 0
>   Media Type = LT06
>   Archive Device = /dev/nst0
>   AutomaticMount = yes;               # when device opened, read it
>   AlwaysOpen = yes;
>   RemovableMedia = yes;
>   RandomAccess = no;
>   AutoChanger = yes
>   SpoolDirectory = "/var/spool/bacula"
>   MaximumSpoolSize = 485G
>   Maximum Network Buffer Size = 65536
>   Offline On Unmount = no
>   Alert Command = "sh -c 'smartctl -H -l error %c'"
> }
> 
> From bacula-dir.conf on sleipnir (where the director is):
> Storage {
>   Name = Library2
> # Do not use "localhost" here
>   Address = odin                # N.B. Use a fully qualified name here
>   SDPort = 9103
>   Password = "*****************"
>   Device = AutochangerOdin
>   Media Type = LTO6
>   Autochanger = yes                   # enable for autochanger device
> }
> 
> 
> 
> 
> --
> Steve Garcia
> Ignorance killed the cat, curiosity was framed.
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

