Bacula-users

Re: [Bacula-users] Fatal error: Authorization key rejected by Storage daemon.

2013-09-21 18:17:31
Subject: Re: [Bacula-users] Fatal error: Authorization key rejected by Storage daemon.
From: "Kevin B. Zimmerman" <kevin.zimmerman AT kitware DOT com>
To: "Ana Emília M. Arruda" <emiliaarruda AT gmail DOT com>
Date: Sat, 21 Sep 2013 18:13:54 -0400
Hi Ana -

Thanks for responding.

I currently have no issues communicating between the public-fd client and the storage daemon at 192.168.120.35.  I'm able to connect between both addresses on ports 9102 and 9103.
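
For anyone who wants to repeat this kind of port check from the client side, here is a minimal sketch. It is not part of the original exchange, and the helper names `can_connect` and `check_ports` are hypothetical. Note that a successful TCP connect only shows that the port is reachable. It does not exercise Bacula's authentication handshake, so on its own it cannot rule out the authorization error.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Run on the public-fd machine, `check_ports("192.168.120.35", [9103])` would report whether the SDPort of File_20 (values taken from this thread) is reachable from that client.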

As for the JobId, I pass the specific job to the console during the restore.  What I'm confused about is why a disk volume ties itself to one specific storage resource when there are numerous clients from different VLANs.  Ultimately, I don't really care where it looks to get its data from, but it was something that didn't make sense to me.

Did I answer your questions?  Any other thoughts?

Thanks,
Kevin B. Zimmerman

On 09/20/2013 01:31 PM, Ana Emília M. Arruda wrote:
Hi Kevin,

Just trying to understand... The error below:

Restore-public-Disk.2013-09-20_12.43.48_32 is waiting for Client to connect to Storage daemon
...
20-Sep 12:19 public-fd JobId 19084: Fatal error: Authorization key rejected by Storage daemon.
Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors for help.
20-Sep 12:19 public-fd JobId 19084: Fatal error: Failed to authenticate Storage daemon.

It means that your client "public-fd" is having trouble connecting to the storage daemon. In this case, for the restore job, Bacula is asking for:

Storage {
   Name = File_20
   Address = 192.168.120.35 
...
}

So public-fd probably has network problems communicating with 192.168.120.35...

Are you using the correct JobId for the restore? When you say that it should ask for File_5 storage, maybe you are misunderstanding the JobId information for the restore you want.

Best regards,
Ana





On Fri, Sep 20, 2013 at 2:09 PM, Kevin B. Zimmerman <kevin.zimmerman AT kitware DOT com> wrote:
Greetings -

I'm running Bacula 5.2.5 on an Ubuntu 12.04.3 server, with about 61TB of
disk storage and an attached Dell PVTL2000 tape library. Backups are
working great, but restores are presenting a problem.

Here's my dilemma:

When attempting to restore a file, I get this status for about 10 minutes:

  JobId Level   Name                       Status
======================================================================
  19085         Restore-public-Disk.2013-09-20_12.43.48_32 is waiting for Client to connect to Storage daemon

and then this error:

20-Sep 12:09 khq-backups2-dir JobId 19084: Start Restore Job Restore-public-Disk.2013-09-20_12.09.20_02
20-Sep 12:09 khq-backups2-dir JobId 19084: Using Device "DiskStorage"
20-Sep 12:19 public-fd JobId 19084: Fatal error: Authorization key rejected by Storage daemon.
Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors for help.
20-Sep 12:19 public-fd JobId 19084: Fatal error: Failed to authenticate Storage daemon.
20-Sep 12:19 khq-backups2-dir JobId 19084: Fatal error: Socket error on Storage command: ERR=No data available
20-Sep 12:19 khq-backups2-dir JobId 19084: Error: Bacula khq-backups2-dir 5.2.5 (26Jan12):
   Build OS:               x86_64-pc-linux-gnu ubuntu 12.04
   JobId:                  19084
   Job:                    Restore-public-Disk.2013-09-20_12.09.20_02
   Restore Client:         public-fd
   Start time:             20-Sep-2013 12:09:22
   End time:               20-Sep-2013 12:19:33
   Files Expected:         1
   Files Restored:         0
   Bytes Restored:         0
   Rate:                   0.0 KB/s
   FD Errors:              1
   FD termination status:
   SD termination status:  Waiting on FD
   Termination:            *** Restore Error ***

20-Sep 12:19 khq-backups2-dir JobId 19084: Error: Bacula khq-backups2-dir 5.2.5 (26Jan12):
   Build OS:               x86_64-pc-linux-gnu ubuntu 12.04
   JobId:                  19084
   Job:                    Restore-public-Disk.2013-09-20_12.09.20_02
   Restore Client:         public-fd
   Start time:             20-Sep-2013 12:09:22
   End time:               20-Sep-2013 12:19:33
   Files Expected:         1
   Files Restored:         0
   Bytes Restored:         0
   Rate:                   0.0 KB/s
   FD Errors:              2
   FD termination status:
   SD termination status:  Waiting on FD
   Termination:            *** Restore Error ***
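
To see at a glance which daemon reported which fatal error, and how long the job sat before failing, the output above can be summarised with a short script. This is only a sketch: the function names and the regular expressions are assumptions based on the log lines quoted in this message, not an official Bacula log format specification.

```python
import re
from datetime import datetime

# Matches lines such as:
# "20-Sep 12:19 public-fd JobId 19084: Fatal error: ..."
LOG_LINE = re.compile(r"^(\d{2}-\w{3} \d{2}:\d{2}) (\S+) JobId (\d+): (.*)$")

SAMPLE_LOG = """\
20-Sep 12:09 khq-backups2-dir JobId 19084: Using Device "DiskStorage"
20-Sep 12:19 public-fd JobId 19084: Fatal error: Authorization key rejected by Storage daemon.
20-Sep 12:19 public-fd JobId 19084: Fatal error: Failed to authenticate Storage daemon.
20-Sep 12:19 khq-backups2-dir JobId 19084: Fatal error: Socket error on Storage command: ERR=No data available
   Start time:             20-Sep-2013 12:09:22
   End time:               20-Sep-2013 12:19:33
"""


def fatal_errors_by_daemon(log_text):
    """Group the text of 'Fatal error:' messages by reporting daemon."""
    prefix = "Fatal error:"
    result = {}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group(4).startswith(prefix):
            message = match.group(4)[len(prefix):].strip()
            result.setdefault(match.group(2), []).append(message)
    return result


def job_duration_seconds(report_text):
    """Seconds between 'Start time' and 'End time' in a job report."""
    fmt = "%d-%b-%Y %H:%M:%S"  # %b expects English month names (C locale)
    start = re.search(r"Start time:\s+(\S+ \S+)", report_text).group(1)
    end = re.search(r"End time:\s+(\S+ \S+)", report_text).group(1)
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())
```

On the sample, `job_duration_seconds(SAMPLE_LOG)` gives 611 seconds, which matches the roughly 10 minutes the job spent waiting, and `fatal_errors_by_daemon(SAMPLE_LOG)` shows that the two authentication errors come from public-fd while the Director only reports the resulting socket error.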

So I went to check out
http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors as
suggested by the error, but it doesn't exist.  I looked at
http://www.bacula.org/5.2.x-manuals/en/problems/problems/Bacula_Frequently_Asked_Que.html#SECTION00260000000000000000
presuming it to be the same, but it didn't help.

I'm not sure why it fails on auth.  It uses the same password to run
backups, and those work fine.
I tested connectivity between the two systems, and both are able to talk
to each other (port 9102 on client, ports 9102 & 9103 on each of the
server's many IPs).
Max Concurrent Jobs are set to 7 in all configs, and no other jobs are
running.

One thing I noticed was that after selecting the file(s) to restore, I
get this:

The job will require the following
    Volume(s)                 Storage(s)                SD Device(s)
===========================================================================

     DiskPool0197              File_20                   DiskStorage

This particular client uses File_5, which may be part of the problem,
though I don't understand why.  That volume (DiskPool0197) has backups
from many different clients, from many different VLANs.

Some relevant info:

In trying to reduce network overhead, the backup server has been added
to each of the VLANs where the Bacula clients reside, resulting in
numerous file and tape storage definitions.  Here's a subset of the file
definitions:

Storage {
   Name = File_5
   Address = 192.168.115.35
   SDPort = 9103
   Password = "<redacted>"
   Device = DiskStorage
   Media Type = File
   Maximum Concurrent Jobs = 7
}

Storage {
   Name = File_20
   Address = 192.168.120.35      # N.B. Use a fully qualified name here
   SDPort = 9103
   Password = "<redacted>"
   Device = DiskStorage
   Media Type = File
   Maximum Concurrent Jobs = 7
}
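
With many near-identical Storage resources, it can help to list which ones point at the same device through different addresses. The following is a rough sketch under stated assumptions: it uses a deliberately simplified parser (flat `Storage { ... }` blocks, one `key = value` per line, no `#` inside quoted values), and the function names are hypothetical rather than anything Bacula provides.

```python
import re
from collections import defaultdict

SAMPLE_CONF = """\
Storage {
   Name = File_5
   Address = 192.168.115.35
   SDPort = 9103
   Password = "<redacted>"
   Device = DiskStorage
   Media Type = File
   Maximum Concurrent Jobs = 7
}

Storage {
   Name = File_20
   Address = 192.168.120.35      # N.B. Use a fully qualified name here
   SDPort = 9103
   Password = "<redacted>"
   Device = DiskStorage
   Media Type = File
   Maximum Concurrent Jobs = 7
}
"""


def parse_storage_resources(conf_text):
    """Parse flat Storage { ... } blocks into a list of dicts."""
    resources = []
    for block in re.findall(r"Storage\s*\{(.*?)\}", conf_text, flags=re.S):
        resource = {}
        for line in block.splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Normalise keys: lower-case, spaces removed ("Media Type" -> "mediatype").
            resource[key.strip().lower().replace(" ", "")] = value.strip().strip('"')
        resources.append(resource)
    return resources


def group_by_device(resources):
    """Group (name, address) pairs by their (device, media type)."""
    groups = defaultdict(list)
    for res in resources:
        key = (res.get("device"), res.get("mediatype"))
        groups[key].append((res.get("name"), res.get("address")))
    return dict(groups)
```

Applied to the two definitions above, `group_by_device(parse_storage_resources(SAMPLE_CONF))` puts File_5 and File_20 under the same `("DiskStorage", "File")` key, so both names are different network paths to one device.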

The problem has not been limited to a specific VLAN.
I've tried changing all of the Storage definitions to use the same IP,
wondering whether Bacula was confused about which IP to send traffic out
on, to no avail.
I've tried turning on debugging, but got no output.

No idea what I'm missing here. What other info did I fail to provide?
Any thoughts on what's going on?

Thanks,

--
Kevin B. Zimmerman



_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

