Anyone have any thoughts on
this?
From: bacula-users-bounces AT lists.sourceforge DOT net
[mailto:bacula-users-bounces AT lists.sourceforge DOT net] On Behalf Of Matthew Ife
Sent: 12 September 2008 14:39
To: bacula-users AT lists.sourceforge DOT net
Subject: [Bacula-users] Trunking Bacula-Dir/Bacula-SD on the same layer 3 network.
Hi Guys.
We have an unusual bacula network setup and after spending a
few weeks playing around with configuration after configuration I have finally
hit a stumbling block I am unable to work out.
Firstly I apologize for the length of this but I believe the
history of what I have tried is relevant to what I am trying to achieve.
We offer backup services for our clients and our clients all
have different needs, some are attached to a shared firewall whilst some are
not. The shared firewall runs transparently and we control access through it by
placing customers who are on the firewall on one vlan, and customers who are
not on the firewall on another vlan.
Since doing this we have seen numerous connectivity issues, caused by
large volumes of backup traffic passing through the firewall, which
eventually lead to connections being dropped and backups failing. Worse still
for us, all this backup traffic running through a firewall causes loss of
service, even for our non-backed-up clients.
In order to avoid this we have placed backup servers on the
same layer 3 network as our client machines, as only traffic destined for the
gateway is directed through the firewall, but we were still seeing connectivity
issues.
Both firewalled and non-firewalled customers reside on the
same layer 3 network but do not reside on the same layer 2 network, and thus
ARPing the backup server doesn’t always work if your client is on the wrong
broadcast domain (the firewalled vlan). So even backup traffic from machines
connected to the same switch on the same layer 3 network was running out via
the gateway, through the firewall and back down through the firewall to the
backup server.
In order to resolve this we decided to trunk the backup
server so that it belongs to BOTH vlans. In theory this would work, but in
practice we have had problems.
So, now the backup server has two vlans, two interfaces and
two IP addresses: one resides on vlan A, the other on vlan B. To
prevent traffic going through any firewalls I configured our build script to
dynamically create bacula configs which use interface A if your machine resides
on vlan A and interface B if it resides on vlan B. After a few days of testing
it became apparent we were still seeing connectivity issues, though not quite
the same ones: we were getting lots of "Authentication Failed" messages
even though the authentication details were definitely correct.
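The per-vlan selection in the build script can be sketched roughly as follows. This is only an illustration: the function names sd_address_for_vlan and storage_resource are made up, and the mapping (vlan 63 firewalled, vlan 64 non-firewalled) is an assumption inferred from the example configs and routing tables further down.

```shell
#!/bin/sh
# Hypothetical helper: map a client's vlan id to the storage daemon address
# that the generated Storage resource should point at.
# Assumption: vlan 63 is the firewalled vlan, vlan 64 the non-firewalled one.
sd_address_for_vlan() {
    case "$1" in
        63) echo "163.186.srvlist.ukfast.net" ;;
        64) echo "163.75.srvlist.ukfast.net" ;;
        *)  echo "unknown vlan: $1" >&2; return 1 ;;
    esac
}

# Print a Storage resource for one client IP on the given vlan.
# Dots in the IP become underscores in the resource names.
storage_resource() {
    client_ip=$1
    vlan=$2
    addr=$(sd_address_for_vlan "$vlan") || return 1
    name=$(echo "$client_ip" | tr . _)
    printf 'Storage {\n'
    printf '  Name = file-%s\n' "$name"
    printf '  Address = %s\n' "$addr"
    printf '  SDPort = 9103\n'
    printf '  Password = "XXXXXXX"\n'
    printf '  Device = storage-%s\n' "$name"
    printf '  Media Type = File\n'
    printf '}\n'
}
```

Calling storage_resource 78.109.163.43 64 would print a Storage resource whose Address is the vlan 64 name, in the same shape as the non-firewalled example below.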
Further investigation revealed that, because both vlanned IPs
are on the same network, the routing tables were not always honouring the
right interface to send traffic down (in fact, it was almost always sending
down one; I think sometimes it would send down the other and, as the
source/dest IP would mismatch, it would throw out an authentication error
intermittently). So I set to work once again, and this time I set up the
routing tables using advanced routing (multiple routing tables which make
decisions based on the source IP address). This time I could definitely
confirm that I was sending data down the right interface through the correct
vlan. But this time all the backups failed on the server! I received the
following error: "Fatal error: Bad response from stored to open command".
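One way to confirm which interface the kernel picks for a given source and destination is ip route get. The sketch below only extracts the device name from that command's output; the helper name route_dev is made up, and the sample input line is illustrative rather than captured from our server.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route get` output on stdin and print the
# outgoing device, i.e. the word that follows "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Illustrative input line, not real captured output.
echo "78.109.163.43 from 78.109.163.75 dev eth0.64" | route_dev   # prints eth0.64
```

On the backup server itself this would be used as: ip route get 78.109.163.43 from 78.109.163.75 | route_dev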
Thus I am still unable to send backup traffic down the right
interface using a trunked vlan.
Our general configuration is as follows:
bacula-dir and bacula-sd reside on the same server.
client machines are generally all connected to the same
switch.
the backup server is trunked so that we avoid passing bacula
traffic through the firewall. (well, that’s the intention!)
Any advice which helps me set this up correctly would be
great.
Ultimately we need bacula to send traffic down the right
interface to the right client without causing problems. We cannot centralize
our storage daemon, since the sheer number of customers we back up at any one
time makes it unfeasible bandwidth-wise, and we are not keen on generating
large quantities of inter-switch backup traffic. We are restricted in the
times we can run backups because, traditionally, backing up a server for all
intents and purposes causes loss of other services, as bacula uses up large
quantities of the customer's outbound bandwidth.
My most immediate questions are:
The authentication error we have experienced: we suspect
this is something to do with how bacula keeps authentication "tokens"
for each client. If traffic suddenly comes through the wrong interface (i.e. the
interface it DIDN’T authenticate on) bacula requires re-authentication. This
can happen because there are two routing rules in the routing table, one for
each vlan, but since both match it nearly always chooses the first. Sometimes
it seems to choose the second. Can you shed any light on this?
When properly sorting out routing rules so that traffic has
to go down the right interface, bacula fails immediately and consistently with
"Fatal error: Bad response from stored to open command". What does
this mean and how can we fix it? Why would this appear when forcing traffic
down one interface?
Does bacula handle multiple interfaces better where IP
addresses are on different networks instead of the same?
Below is a list of routing rules we have setup.
[root@163 ~]# ip rule ls
0: from all lookup 255
32765: from 78.109.163.75 lookup vlan64
32766: from all lookup main
32767: from all lookup default
Please note, default implies vlan 63.
[root@163 ~]# ip route show table vlan64
78.109.163.0/24 dev eth0.64 scope link
default via 78.109.163.3 dev eth0.64
[root@163 ~]# ip route show table main
78.109.163.0/24 dev eth0.63 proto kernel scope link src 78.109.163.186
169.254.0.0/16 dev eth0.64 scope link
default via 78.109.163.3 dev eth0.63
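For reference, a table like vlan64 above could be populated with commands along the following lines. This is a sketch that only prints the commands so they can be reviewed before being run as root; the helper name vlan_table_cmds is made up for illustration, and it assumes the vlan64 table name is already registered in /etc/iproute2/rt_tables.

```shell
#!/bin/sh
# Print (do not run) the commands for one per-vlan routing table.
# Arguments: table name, local source IP, vlan interface, subnet, gateway.
vlan_table_cmds() {
    table=$1
    src=$2
    dev=$3
    net=$4
    gw=$5
    # On-link route for the shared subnet, bound to this vlan interface.
    echo "ip route add $net dev $dev scope link table $table"
    # Default route for this table, also via the same vlan interface.
    echo "ip route add default via $gw dev $dev table $table"
    # Traffic sourced from this address consults the table above.
    echo "ip rule add from $src lookup $table"
}

# Values taken from the listing above.
vlan_table_cmds vlan64 78.109.163.75 eth0.64 78.109.163.0/24 78.109.163.3
```

The same helper could be called a second time with the eth0.63 address (78.109.163.186) and a second, hypothetical table name if both addresses should have their own table rather than one of them relying on main.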
#########Example of a firewalled client.##########
FileSet {
  Name = "78.109.163.116 Full Set"
  Include {
    Options {
      wildfile = "*pagefile.sys"
      wildfile = "*.log"
      exclude = yes
      signature = MD5
      onefs = no
      fstype = ntfs
    }
    File = C:/
    File = D:/
  }
}
Client {
  Name = srv-78_109_163_116
  Address = 163.116.srvlist.ukfast.net
  FDPort = 9102
  Catalog = MyCatalog
  Password = "JUbuVYFC"
  File Retention = 7 days
  Job Retention = 7 days
  AutoPrune = yes
}
Storage {
  Name = file-78_109_163_116
  Address = 163.186.srvlist.ukfast.net
  SDPort = 9103
  # This is the password for the director to the SD - Don't get confused
  Password = "XXXXXXX"
  Device = storage-78_109_163_116
  Media Type = File
}
JobDefs {
  Name = "78_109_163_116 Job"
  Type = Backup
  Level = Incremental
  Client = srv-78_109_163_116
  FileSet = "78.109.163.116 Full Set"
  Schedule = "WeeklyCycle"
  Storage = file-78_109_163_116
  Messages = Standard
  Pool = pool-78_109_163_116
  Priority = 10
}
Job {
  Name = "78_109_163_116 Job"
  JobDefs = "78_109_163_116 Job"
  ClientRunBeforeJob = "C:/windows/sysstate.bat"
  Write Bootstrap = "/home/bacula/bootstraps/srv-78_109_163_116.bsr"
}
Pool {
  Name = pool-78_109_163_116
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Recycle = yes
  Maximum Volumes = 5
  Maximum Volume Jobs = 7
  Maximum Volume Bytes = 5g
  VolumeRetention = 7d
  Volume Use Duration = 0
  LabelFormat = "srv-78_109_163_116-"
}
##########Example of a non-firewalled client#############
FileSet {
  Name = "78.109.163.43 Full Set"
  Include {
    Options {
      signature = MD5
      onefs = no
      fstype = ext2
      Exclude = yes
      wildfile = "*access.log"
      wildfile = "*access.log.*.*.gz"
      wildfile = "*access.log.*.*"
      wildfile = "*error.log"
    }
    File = /
  }
  Exclude {
    File = /proc
    File = /tmp
    File = /.journal
    File = /.fsck
    File = /sys
  }
}
Client {
  Name = srv-78_109_163_43
  Address = 163.43.srvlist.ukfast.net
  FDPort = 9102
  Catalog = MyCatalog
  Password = "oL3HxG38"
  File Retention = 7 days
  Job Retention = 7 days
  AutoPrune = yes
}
Storage {
  Name = file-78_109_163_43
  # we send traffic down the non firewalled vlan
  Address = 163.75.srvlist.ukfast.net
  SDPort = 9103
  # This is the password for the director to the SD - Don't get confused
  Password = "XXXXXXX"
  Device = storage-78_109_163_43
  Media Type = File
}
JobDefs {
  Name = "78_109_163_43 Job"
  Type = Backup
  Level = Incremental
  Client = srv-78_109_163_43
  FileSet = "78.109.163.43 Full Set"
  Schedule = "WeeklyCycle"
  Storage = file-78_109_163_43
  Messages = Standard
  Pool = pool-78_109_163_43
  Priority = 10
}
Job {
  Name = "78_109_163_43 Job"
  JobDefs = "78_109_163_43 Job"
  Write Bootstrap = "/home/bacula/bootstraps/srv-78_109_163_43.bsr"
}
Pool {
  Name = pool-78_109_163_43
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Recycle = yes
  Maximum Volumes = 5
  Maximum Volume Jobs = 7
  Maximum Volume Bytes = 5g
  VolumeRetention = 7d
  Volume Use Duration = 0
  LabelFormat = "srv-78_109_163_43-"
}
Any information you can provide would be very helpful, or if
you need more information from me please let me know.