Hi guys,
For my company I've been trying to get bacula up and running properly.
My current situation:
Host 'leiden':
Located at my home, with multiple large (8 TB) RAID arrays attached.
Therefore running bacula-sd and bacula-dir.
More than 100 Mbit/s download bandwidth.
Running Debian testing, bacula version 5.0.3.
Multiple hosts to be backed up, on a 100/100 Mbit/s connection:
Debian stable, bacula 5.0.3,
running bacula-fd with the default config.
The complete bacula-dir.conf is located at: http://pastebin.com/8JvCdmL9
Please note that I have replaced all passwords with an X.
Relevant parts are:
Director {                            # define myself
  Name = leiden-dir
  QueryFile = "/etc/bacula/scripts/query.sql"
  WorkingDirectory = "/var/lib/bacula"
  PidDirectory = "/var/run/bacula"
  Maximum Concurrent Jobs = 10
  Password = "X"                      # Console password
  Messages = Daemon
  DirAddresses = {
    ip = { addr = 192.168.1.44; port = 9101 }
    ip = { addr = 127.0.0.1; port = 9101 }
  }
}
JobDefs {
  Name = "sql-weekly"
  Type = Backup
  Level = Incremental
  Client = sql
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = leiden-filestorage
  Messages = Standard
  Pool = LeidenPool
  Priority = 10
}
JobDefs {
  Name = "mail-weekly"
  Type = Backup
  Level = Incremental
  Client = mail
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = leiden-filestorage
  Messages = Standard
  Pool = LeidenPool
  Priority = 10
}
Job {
  Name = "sqljob"
  JobDefs = "sql-weekly"
  Write Bootstrap = "/var/lib/bacula/sql.bsr"
}
Job {
  Name = "mailjob"
  JobDefs = "mail-weekly"
  Write Bootstrap = "/var/lib/bacula/mail.bsr"
}
# Client (File Services) to backup
Client {
  Name = sql
  Address = sql.boudewijnector.nl
  FDPort = 9102
  Catalog = MyCatalog
  Password = "X"                      # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}
Client {
  Name = mail
  Address = mail.boudewijnector.nl
  FDPort = 9102
  Catalog = MyCatalog
  Password = "X"                      # password for FileDaemon
  File Retention = 30 days            # 30 days
  Job Retention = 6 months            # six months
  AutoPrune = yes                     # Prune expired Jobs/Files
}
The current problem is that I get errors on some hosts, such as:
17-Jul 02:52 leiden-dir JobId 94: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
17-Jul 02:52 leiden-dir JobId 94: Fatal error: No Job status returned from FD.
17-Jul 02:52 leiden-dir JobId 94: Error: Bacula leiden-dir 5.0.3 (04Aug10): 17-Jul-2011 02:52:30
Build OS: i486-pc-linux-gnu debian wheezy/sid
JobId: 94
Job: BLAjob.2011-07-17_00.52.14_10
Backup Level: Full (upgraded from Incremental)
Client: "client4" 5.0.2 (28Apr10) x86_64-pc-linux-gnu,debian,squeeze/sid
FileSet: "Home Set" 2011-07-16 23:49:43
Pool: "LeidenPool" (From Job resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "leiden-filestorage" (From Job resource)
Scheduled time: 17-Jul-2011 00:52:13
Start time: 17-Jul-2011 00:52:16
End time: 17-Jul-2011 02:52:30
Elapsed time: 2 hours 14 secs
Priority: 10
FD Files Written: 0
SD Files Written: 137,033
FD Bytes Written: 0 (0 B)
SD Bytes Written: 3,586,674,915 (3.586 GB)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s): LeidenVol0005
Volume Session Id: 20
Volume Session Time: 1310599400
Last Volume Bytes: 12,025,925,394 (12.02 GB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: Error
SD termination status: OK
Termination: *** Backup Error ***
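A side note on the figures in this report: the Rate of 0.0 KB/s presumably just follows from the FD byte count being 0, so it says nothing about what actually crossed the line. A rough average can be worked out from the SD bytes and the elapsed time instead. The sketch below does that in Python, assuming 1 KB = 1000 bytes (consistent with the 3.586 GB figure in the report); the function name is only illustrative.

```python
def average_rate_kb_per_s(total_bytes, elapsed_seconds):
    """Average throughput in KB/s, taking 1 KB as 1000 bytes (assumption)."""
    return total_bytes / elapsed_seconds / 1000.0


# Values copied from the job report above.
sd_bytes = 3_586_674_915      # "SD Bytes Written"
elapsed = 2 * 3600 + 14       # "Elapsed time: 2 hours 14 secs", in seconds

rate = average_rate_kb_per_s(sd_bytes, elapsed)
print(f"{rate:.1f} KB/s, about {rate * 8 / 1000:.2f} Mbit/s")
```

That comes out at roughly 0.5 MB/s, or about 4 Mbit/s, which is well below what a 100 Mbit link could carry.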
Rerunning the job fails again after 2 hours. I tried to fix it as follows:
In the Job resource in bacula-dir.conf I added "Max Run Time = 144000" (144000 seconds, i.e. 40 hours), because it seemed like bacula was shutting down the connection after 2 hours.
I also changed the keep-alive time on the machine running bacula-dir:
sysctl -w net.ipv4.tcp_keepalive_time=60
After that, the job failed completely:
Elapsed time: 15 hours 22 mins 58 secs
Priority: 10
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s):
Volume Session Id: 33
Volume Session Time: 1310599400
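To double-check which keep-alive settings are actually in effect after the sysctl change, the values can be read back from /proc. Below is a minimal Python sketch, assuming a Linux kernel; the function names are made up for illustration. Note that these settings only matter for sockets that have SO_KEEPALIVE enabled.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/ipv4")  # Linux-only location of the TCP sysctls


def read_keepalive(proc_dir=PROC_DIR):
    """Return (time, intvl, probes) as integers read from proc_dir."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return tuple(int((Path(proc_dir) / n).read_text().strip()) for n in names)


def seconds_until_drop(time_s, intvl_s, probes):
    """Rough idle time before the kernel gives up on a peer that never answers."""
    return time_s + intvl_s * probes


if __name__ == "__main__":
    try:
        t, i, p = read_keepalive()
    except OSError as exc:
        # Not Linux, or /proc is not available.
        print(f"could not read keep-alive settings: {exc}")
    else:
        print(f"time={t}s intvl={i}s probes={p}")
        print(f"unanswered connection dropped after about {seconds_until_drop(t, i, p)}s")
```

With the usual Linux defaults (7200, 75, 9) the first probe is only sent after 2 hours of idle time; with tcp_keepalive_time=60 it is sent after one minute.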
That's really bad: my router did not see any traffic at all, apart from a few bytes when the connection was set up.
Can someone please point out where I should start investigating this problem?
From the internet I can reach the director and the SD on the 'leiden' system, and I can reach the FDs on all servers that are to be backed up.
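Reachability of this kind can be checked by simply opening a TCP connection to each daemon's port. Below is a minimal Python sketch of such a check, using the FD hostnames and port 9102 from the Client resources above; the function and variable names are hypothetical, and the public address of 'leiden' would have to be added by hand. It only shows that the port accepts connections, not that a long-running transfer stays up.

```python
import socket

# FD hosts and port taken from the Client resources in bacula-dir.conf.
FD_TARGETS = [
    ("sql.boudewijnector.nl", 9102),
    ("mail.boudewijnector.nl", 9102),
]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def report(targets):
    """Print one line per (host, port) pair with the connection result."""
    for host, port in targets:
        state = "open" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} {state}")
```

Calling report(FD_TARGETS) from the machine running bacula-dir prints one line per file daemon.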
Cheers,
Boudewijn Ector