Hi Y'all,
We have two RHEL 6 servers: one runs bacula-dir and bacula-sd (with a lot
of disk storage for the backups), and the other is the client we want to
back up, which runs bacula-fd.
Both servers are behind different firewalls, and each has a public IP
address that is configured for 1:1 NAT to the internal server. I can
telnet from the director to the client on port 9102 and connect OK, and
the client can telnet to the storage daemon on the other server on port
9103 and connect OK. So, I think the network path is working.
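The telnet test can be repeated from a script. This is a minimal sketch that assumes Python 3 is available on both hosts. The file name check_port.py and the function can_connect are hypothetical and are not part of Bacula. Like telnet, it only confirms that the TCP handshake completes. It says nothing about whether a longer data stream survives the NAT path.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout.

    Like the manual telnet check, this proves only that a connection can be
    opened; it does not show that a large data stream will get through.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


if __name__ == "__main__":
    # Usage: python3 check_port.py HOST PORT [HOST PORT ...]
    args = sys.argv[1:]
    for host, port in zip(args[::2], args[1::2]):
        state = "open" if can_connect(host, int(port)) else "unreachable"
        print(f"{host}:{port} {state}")
```

Each check has to run on the side that opens the connection. Run `python3 check_port.py bacula-fd.mydomain.com 9102` on the director, and `python3 check_port.py 128.114.61.132 9103` on the client.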
What appears to happen is that we start the backup job, it connects and
begins transferring, and then it almost immediately stops and eventually
times out. It transfers something like 800 bytes and then just seems to
sit there. Here is what the different status reports from bconsole on the
director node look like:
DIRECTOR STATUS:
*status
Status available for:
1: Director
2: Storage
3: Client
4: All
Select daemon type for status (1-4): 1
bacula-dir Version: 5.0.0 (26 January 2010) x86_64-unknown-linux-gnu redhat (Final)
Daemon started 15-Jun-12 09:39, 0 Jobs run since started.
Heap: heap=135,168 smbytes=61,022 max_bytes=61,278 bufs=247 max_bufs=253
Scheduled Jobs:
Level        Type    Pri  Scheduled        Name      Volume
===================================================================================
Incremental  Backup   10  16-Jun-12 02:00  hubcon02  Default0002
====
Running Jobs:
Console connected at 15-Jun-12 09:52
JobId Level Name Status
======================================================================
19 Full hubcon02.2012-06-15_09.53.03_03 is running
====
Terminated Jobs:
JobId  Level  Files    Bytes  Status  Finished         Name
====================================================================
    4  Full       0        0  Error   13-Jun-12 14:41  hubcon02
    5  Full       0        0  Error   13-Jun-12 15:40  hubcon02
    8  Full       0        0  Error   13-Jun-12 22:10  hubcon02
   10  Full      10  173.5 K  Error   13-Jun-12 22:56  hubcon02
   11  Full       0        0  Error   13-Jun-12 23:06  hubcon02
   13  Full       0        0  Error   14-Jun-12 01:30  hubcon02
   14  Full      10  107.9 K  Error   14-Jun-12 02:15  hubcon02
   15  Full       0        0  Error   15-Jun-12 04:11  hubcon02
   16  Full      10  108.2 K  Cancel  15-Jun-12 08:05  hubcon02
   17  Full      10  108.2 K  Error   15-Jun-12 08:22  hubcon02
====
*
STORAGE STATUS:
*status
Status available for:
1: Director
2: Storage
3: Client
4: All
Select daemon type for status (1-4): 2
The defined Storage resources are:
1: storage-hubcon01
2: storage-hubcon02
3: storage-app01
Select Storage resource (1-3): 2
Connecting to Storage daemon storage-hubcon02 at 128.114.61.132:9103
bacula-sd Version: 5.0.0 (26 January 2010) x86_64-unknown-linux-gnu redhat (Final)
Daemon started 15-Jun-12 09:24, 0 Jobs run since started.
Heap: heap=131,072 smbytes=304,519 max_bytes=304,713 bufs=149 max_bufs=151
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
Running Jobs:
Writing: Full Backup job hubcon02 JobId=19 Volume="Default0002"
pool="Default" device="backup-hubcon02" (/data/backup/backup-hubcon02)
spooling=0 despooling=0 despool_wait=0
Files=4 Bytes=52,038 Bytes/sec=634
FDReadSeqNo=32 in_msg=23 out_msg=5 fd=8
====
Jobs waiting to reserve a drive:
====
Terminated Jobs:
JobId  Level  Files    Bytes  Status  Finished         Name
===================================================================
    8  Full       0        0  Cancel  13-Jun-12 22:10  hubcon02
    9  Full       2      631  Cancel  13-Jun-12 22:32  hubcon02
   10  Full       4  51.80 K  Cancel  13-Jun-12 23:05  hubcon02
   11  Full       0        0  Error   13-Jun-12 23:06  hubcon02
   12  Full       0        0  Cancel  13-Jun-12 23:18  hubcon02
   13  Full       2      631  Error   14-Jun-12 01:19  hubcon02
   14  Full       2      631  Cancel  14-Jun-12 02:15  hubcon02
   15  Full       2      860  Error   15-Jun-12 04:00  hubcon02
   16  Full       2      860  Cancel  15-Jun-12 08:05  hubcon02
   17  Full       2      860  Cancel  15-Jun-12 08:22  hubcon02
====
Device status:
Device "backup-hubcon01" (/data/backup/backup-hubcon01) is not open.
Device "backup-hubcon02" (/data/backup/backup-hubcon02) is mounted with:
Volume: Default0002
Pool: Default
Media type: File
Total Bytes=215 Blocks=0 Bytes/block=215
Positioned at File=0 Block=215
Device "backup-app01" (/data/backup/backup-app01) is not open.
====
Used Volume status:
Default0002 on device "backup-hubcon02" (/data/backup/backup-hubcon02)
Reader=0 writers=2 devres=0 volinuse=1
====
====
*
CLIENT STATUS:
*status
Status available for:
1: Director
2: Storage
3: Client
4: All
Select daemon type for status (1-4): 3
The defined Client resources are:
1: hubcon01-fd
2: hubcon02-fd
3: app01-fd
Select Client (File daemon) resource (1-3): 2
Connecting to Client hubcon02-fd at bacula-fd.mydomain.com:9102
hubcon02-fd Version: 5.0.0 (26 January 2010) x86_64-unknown-linux-gnu redhat Enterprise 6.0
Daemon started 15-Jun-12 09:26, 1 Job run since started.
Heap: heap=0 smbytes=167,245 max_bytes=167,392 bufs=116 max_bufs=117
Sizeof: boffset_t=8 size_t=8 debug=0 trace=0
Running Jobs:
JobId 19 Job hubcon02.2012-06-15_09.53.03_03 is running.
Full Backup Job started: 15-Jun-12 09:53
Files=10 Bytes=173,753 Bytes/sec=1,368 Errors=0
Files Examined=10
Processing file: /root/.gstreamer-0.10/registry.x86_64.bin
SDReadSeqNo=5 fd=5
Director connected at: 15-Jun-12 09:55
====
Terminated Jobs:
JobId  Level  Files    Bytes  Status  Finished         Name
======================================================================
    6  Full      10  107.9 K  Error   13-Jun-12 21:46  hubcon02
    9  Full      10  107.9 K  Cancel  13-Jun-12 22:32  hubcon02
   10  Full      10  173.5 K  Error   13-Jun-12 22:55  hubcon02
   13  Full      10  107.9 K  Error   13-Jun-12 23:35  hubcon02
   12  Full       0        0  Error   14-Jun-12 01:19  hubcon02
   14  Full      10  107.9 K  Error   14-Jun-12 02:15  hubcon02
   15  Full      10  108.2 K  Error   15-Jun-12 02:16  hubcon02
   16  Full      10  108.2 K  Error   15-Jun-12 08:05  hubcon02
   17  Full      10  108.2 K  Error   15-Jun-12 08:22  hubcon02
   18  Full      10  108.2 K  Error   15-Jun-12 09:43  hubcon02
====
*
So, the storage daemon says it transferred 52,038 bytes almost
immediately after the job started and then transferred no more, even
though I kept checking for a while afterward. Sometimes it transfers only
about 800 bytes and then just waits.
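You can compare two status snapshots to confirm that the counters really stop moving. This is a minimal sketch that assumes the snapshots are saved as text, for example by running `status storage=storage-hubcon02` in bconsole a few minutes apart. The function names job_bytes and is_stalled are hypothetical. The sketch reads the Bytes= value from the first `Files=N Bytes=N` line, which is the format used by both the storage and client status reports above.

```python
import re
from typing import Optional

# Matches the counter line shown under "Running Jobs" in the storage and
# client status reports, e.g. "Files=4 Bytes=52,038 Bytes/sec=634".
COUNTER_RE = re.compile(r"Files=([\d,]+)\s+Bytes=([\d,]+)")


def job_bytes(status_text: str) -> Optional[int]:
    """Return the first Bytes= counter in a status dump, or None if absent."""
    match = COUNTER_RE.search(status_text)
    if match is None:
        return None
    # Counters are printed with thousands separators, so strip the commas.
    return int(match.group(2).replace(",", ""))


def is_stalled(earlier: str, later: str) -> bool:
    """True if both snapshots have a counter and it did not grow between them."""
    before, after = job_bytes(earlier), job_bytes(later)
    return before is not None and after is not None and after <= before


if __name__ == "__main__":
    first = "Files=4 Bytes=52,038 Bytes/sec=634"
    second = "Files=4 Bytes=52,038 Bytes/sec=120"  # hypothetical later snapshot
    print(job_bytes(first), is_stalled(first, second))  # 52038 True
```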
Eventually the job times out with:
15-Jun 02:00 bacula-dir JobId 15: No prior Full backup Job record found.
15-Jun 02:00 bacula-dir JobId 15: No prior or suitable Full backup found in catalog. Doing FULL backup.
15-Jun 02:00 bacula-dir JobId 15: Start Backup JobId 15, Job=hubcon02.2012-06-15_02.00.00_10
15-Jun 02:00 bacula-dir JobId 15: Using Device "backup-hubcon02"
15-Jun 02:00 bacula-sd JobId 15: Volume "Default0002" previously written, moving to end of data.
15-Jun 02:00 bacula-sd JobId 15: Ready to append to end of Volume "Default0002" size=215
15-Jun 04:00 bacula-sd JobId 15: Fatal error: append.c:242 Network error reading from FD. ERR=Connection reset by peer
15-Jun 04:00 bacula-sd JobId 15: Job write elapsed time = 02:00:00, Transfer rate = 0 Bytes/second
15-Jun 04:00 bacula-sd JobId 15: Error: bsock.c:518 Read error from client:111.21.21.21:36643: ERR=Connection reset by peer
15-Jun 04:11 bacula-dir JobId 15: Fatal error: Network error with FD during Backup: ERR=Connection timed out
15-Jun 04:11 bacula-dir JobId 15: Fatal error: No Job status returned from FD.
15-Jun 04:11 bacula-dir JobId 15: Error: Bacula cghub-bacula-sc-dir 5.0.0 (26Jan10): 15-Jun-2012 04:11:21
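You can turn the timestamps in this log into elapsed times to see how long each daemon waited. This is a small sketch in which log_time is a hypothetical helper. The per-line stamps omit the year, so 2012 is taken from the last log line.

```python
from datetime import datetime


def log_time(line: str, year: int = 2012) -> datetime:
    """Parse the leading 'DD-Mon HH:MM' stamp of a Bacula job-log line.

    The per-line stamps omit the year, so it is passed in explicitly
    (2012 here, taken from the final line of the log).
    """
    day_month, hour_minute = line.split()[:2]
    return datetime.strptime(f"{year} {day_month} {hour_minute}", "%Y %d-%b %H:%M")


start = log_time("15-Jun 02:00 bacula-dir JobId 15: Start Backup JobId 15,")
sd_reset = log_time("15-Jun 04:00 bacula-sd JobId 15: Fatal error: append.c:242 Network error")
dir_timeout = log_time("15-Jun 04:11 bacula-dir JobId 15: Fatal error: Network error with FD during")

print((sd_reset - start).total_seconds())        # 7200.0
print((dir_timeout - sd_reset).total_seconds())  # 660.0
```

The storage daemon gave up exactly 7200 seconds after the job started, which matches its own "Job write elapsed time = 02:00:00". The director followed 660 seconds later. A round two-hour figure may be worth comparing against any two-hour timer on the hosts or firewalls in the path. For example, the Linux default for net.ipv4.tcp_keepalive_time is 7200 seconds.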
It *does* seem like the FD can connect back to the DIR and SD, so I don't
know why it would stop transferring immediately after starting and then
eventually time out. I've confirmed that my firewalls are not timing out
the connections, and I also have "Heartbeat Interval = 60" in the SD, DIR
and FD config files. But the main reason I don't think it's a firewall
timeout is that the transfer stops almost immediately, after only about
1 second.
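A small scan can double-check where "Heartbeat Interval" ended up in each of the three config files by listing the top-level resource each occurrence sits in. This is a minimal sketch with hypothetical function names. It has these limitations:

- It assumes one directive per line.
- It treats everything after # as a comment, so it would misread a # inside a quoted string.
- It relies on Bacula ignoring case and spaces in directive keywords.

My understanding is that the file daemon reads the setting from its FileDaemon resource and the storage daemon reads it from its Storage resource. The Bacula 5.0 manual is worth consulting to confirm which resources accept it.

```python
import re
from typing import List, Tuple

# Bacula ignores case and spaces in directive keywords, so this also matches
# "HeartbeatInterval" or "heartbeat interval".
HEARTBEAT_RE = re.compile(r"^\s*heartbeat\s*interval\s*=\s*(.+?)\s*$", re.IGNORECASE)
# A resource header such as "FileDaemon {" or "Storage {".
RESOURCE_RE = re.compile(r"^\s*([A-Za-z]+)\s*\{")


def heartbeat_settings(conf_text: str) -> List[Tuple[str, str]]:
    """Return (top-level resource, value) for each Heartbeat Interval line."""
    found = []
    depth = 0      # current brace nesting level
    resource = ""  # name of the enclosing top-level resource
    for raw in conf_text.splitlines():
        # Naive comment stripping: would also cut a '#' inside a quoted string.
        line = raw.split("#", 1)[0]
        if depth == 0:
            header = RESOURCE_RE.match(line)
            if header:
                resource = header.group(1)
        setting = HEARTBEAT_RE.match(line)
        if setting and depth > 0:
            found.append((resource, setting.group(1)))
        depth += line.count("{") - line.count("}")
    return found


if __name__ == "__main__":
    # Hypothetical excerpt in the style of a bacula-fd.conf.
    sample = """
Director {
  Name = bacula-dir
}
FileDaemon {
  Name = hubcon02-fd
  FDport = 9102
  # Heartbeat Interval = 30
  Heartbeat Interval = 60
}
"""
    print(heartbeat_settings(sample))  # [('FileDaemon', '60')]
```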
Anyone have any ideas as to what is happening?
Thanks in advance!!
-erich
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users