Re: [BackupPC-users] Issue with remote backup of server(s) over VPN after failover
2011-02-10 19:31:39
I let the most recent backup 'finish' on its own. It becomes a
partial backup in the host backup summary page with the following
error:
Read EOF:
Tried again: got 0 bytes
finish: removing in-process file path/to/filename.ext
Can't write 4 bytes to socket
Child is aborting
Done: 229002 files, 82767774899 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Note that this is well past the default ClientTimeout of 72000 seconds,
and the in-process file it said it was removing is only 114MB, so I
don't think the backup was hanging on a large file.
...
Type     Filled  Level  Start Date    Duration/mins  Age/days
full     yes     0      12/22 20:00           205.4      49.9
full     yes     0      12/29 21:00           136.2      42.8
full     yes     0      1/5 21:00             336.4      35.8
incr     no      1      1/10 21:00              0.1      30.8
incr     no      1      1/11 22:01              0.1      29.8
partial  yes     0      1/28 02:00          17136.1      13.6
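As a rough check of the timing, the sketch below converts the default 72000-second ClientTimeout and the 17136.1-minute duration of the 1/28 partial backup into hours and days. One caveat: BackupPC's config documentation describes ClientTimeout as the time allowed with no output from the transfer, not a cap on total backup length, so a slow but steadily progressing transfer could run past it.

```python
# Compare BackupPC's default ClientTimeout with the reported duration
# of the partial backup (values copied from the summary table above).
CLIENT_TIMEOUT_S = 72000        # default $Conf{ClientTimeout}, in seconds
PARTIAL_DURATION_MIN = 17136.1  # "Duration/mins" of the 1/28 partial backup

timeout_h = CLIENT_TIMEOUT_S / 3600   # seconds -> hours
partial_h = PARTIAL_DURATION_MIN / 60  # minutes -> hours

print(f"ClientTimeout:  {timeout_h:.1f} h")
print(f"Partial backup: {partial_h:.1f} h ({partial_h / 24:.1f} days)")
print(f"Ratio:          {partial_h / timeout_h:.1f}x")
```

This prints a 20.0 h timeout against a 285.6 h (11.9 day) partial backup, roughly 14 times longer.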
Looking a little further in the past, the results of the other
node's partial backup are a little bit different:
Remote[1]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(543) [sender=3.0.7]
Can't write 32780 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file path/to/filename.ext
Child is aborting
Done: 32547 files, 30060082211 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
The file it choked on was only 25MB.
Has nobody else had one of their servers' remote backups never
finish? The odd thing to me is that if I bring the remote backup
server local to take a full backup of the server, subsequent remote
backups of that server succeed. (Note that other servers run remote
backups without these issues.) Any help here is appreciated. Maybe
I'm just overlooking something simple, but I haven't made any
progress on this issue for some time, and searching the mailing list
hasn't turned up a solution.
Could this be an issue with an older version (we're running BackupPC
3.0.0)? Could it be related to TCP segmentation offload (set to 'on'
on both the backup client and the backup server)? Could it be a
compatibility issue between rsync versions? The backup servers are
running rsync 2.6.9 (protocol version 29) and both clients are running
rsync 3.0.7 (protocol version 30). AFAIK the newer version should be
backwards compatible, no? Is this setup confusing, or have I explained
the issue well enough?
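On the compatibility question: as I understand rsync's handshake, each side announces the highest protocol version it supports and both then speak the lower of the two, which is why a 3.0.7 client (protocol 30) can still talk to a 2.6.9 peer (protocol 29). The sketch below only encodes that rule; `negotiate_protocol` is a hypothetical name, not an rsync API. One hedge: BackupPC 3.x performs rsync transfers through the File::RsyncP Perl module rather than the server's rsync binary, so the server-side version that actually negotiates may be the module's rather than 2.6.9's.

```python
# Sketch of rsync's protocol-version rule: each end announces the
# highest protocol it supports and both use the lower value.
# negotiate_protocol is a hypothetical helper, not part of rsync.


def negotiate_protocol(local_version: int, remote_version: int) -> int:
    """Return the protocol version both rsync peers end up speaking."""
    return min(local_version, remote_version)


if __name__ == "__main__":
    client = 30  # rsync 3.0.7 on the cluster nodes
    server = 29  # rsync 2.6.9 on the BackupPC servers
    print(negotiate_protocol(client, server))
```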
Scott
On 2/7/2011 2:46 PM, Scott Saunders wrote:
I've got a couple of servers running in a two-node master/slave cluster
using Pacemaker (Corosync)/DRBD. Like my other servers, they are
configured to back up to a local BackupPC server as well as a remote
BackupPC server (VPN over a T1), using rsync over ssh for both. However,
in the cluster only the master node has the partition that is to be
backed up mounted, so the backups for the slave node will always fail.
This is OK, but maybe there is a better way to do this? Anyway, to get
the backups started I brought the remote backup server local to take a
full backup (because it's ~300GB). After a failover from the master node
to the slave node, the slave becomes the new master, gets the partition
mounted, and thus has something to back up. The local backups work
without a problem on the new master. The remote backups act like they
are working on the new master, but never actually finish. I've let them
run for more than a week, which is well past the default client timeout;
that timeout has never actually taken effect with these two boxes. This
erroneous behavior persists when failing back over to the original
master. The only way I can get the remote backups going again is to
bring the remote server local for a full backup. Any subsequent remote
backups work after that until a failover of the cluster occurs. Remote
backups for other servers have been performed in the past without these
issues. Any ideas as to why there are issues with the remote backup in
this setup? And what might I try to get the backups running again on
the master node after a failover without having to bring the remote
server local every time?
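A back-of-the-envelope estimate of why the initial ~300GB full over the VPN is impractical: assuming the T1's nominal 1.544 Mbit/s line rate, no compression, no protocol overhead, and 1 GB = 10^9 bytes, the raw transfer alone takes about 18 days. If a newly promoted node has no usable prior backup on the remote server for rsync to compare against (an assumption here, since the slave's earlier backups always failed), its first remote backup would face a transfer on roughly this scale.

```python
# Rough lower bound on transfer time for a full backup over a T1.
# Assumptions: nominal T1 line rate, decimal gigabytes, no rsync or ssh
# compression, and no protocol overhead.
T1_BITS_PER_S = 1_544_000
BACKUP_BYTES = 300 * 10**9  # "~300GB" from the message

seconds = BACKUP_BYTES * 8 / T1_BITS_PER_S  # bytes -> bits, then divide by rate
days = seconds / 86_400

print(f"{seconds:,.0f} s ~= {days:.1f} days")
```

This prints `1,554,404 s ~= 18.0 days`; real throughput over a VPN would likely be lower, making the estimate optimistic.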
------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/