BackupPC-users

Re: [BackupPC-users] Issue with remote backup of server(s) over VPN after failover

2011-02-10 19:31:39
Subject: Re: [BackupPC-users] Issue with remote backup of server(s) over VPN after failover
From: Scott Saunders <ssaunders AT asphaltzipper DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 10 Feb 2011 17:29:36 -0700
I let the most recent backup 'finish' on its own. It shows up as a partial backup on the host's backup summary page, with the following error:
Read EOF: 
Tried again: got 0 bytes
finish: removing in-process file path/to/filename.ext
Can't write 4 bytes to socket
Child is aborting
Done: 229002 files, 82767774899 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Note that this is well past the default clientTimeout of 72000 seconds, and the in-process file it reported removing is only 114MB, so I don't think the backup was hanging on a large file.

Type     Filled  Level  Start Date   Duration/mins  Age/days
...
full     yes     0      12/22 20:00  205.4          49.9
full     yes     0      12/29 21:00  136.2          42.8
full     yes     0      1/5 21:00    336.4          35.8
incr     no      1      1/10 21:00   0.1            30.8
incr     no      1      1/11 22:01   0.1            29.8
partial  yes     0      1/28 02:00   17136.1        13.6
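
To put the numbers above side by side, here is a rough sketch of the arithmetic. It assumes the "Done:" line in the error output belongs to the 1/28 partial row in the table, which seems likely but is not confirmed by the logs. The variable names are only for illustration.

```python
# Figures copied from the post. Pairing the "Done:" byte count with the
# 1/28 partial row is an assumption, not something the logs state.
CLIENT_TIMEOUT_SECS = 72000      # default clientTimeout quoted above
DURATION_MINS = 17136.1          # Duration/mins of the partial backup
BYTES_DONE = 82767774899         # from "Done: 229002 files, ... bytes"

duration_secs = DURATION_MINS * 60
duration_days = duration_secs / 86400
timeout_hours = CLIENT_TIMEOUT_SECS / 3600

# How many timeout periods the backup ran for before it was aborted.
timeout_ratio = duration_secs / CLIENT_TIMEOUT_SECS

# Average rate over the whole run, in megabits per second.
avg_mbit_per_sec = BYTES_DONE * 8 / duration_secs / 1e6

print(f"timeout:  {timeout_hours:.0f} hours")
print(f"duration: {duration_days:.1f} days ({timeout_ratio:.1f}x the timeout)")
print(f"average:  {avg_mbit_per_sec:.2f} Mbit/s")
```

Under that assumption the run lasted roughly 11.9 days, about 14 times the timeout value, and averaged somewhere around 0.64 Mbit/s. So data does appear to have been moving for most of that time rather than the transfer sitting idle.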

Looking a little further in the past, the results of the other node's partial backup are a little bit different:
Remote[1]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(543) [sender=3.0.7]
Can't write 32780 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file path/to/filename.ext
Child is aborting
Done: 32547 files, 30060082211 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
The file it choked on was only 25MB.

Has nobody else had issues with a server's remote backups never finishing? The odd thing to me is that if I bring the remote backup server local to take a full backup of the server, subsequent remote backups of that server succeed (note that other servers run remote backups without these issues). Any help here is appreciated. Maybe I'm just overlooking something simple, but I haven't made any progress on this issue for some time, and I've searched the mailing list for help without finding a solution.

Could this possibly be an issue with an older version (we're running BackupPC version 3.0.0)? Could it be related to TCP segmentation offload (set to 'on' for both backup client and backup server)? Could it be a compatibility issue between rsync versions? The backup servers are running rsync 2.6.9 (protocol version 29) and both of the clients are running rsync 3.0.7 (protocol version 30). AFAIK the newer version should be backwards compatible, no? Is this setup confusing? Have I explained the issue well enough?
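
For the segmentation offload question, one way to confirm the setting on each box is to read the output of `ethtool -k <iface>`. Below is a minimal sketch of that check. The interface name `eth0` and the sample output are assumptions for illustration, and the parser accepts both the hyphenated feature names and the older spaced-out spelling, since the exact wording differs between ethtool versions.

```python
import subprocess


def parse_offload_features(ethtool_output: str) -> dict:
    """Map feature names from `ethtool -k` output to True (on) / False (off)."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        words = value.split()
        # Skip the header line and anything that is not an on/off setting.
        if words and words[0] in ("on", "off"):
            # Normalise "tcp segmentation offload" to "tcp-segmentation-offload".
            key = name.strip().replace(" ", "-")
            features[key] = words[0] == "on"
    return features


def tso_enabled(iface: str = "eth0") -> bool:
    """Run ethtool on this machine; `eth0` is only a placeholder name."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_offload_features(result.stdout).get(
        "tcp-segmentation-offload", False
    )


# Illustrative output only, not captured from the servers in question.
SAMPLE = """Features for eth0:
rx-checksumming: on
tcp-segmentation-offload: on
generic-receive-offload: off [fixed]
"""
print(parse_offload_features(SAMPLE))
```

If offload does turn out to be on, switching it off temporarily with `ethtool -K <iface> tso off` on one side and retrying a remote backup would be a cheap way to rule it in or out.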

Scott

On 2/7/2011 2:46 PM, Scott Saunders wrote:
I've got a couple of servers running in a 2-node master/slave cluster
using pacemaker(corosync)/drbd. Like other servers, I've got them
configured to back up to a local BackupPC server as well as a remote (VPN
over T1) BackupPC server (rsync over ssh for both). However, with the
cluster, only the master node has the partition mounted that is to be
backed up, so the backups for the slave node will always fail. This is
OK, but maybe there is a better way to do this? Anyway, to get the
backups started I brought the remote backup server local to take a full
backup (because it is ~300GB). After a failover of the master node to the
slave node, the slave becomes the new master, gets the partition mounted,
and thus has something to back up. The local backups work without a
problem on the new master. The remote backups act like they are working
on the new master, but never actually finish. I've let them go more than
a week, which is well past the default client timeout, which has
never actually taken effect with these two boxes. This erroneous behavior
persists when failing back over to the original master. The only way I
get the remote backups going again is to bring the remote server local
for a full backup. Any subsequent remote backups work after this until a
failover of the cluster occurs. Remote backups for other servers in the
past have been performed without these issues. Any ideas as to why there
are issues with the remote backup in this setup? And what might I try to
get the backups running again on the master node after a failover
without having to bring the remote server local every time?
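
For a sense of scale on why the first full backup was taken locally, here is a back-of-the-envelope estimate of how long ~300GB would take over the T1 if all of it had to cross the link. The 1.544 Mbit/s figure is the nominal T1 line rate, "~300GB" is read as decimal gigabytes, and the `efficiency` parameter is a made-up knob for VPN and ssh overhead. None of these values were measured on this setup.

```python
T1_BITS_PER_SEC = 1_544_000   # nominal T1 line rate; usable payload is lower
DATA_BYTES = 300e9            # "~300GB" from the post, taken as decimal GB


def transfer_days(num_bytes: float, bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move num_bytes at the given link rate.

    efficiency is the assumed fraction of the line rate available
    for payload (1.0 means no overhead at all).
    """
    seconds = num_bytes * 8 / (bits_per_sec * efficiency)
    return seconds / 86400


best_case_days = transfer_days(DATA_BYTES, T1_BITS_PER_SEC)
with_overhead_days = transfer_days(DATA_BYTES, T1_BITS_PER_SEC, efficiency=0.8)

print(f"best case:          {best_case_days:.1f} days")
print(f"with 20% overhead:  {with_overhead_days:.1f} days")
```

Even in the best case this comes to about 18 days. So a remote run that, for whatever reason, ends up sending most of the data again instead of only the changes would look much like the "more than a week and never finishes" behavior described above. Whether that is what actually happens after a failover is a guess at this point.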

------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
