BackupPC-users

Re: [BackupPC-users] Got fatal error during xfer

2009-12-03 12:27:43
Subject: Re: [BackupPC-users] Got fatal error during xfer
From: Kameleon <kameleon25 AT gmail DOT com>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Thu, 3 Dec 2009 11:25:31 -0600
Update:

We moved the backuppc server into the same room as the client server and put it on the same switch. The backup still failed, but it got a lot further. The error is the same as last time:

Read EOF:
Tried again: got 0 bytes
Child is aborting
Can't write 33792 bytes to socket
Sending csums, cnt = 250243, phase = 1
Done: 26416 files, 31698685174 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Saving this as a partial backup


So I took the exact command that backuppc uses, according to the log file, and ran it manually as the backuppc user. Then I straced the pid on the client machine and got this:

select(1, [0], [], NULL, {45, 57000})   = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)
select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {29, 170000})
read(0, "", 4)                          = 0
write(2, "rsync: connection unexpectedly c"..., 72) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigaction(SIGUSR1, {0x1, [], 0}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x1, [], 0}, NULL, 8) = 0
write(2, "rsync error: errors with program"..., 83) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
rt_sigaction(SIGUSR1, {0x1, [], 0}, NULL, 8) = 0
rt_sigaction(SIGUSR2, {0x1, [], 0}, NULL, 8) = 0
gettimeofday({1259860374, 861962}, NULL) = 0
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
gettimeofday({1259860374, 961981}, NULL) = 0
exit_group(13)                          = ?
Process 13532 detached

It still appears the problem is on the remote server, since that is the side that is exiting. The client server is a Dell PowerEdge, so I would hope it isn't hardware related. Anything else I can check before I give it a swift kick in the pants?
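
One thing the trace above can be mined for is how long the client-side rsync sat waiting on stdin before it saw EOF. The following is a minimal sketch, not part of BackupPC or rsync: the function name idle_seconds and the SAMPLE list are made up for illustration, and the regular expression assumes the select() lines look exactly like the ones pasted above (seconds and microseconds inside the braces).

```python
import re

# Matches strace lines such as:
#   select(1, [0], [], NULL, {60, 0}) = 0 (Timeout)
#   select(1, [0], [], NULL, {60, 0}) = 1 (in [0], left {29, 170000})
SELECT_RE = re.compile(
    r"select\(1, \[0\], \[\], NULL, \{(\d+), (\d+)\}\)\s*=\s*(\d+)"
    r"(?:.*left \{(\d+), (\d+)\})?"
)


def idle_seconds(strace_lines):
    """Sum the time spent blocked in select() on stdin (hypothetical helper)."""
    total = 0.0
    for line in strace_lines:
        m = SELECT_RE.search(line)
        if not m:
            continue  # not a select() on fd 0
        timeout = int(m.group(1)) + int(m.group(2)) / 1e6
        if m.group(3) == "0":
            # select() timed out: the whole timeout elapsed with no input
            total += timeout
        elif m.group(4) is not None:
            # select() returned early: elapsed = timeout - time left
            left = int(m.group(4)) + int(m.group(5)) / 1e6
            total += timeout - left
    return total


# The select() lines from the trace above, copied by hand.
SAMPLE = (
    ["select(1, [0], [], NULL, {45, 57000})   = 0 (Timeout)"]
    + ["select(1, [0], [], NULL, {60, 0})       = 0 (Timeout)"] * 8
    + ["select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {29, 170000})"]
)

if __name__ == "__main__":
    print("idle on stdin for %.1f seconds" % idle_seconds(SAMPLE))
```

For the lines pasted above this comes to a little over nine minutes of waiting with nothing arriving on stdin, followed by read(0, "", 4) = 0. One possible reading is that the client was starved of input and then had its connection closed, so the ssh link and the backuppc side may be worth checking as well as the client itself.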

Donny B.

On Wed, Dec 2, 2009 at 3:56 PM, Kameleon <kameleon25 AT gmail DOT com> wrote:
I did some more testing, watching top and such on both the backuppc server and the client. Both had plenty of free memory. Our current thinking is that it may be the cable or switch between the two. Tomorrow we plan to relocate the backuppc server from its current location and plug it directly into the server via a crossover cable. That will eliminate all of the networking gear except the network interface cards.

Thanks for the input.



On Wed, Dec 2, 2009 at 3:10 PM, Les Mikesell <lesmikesell AT gmail DOT com> wrote:
Kameleon wrote:
> I do apologize. The backuppc server is Ubuntu 9.10 and the server being
> backed up is CentOS 5.4. I have changed everything back to rsync and
> tried a manual full backup (since that is what it was attempting to do
> when it failed). I ran strace on the PID of rsync on the remote server
> being backed up. The last few lines of the output are below.
[...]
>  read(0,
> "\256\374O\362\350\224\30\3101(Y\"8\3279z\300nt\10*\367\26+\355\364\245W)/\224\301"...,
> 8184) = 8184
> select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {60, 0})
> read(0,
> "\314\326\4\242P\345\3\332\245b\317\363\4\253'\307\3056Y\307X\313\364I\5\3746\fH\340\212w"...,
> 8184) = 1056
> select(1, [0], [], NULL, {60, 0})       = 1 (in [0], left {58, 296000})
> read(0, "", 8184)                       = 0
> select(2, NULL, [1], [1], {60, 0})      = 1 (out [1], left {60, 0})
> write(1, "O\0\0\10rsync: connection unexpected"..., 83) = -1 EPIPE
> (Broken pipe)

Looks like something is wrong on the target side, dropping the
connection.  File system problems?  Out of memory?  Are both machines on
the same LAN or could there be a problem with networking equipment
between them?


> Backuppc shows the following error when it fails:
>
> 2009-12-02 14:17:58 full backup started for directory /; updating partial #4
> 2009-12-02 14:24:59 Aborting backup up after signal PIPE
> 2009-12-02 14:25:00 Got fatal error during xfer (aborted by signal=PIPE)

This doesn't tell you anything except that the other end died.


> remote machine: rsync  version 3.0.6  protocol version 30
> backuppc: rsync  version 3.0.6  protocol version 30

Backuppc doesn't use the rsync binary on the server side - it has its
own implementation in perl.  But it looks like things started OK and
then either the remote side quit or the network connection had a problem.


--
  Les Mikesell
   lesmikesell AT gmail DOT com

------------------------------------------------------------------------------
Join us December 9, 2009 for the Red Hat Virtual Experience,
a free event focused on virtualization and cloud computing.
Attend in-depth sessions from your desk. Your couch. Anywhere.
http://p.sf.net/sfu/redhat-sfdev2dev
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

