BackupPC-users

Re: [BackupPC-users] BackupPC fails with aborted by signal=PIPE, Can't write to socket

2017-01-18 20:06:09
Subject: Re: [BackupPC-users] BackupPC fails with aborted by signal=PIPE, Can't write to socket
From: John Spitzer <johned9999 AT comcast DOT net>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Wed, 18 Jan 2017 17:03:09 -0800
After much debugging, using strace and looking at logs, I think I'm on a better track about what may be happening. I sincerely welcome it if anyone reading this has more inside info about how BackupPC might be contributing to this.

Based on the strace and logs, I've come to lean in the direction that rsync is very involved in this. The backup I'm creating has quite a few very large files. On the order of 20 - 50 GBytes in size. These are VirtualBox VMs disk files. The backup is being made to a NAS where the network share is mounted on the backuppc server.

The indications from the rsync log is:
--------
2017/01/17 21:34:34 [3936] connect from localhost (127.0.0.1)
2017/01/17 21:34:34 [3936] rsync on . from backuppc@localhost (127.0.0.1)
2017/01/17 21:34:34 [3936] building file list
2017/01/17 22:13:02 [3936] [sender] io timeout after 302 seconds -- exiting
2017/01/17 22:13:02 [3936] rsync error: timeout in data send/receive (code 30) at io.c(195) [sender=3.1.0]
----------

As far as I can tell, BackupPC has at least two pipes open, a read pipe and write pipe , to the backup archive (plus other sockets, files, etc). It looks like the write pipe has timed out after about 300 seconds. This time is very consistent in multiple runs. This timeout occurs while the read pipe is executing MANY reads from the file being backed up. (I'm guessing to determine if the file in the backup is the same as the one being backed up.) Because the read is taking so long, the write pipe has no activity and is timing out.

After some research, I thought that with my network set to use jumbo frames, that could be part of the problem. But when I set the network back to normal frame size, the problem actually got worse. The timeout is happening sooner in the backup, and another job that was successful has now failed the same way.

So, in summary, it looks like this is a limitation of rsync or BackupPC. I'm at a impasse until I get more info. Is there anything I can do to make this work? Is this a limitation of rsync or BackupPC that can't be worked around? I'm I even on the correct track?

I really would appreciate hearing from someone that knows this code.

Thanks,
John

On 01/16/2017 12:13 PM, John Spitzer wrote:
I have BackupPC setup with three host configurations to backup three 
separate areas of one system. Two of the hosts configurations are 
working. One repeatedly fails with the messages:

Can't write 33792 bytes to socket
Can't write 35328 bytes to socket
...
Got fatal error during xfer (aborted by signal=PIPE)

I would appreciate help determining why the backups are failing and how 
to keep it from failing again. I also would like help to recover the 
backup archive so that I don't have to delete the entire backup and 
start over.

I am using version 3.3.1, on Ubuntu 14.04. I am backing up over the 
network to a Netgear ReadyNAS 102. When the backup is working I can get 
up to 80 MB/s transfer rate to and from the NAS.

The first time this happened I deleted the entire backup for this host. 
It then works for a few backup cycles but then fails again. The back 
cycles were able to make a full, two incremental, a full, two 
incremental. Then it failed again with these same errors on a full backup.

When the backup failed the first time, I tried to find what file was 
causing the failure by analyzing logs and excluding files. But the file 
the that was being backed up at the time the backup failed changed.

I have set the logging level to 8 to try and capture more data. There is 
the portion of the log for the file that is failing and the previous file:
--------
attribSet: dir=fDrive1/.virtualbox/Machines/Win 10 Pro x64 VM07/Logs exists
attribSet(dir=fDrive1/.virtualbox/Machines/Win 10 Pro x64 VM07/Logs, 
file=VBox.log.1, size=113405, placeholder=)
Starting file 40 (.virtualbox/Machines/Win 10 Pro x64 
VM07/Logs/VBoxStartup.log), blkCnt=255, blkSize=2048, remainder=1713
.virtualbox/Machines/Win 10 Pro x64 VM07/Logs/VBoxStartup.log: blk=, 
newData=, rxMatchBlk=0, rxMatchNext=255
MD4 
/media/Backup/BackupPC/pc/johned-linux-vbox/5/fDrive1/f.virtualbox/fMachines/fWin 
10 Pro x64 VM07/fLogs/fVBoxStartup.log cache = , invalid =
.virtualbox/Machines/Win 10 Pro x64 VM07/Logs/VBoxStartup.log got exact 
match
.virtualbox/Machines/Win 10 Pro x64 VM07/Logs/VBoxStartup.log got 
digests 6b8d347f24fcac7a460fdfc49badc3a8 vs 6b8d347f24fcac7a460fdfc49badc3a8
.virtualbox/Machines/Win 10 Pro x64 VM07/Logs/VBoxStartup.log: nothing to do
   same     664 1000/1000      521905 .virtualbox/Machines/Win 10 Pro 
x64 VM07/Logs/VBoxStartup.log
attribSet: dir=fDrive1/.virtualbox/Machines/Win 10 Pro x64 VM07/Logs exists
attribSet(dir=fDrive1/.virtualbox/Machines/Win 10 Pro x64 VM07/Logs, 
file=VBoxStartup.log, size=521905, placeholder=)
Starting file 41 (.virtualbox/Machines/Win 10 Pro x64 VM07/Win 10 Pro 
x64 VM07-disk1.vdi), blkCnt=3068608, blkSize=16384, remainder=16384
/media/Backup/BackupPC/pc/johned-linux-vbox/5/fDrive1/f.virtualbox/fMachines/fWin 
7 Pro VM04/fSnapshots/f{e32c1bb2-0ca4-44cc-b858-7d17d92789a4}.vdi cache 
= , invalid = , phase = 0
Sending csums for .virtualbox/Machines/Win 7 Pro 
VM04/Snapshots/{e32c1bb2-0ca4-44cc-b858-7d17d92789a4}.vdi (size=18785239040)
.virtualbox/Machines/Win 10 Pro x64 VM07/Win 10 Pro x64 VM07-disk1.vdi: 
blk=, newData=, rxMatchBlk=0, rxMatchNext=3068608
MD4 
/media/Backup/BackupPC/pc/johned-linux-vbox/5/fDrive1/f.virtualbox/fMachines/fWin 
10 Pro x64 VM07/fWin 10 Pro x64 VM07-disk1.vdi cache = , invalid =
Can't write 33792 bytes to socket
Can't write 35328 bytes to socket
Sending csums, cnt = 138, phase = 1
/media/Backup/BackupPC/pc/johned-linux-vbox/5/fDrive1/f.virtualbox/fMachines/fWin 
7 Pro VM04/fWin 7 Pro VM04.vbox cache = , invalid = , phase = 1
Sending csums for .virtualbox/Machines/Win 7 Pro VM04/Win 7 Pro 
VM04.vbox (size=33258)
Done: 0 files, 0 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Not saving this as a partial backup since it has fewer files than the 
prior one (got 0 and 0 files versus 16)

Thank you for your help.

John S.

P.S. I'm new to posting to SourceForge, so forgive me if my format is 
not up to snuff.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/