BackupPC-users

Re: [BackupPC-users] aborted by signal=PIPE errors.

2008-07-16 01:36:24
From: Peter Nankivell <nanks AT bigpond.net DOT au>
To: backuppc-users AT lists.sourceforge DOT net
Date: Wed, 16 Jul 2008 15:35:46 +1000
Hello,

I think I have fixed my problem.

TCP segmentation offload was turned on for the client's network card.

I'm running kubuntu 8.04 on both client and server.  The server was
loaded with kubuntu onto a blank disk, the client was upgraded from
7.10 to 8.04.

Both are running Intel motherboards with integrated ethernet controllers.
The client's is a 8254EM, the server's is a 8254EI.

I've been testing by rsyncing / on the client to a blank directory (folder)
on the server using the command

    rsync -av --one-file-system client:/ .

where "client" is the hostname of the client.  This is outside of "backuppc".
The problem was not with "backuppc".

Every time I ran the rsync command, it would eventually fail with a
message from "ssh"

    corrupted MAC on input
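
Because the failure only showed up eventually, a small wrapper that
repeats a test command and stops at the first failure can help when
reproducing it or confirming a fix.  This is only a sketch:
"repeat_until_fail" is a hypothetical helper name, and the ssh command in
the usage comment (with its placeholder file path) is an assumption, not
something from the original test.

```shell
#!/bin/sh
# Run a command up to MAX times and stop at the first failure.
# repeat_until_fail is a hypothetical helper name.
# Usage: repeat_until_fail MAX command [args...]
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # "$@" is the command under test, run with its arguments intact.
        if ! "$@"; then
            echo "run $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $max runs succeeded"
}

# Example (assumed, not from the post): push a large file through ssh
# ten times and discard the data; /path/to/large/file is a placeholder.
#   repeat_until_fail 10 sh -c 'ssh client "cat /path/to/large/file" >/dev/null'
```

The helper only reports whether the command's exit status was non-zero;
the actual error text, such as the ssh message above, still comes from
the command itself.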

It had become apparent to me that the TCP packets were being corrupted at some
stage.  People had reported that cables and routers could be the problem,
but if that were so, the CRC protection of the packet data would have to
be failing - unlikely.

Memory, or something else connected directly to the internal buses that had
access to the data after the packets are unpacked, had to be the culprit.

28 passes of all the memory tests overnight produced no errors.
Prompted by the experience of others, I checked the ethernet card
settings on both machines using

    # ethtool -k eth0
    Offload parameters for eth0:
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: off
    udp fragmentation offload: off
    generic segmentation offload: off

For the client "tcp segmentation offload" was "on".  Turning it off
using

    # ethtool -K eth0 tso off

worked!  The test "rsync" now works wonderfully.
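
For checking several machines, the "ethtool -k" output can be parsed
rather than read by eye.  The following is a minimal sketch, not a tested
tool: "tso_state" is a hypothetical helper name, and it assumes the label
is either "tcp segmentation offload" (as in the output above) or the
hyphenated "tcp-segmentation-offload" printed by newer ethtool versions.

```shell
#!/bin/sh
# Print the state ("on" or "off") of TCP segmentation offload, reading
# the text produced by `ethtool -k <iface>` on standard input.
# Returns non-zero if no matching line is found.
# tso_state is a hypothetical helper name, not part of ethtool.
tso_state() {
    awk -F': *' '
        # Strip leading whitespace so indented output also matches.
        { key = $1; sub(/^[ \t]+/, "", key) }
        key == "tcp segmentation offload" || key == "tcp-segmentation-offload" {
            split($2, words, " ")   # drop trailing notes such as "[fixed]"
            print words[1]
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}

# Typical use (needs root and a real interface; eth0 is from the post):
#   if [ "$(ethtool -k eth0 | tso_state)" = "on" ]; then
#       ethtool -K eth0 tso off
#   fi
```

Note that a setting made with "ethtool -K" does not survive a reboot, so
it has to be reapplied at boot, for example from the distribution's
network configuration.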

Peter.

On Tuesday 15 July 2008 15:01:03 Peter Nankivell wrote:
> Adam,
>
> Yes.  It is conceivable that something on the motherboard could be
> causing a TCP problem if the corruption occurs after the TCP packets
> are unpacked.  I'll check the memory on both the client and server tonight.
> I suspect the client, though, as the server backs up other machines OK.
>
> Thanks, Peter.
>
> On Tuesday 15 July 2008 13:17:16 Adam Goryachev wrote:
> > Holger Parplies wrote:
> > > Hi,
> > >
> > > > Peter Nankivell wrote on 2008-07-15 09:17:48 +1000 [Re: [BackupPC-users] aborted by signal=PIPE errors.]:
> > >> [...]
> > >> If it is all due to ssh and hardware it makes me worry about
> > >> the resilience of ssh.  Surely its protocol should be able to
> > >> retry on corrupted data?  I don't know...
> > >
> > > actually, TCP should provide ssh with a reliable data stream. If ssh
> > > *sees* corrupted data, then either
> > >
> > >   1) the data has been tampered with or
> > > >   2) TCP's checksum failed to identify network data corruption.
> > >
> > > While (2) is conceivable, it should *not* be happening on a regular
> > > basis (if it is, something else is wrong). I would guess it to be so
> > > rare that ssh ignores the possibility, since there is no way to
> > > distinguish between both cases. In the case of tampering,
> > > retransmissions would not make much sense ("Hey, I noticed that! Be
> > > more subtle next time." :-).
> > >
> > > >> Another solution may be to use "rsyncd".  This should avoid
> > >> using ssh as the transport.  I haven't tried this yet.
> > >
> > > Providing you really have a corrupted TCP data stream, that would mean
> > > corrupted backups ... or an rsync protocol failure.
> >
> > I recently experienced a very long battle with this exact problem.
> >
> > A brand new server was being installed using an NFS root which was
> > mounted via TCP NFS. I was seeing random crashes usually related to NFS
> > disk errors and similar. After using wireshark on the NFS server to
> > 'watch' the traffic, I noticed a number of corrupted TCP (NFS) packets
> > arriving. I concluded it was a network issue, and proceeded to replace
> > the network card on the NFS client, the cable, and the switch. I was
> > still seeing the same errors, so I finally gave up and ran a memtest,
> > which found faulty memory. I replaced the memory and it has been stable
> > ever since.
> >
> > Just a suggestion, because again, TCP was not resolving the corrupt
> > packet issue, the packets were getting through to the NFS layer, causing
> > other issues since NFS doesn't seem to handle corrupt packets very well.
> >
> > Regards,
> > Adam
>
> _______________________________________________
> BackupPC-users mailing list
> BackupPC-users AT lists.sourceforge DOT net
> List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki:    http://backuppc.wiki.sourceforge.net
> Project: http://backuppc.sourceforge.net/
