[BackupPC-users] Backup failing over local lan.

2009-11-04 04:44:14
From: Bruford Harvie <brufordh AT synaq DOT com>
To: backuppc-users AT lists.sourceforge DOT net
Date: Wed, 4 Nov 2009 11:40:36 +0200 (SAST)

Hi Guys,

I have a strange problem with failing backups under BackupPC.

I'm backing up a Samba fileshare on a Linux box over a local LAN to a Linux box 
running BackupPC.
The fileshare is fairly large (933 GB in total).
The backups are run using rsync.

I used to back up the entire share directory as one, e.g. /share (which totals 
933 GB).
This didn't work, as I would receive errors.

I then split the /share directory into smaller "chunks" and backed them up as 
individual hosts.

Eg /share/directory1 would be assigned as fileshare1
   /share/directory2 would be assigned as fileshare2

Each "individual host" is then backed up on its own.
I split the sub-directories under the parent /share so that they are roughly 
equal in size, without going too deep into the directory structure:

Eg /share/directory1 = fileshare1
   /share/directory2 = fileshare2
   /share/directory3 and /share/directory4 = fileshare3

I have a total of 5 "fileshare hosts" which run on their own as individual 
hosts, thus backing up the different sub-directories under /share.

The sizes of the different fileshares are as follows:

fileshare1 = 276 GB
fileshare2 = 84 GB
fileshare3 = 252 GB
fileshare4 = 338 GB
fileshare5 = less than 1 GB


The first backup run completed successfully, each "host" was backed up 
correctly.


But since then, I've been experiencing backup failures on fileshare4 only (it 
happens to be the largest chunk).

Fileshare4 consists of the following two directories :

/share/data/Samba-Info        : 95 MB
/share/data/Shared_Services   : 338 GB

I have increased ClientTimeout to 360000 seconds (100 hours, just over 4 days).
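
As a quick sanity check on that value: BackupPC's ClientTimeout is given in 
seconds, so the conversion can be sketched as below. The variable names are 
only illustrative, and the 7.5-hour figure is the typical run time mentioned 
further down in this post.

```python
from datetime import timedelta

# Value from this post; BackupPC's ClientTimeout is expressed in seconds.
client_timeout_s = 360000

# Typical duration of the failing backup, as reported in this post.
typical_run = timedelta(hours=7.5)

timeout = timedelta(seconds=client_timeout_s)
hours = timeout.total_seconds() / 3600
days = timeout.total_seconds() / 86400

print(f"{client_timeout_s} s = {hours:.0f} h = {days:.2f} days")
print("timeout longer than typical run:", timeout > typical_run)
```

Since the run dies after roughly 7.5 hours, well short of 100 hours, the 
timeout itself seems unlikely to be what ends the transfer.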

As I understand it, "aborted by signal=PIPE" could be related to filesystem 
errors, network errors or memory issues.

There are no filesystem errors in the logs at all.

Both machines have 2 GB of memory each. The fileshare machine being backed up 
has 2 Intel Xeon processors @ 2.66 GHz and the backup machine has an 
Intel Core 2 Duo CPU @ 2.33 GHz.

With the backup running, it's not using swap, so the hardware side seems fine.

I initially suspected network errors. After some searching, it seems that there 
might be an issue with the e1000 card when TSO is enabled on the card. I 
disabled TSO on both NICs of both machines, but I still get the same errors, 
and only on the largest fileshare.
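
For anyone wanting to double-check the TSO state, here is a rough sketch of how 
it could be scripted around ethtool. The helper names are made up for 
illustration, the interface name passed in (e.g. "eth0") is an assumption, and 
the parser accepts both the hyphenated and the space-separated feature name 
because ethtool versions differ in how they print it.

```python
import subprocess

def tso_off_command(iface):
    """Build the ethtool command that turns TSO off on one interface."""
    return ["ethtool", "-K", iface, "tso", "off"]

def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO state found in `ethtool -k` output.

    Returns None when no TSO line is present.
    """
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        # Normalise "tcp-segmentation-offload" and "tcp segmentation offload".
        if sep and name.strip().replace("-", " ") == "tcp segmentation offload":
            return value.strip().startswith("on")
    return None

def current_tso_state(iface):
    """Query the live state; needs ethtool installed and usually root."""
    result = subprocess.run(
        ["ethtool", "-k", iface],
        capture_output=True, text=True, check=True,
    )
    return tso_enabled(result.stdout)
```

Calling current_tso_state() on each NIC of both machines after a reboot would 
confirm whether the setting actually persisted, since a change made with 
ethtool -K is not permanent by itself.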

The backup would usually run for about 7.5 hours.
The following is the Xfer error summary for fileshare4:

 -------------------------------------------------------
full backup started for directory /share/data/Samba-Info (baseline backup #20)
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender 
--numeric-ids --perms --owner --group -D --links --hard-links --times 
--block-size=2048 --recursive --ignore-times . /share/data/Samba-Info/
Xfer PIDs are now 2590
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 2590,2673
[ skipped 5 lines ]
Done: 4 files, 98952554 bytes
full backup started for directory /share/data/Shared_Services (baseline backup 
#20)
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender 
--numeric-ids --perms --owner --group -D --links --hard-links --times 
--block-size=2048 --recursive --ignore-times . /share/data/Shared_Services/
Xfer PIDs are now 2817
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 2817,2896
[ skipped 329963 lines ]
Can't write 33792 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
Child is aborting
Done: 304800 files, 350534594199 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Saving this as a partial backup
full backup started for directory /share/data/Samba-Info; updating partial #21
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender 
--numeric-ids --perms --owner --group -D --links --hard-links --times 
--block-size=2048 --recursive --ignore-times . /share/data/Samba-Info/
Xfer PIDs are now 16009
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 16009,16134
[ skipped 5 lines ]
Done: 4 files, 98952554 bytes
full backup started for directory /share/data/Shared_Services; updating partial 
#21
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender 
--numeric-ids --perms --owner --group -D --links --hard-links --times 
--block-size=2048 --recursive --ignore-times . /share/data/Shared_Services/
Xfer PIDs are now 16139
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 16139,16578
[ skipped 74785 lines ]
Remote[2]: file has vanished: 
"/share/data/Shared_Services/IT/(Public)/General/Software/Audit2009_09/~$ftwareAudit_Sep09c.doc"
[ skipped 255274 lines ]
Can't write 33792 bytes to socket
Read EOF: Connection reset by peer
Child is aborting
Done: 304885 files, 350634798658 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Saving this as a partial backup
 -------------------------------------------------------
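
To put the numbers in that summary into perspective, the "Done:" lines can be 
pulled out and compared with a few lines of Python. This is only a sketch: the 
function names are invented for illustration, and using 7.5 hours as the 
duration of the failed run is an assumption based on the typical run time 
mentioned above.

```python
import re

DONE_RE = re.compile(r"^Done: (\d+) files, (\d+) bytes")

def done_totals(log_text):
    """Return a (files, bytes) tuple for every 'Done:' line in an Xfer log."""
    matches = (DONE_RE.match(line) for line in log_text.splitlines())
    return [(int(m.group(1)), int(m.group(2))) for m in matches if m]

def mean_rate_mb_s(total_bytes, hours):
    """Average throughput in MB/s (10**6 bytes per second)."""
    return total_bytes / (hours * 3600) / 1e6

# The two 'Done:' lines for /share/data/Shared_Services from the log above.
sample = """\
Done: 304800 files, 350534594199 bytes
Done: 304885 files, 350634798658 bytes
"""

totals = done_totals(sample)
first_files, first_bytes = totals[0]
second_files, second_bytes = totals[1]

print(f"first attempt:  {first_bytes / 1e9:.1f} GB in {first_files} files")
print(f"second attempt: {second_bytes / 1e9:.1f} GB in {second_files} files")
print(f"difference:     {(second_bytes - first_bytes) / 1e6:.1f} MB")
# Assumed duration of 7.5 hours for the failed run.
print(f"mean rate:      {mean_rate_mb_s(first_bytes, 7.5):.1f} MB/s")
```

Both attempts stop after nearly the same amount of data (about 350 GB, within 
roughly 100 MB of each other), which may be a useful hint when narrowing down 
where the connection is reset.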


Any ideas / suggestions?

Thanks.

