I’ve been working on an issue for several weeks now, and it has support stumped (for the time being).
Master: NBU 6.5.6 on Windows 2003 SP2
Clients: NBU 6.5.6 on Linux
Backups were working fine for months, then we started getting the occasional error 233. Then one weekend we started getting boatloads of them. Some backups succeed on the second attempt; others fail all weekend long. The failures occur at random intervals. The job details in the Activity Monitor show "connection reset by peer". The error only occurs on full backups, not incrementals.
bpbkar log on the master/media server:
17:11:16.868 [14342] <4> bpbkar PrintFile: /boot/
17:11:16.868 [14342] <2> bpbkar SelectFile: INF - cwd = /boot
17:11:16.868 [14342] <2> bpbkar SelectFile: INF - path = HP-initrd-2.6.9-78.EL.img
17:11:51.857 [14342] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 104: Connection reset by peer
17:11:51.857 [14342] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
17:11:51.857 [14342] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
The bpbkar log on the client shows a similar error:
11:12:48.679 [26078] <16> bpbkar sighandler: ERR - bpbkar killed by SIGPIPE
11:12:48.679 [26078] <2> bpbkar sighandler: INF - ignoring additional SIGPIPE signals
11:12:48.679 [26078] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 40: network connection broken
11:12:48.679 [26078] <4> bpbkar Exit: INF - EXIT STATUS 40: network connection broken
11:12:48.679 [26078] <2> bpbkar Exit: INF - Close of stdout complete
11:12:48.679 [26078] <4> bpbkar Exit: INF - setenv FINISHED=0
We ran a network sniffer on the traffic between the master/media server and a client. Everything runs fine for a while, then the master sends a bunch of RSTs, killing the job. Support found a Symantec article on TCP window scaling, but we’ve verified those settings and they seem fine.
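For anyone who wants to line up the bpbkar errors with the RSTs in a capture, here is a rough Python sketch that pulls the severity 16 entries out of a bpbkar log. It assumes the line layout seen in the excerpts above (time, [pid], <severity>, message). The function name error_entries and the sample list are just for illustration, not anything that ships with NetBackup.

```python
import re

# Layout assumed from the excerpts above: HH:MM:SS.mmm [pid] <severity> message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")


def error_entries(lines, min_severity=16):
    """Return (timestamp, pid, message) for lines at or above min_severity."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and int(m.group(3)) >= min_severity:
            hits.append((m.group(1), int(m.group(2)), m.group(4)))
    return hits


# Sample lines copied from the log excerpt in this post.
sample = [
    "17:11:16.868 [14342] <2> bpbkar SelectFile: INF - cwd = /boot",
    "17:11:51.857 [14342] <16> flush_archive(): ERR - Cannot write to STDOUT. "
    "Errno = 104: Connection reset by peer",
    "17:11:51.857 [14342] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed",
]

for ts, pid, msg in error_entries(sample):
    print(ts, pid, msg)
```

The timestamps it prints can then be compared by hand against the times of the RST packets in the sniffer trace, keeping in mind that the clocks on the master and the client may not be in sync.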
Any ideas?
TIA,
-Jonathan