Bacula-users

[Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer

2012-09-26 08:06:49
Subject: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer
From: "Michael Neuendorf" <michael.neuendorf AT novanetgmbh DOT de>
To: <bacula-users AT lists.sourceforge DOT net>
Date: Wed, 26 Sep 2012 13:45:47 +0200
Hello there,

I have a problem while backing up two windows servers in two different 
installations. The scenarios are almost equal:

- Bacula-dir (v5.0.1) on Ubuntu 10.04.3 virtualized with VMware vSphere 5 
Hypervisor
- Bacula-sd (v5.0.1) on same server with file storage on a NAS, mounted via 
iSCSI.
- Windows Server with problem in installation 1: Windows 2003 32bit SP1 
virtulized on different VMware vSphere 5 Hypervisor, Bacula-fd 5.2.3
- Windows Server with problem in installation 2: Windows 2008R2 SP1 64bit 
virtulized on same hypervisor, Bacula-fd 5.2.6
- Many other Servers (virtualized and physical), Windos or Linux, without any 
problems.

In both installations occur the errors shown below two to three times a week 
(not in all backups). I have three jobs per server and it is not always the 
same job, but always the same server.

What I have done:
- Use a newer FD version
- Just one concurrent job per server ("Maximum Concurrent Jobs = 1" on all FDs)
- Set "Heartbeat Interval = 60" on FD, SD and Dir

I hope, someone has a clue for me or a hint, where to troubleshoot further.

Best regards

Michael Neuendorf


The logs of the two server:
2012-09-19 22:58:45   bacula-dir JobId 13962: Start Backup JobId 13962, 
Job=nina_systemstate.2012-09-19_21.50.01_31
2012-09-19 22:58:46   bacula-dir JobId 13962: Using Device "FileStorageLocal"
2012-09-19 23:02:41   nina-fd JobId 13962: DIR and FD clocks differ by 233 
seconds, FD automatically compensating.
2012-09-19 23:02:41   nina-fd JobId 13962: DIR and FD clocks differ by 233 
seconds, FD automatically compensating.
2012-09-19 23:02:45   nina-fd JobId 13962: shell command: run 
ClientRunBeforeJob "C:/backup/bacula/systemstate.cmd"
2012-09-19 23:02:45   nina-fd JobId 13962: shell command: run 
ClientRunBeforeJob "C:/backup/bacula/systemstate.cmd"
2012-09-19 23:03:40   bacula-dir JobId 13962: Sending Accurate information.
2012-09-19 23:05:12   bacula-dir-sd JobId 13962: Job write elapsed time = 
00:01:21, Transfer rate = 2.517 M Bytes/second
2012-09-19 23:09:06   nina-fd JobId 13962: shell command: run ClientAfterJob 
"C:/backup/bacula/systemstate.cmd cleanup"
2012-09-19 23:09:06   nina-fd JobId 13962: shell command: run ClientAfterJob 
"C:/backup/bacula/systemstate.cmd cleanup"
2012-09-19 23:05:17   bacula-dir JobId 13962: Fatal error: Network error with 
FD during Backup: ERR=Connection reset by peer 

2012-09-19 23:05:18   bacula-dir-sd JobId 13962: JobId=13962 
Job="nina_systemstate.2012-09-19_21.50.01_31" marked to be canceled.
2012-09-19 23:05:19   bacula-dir JobId 13962: Error: Bacula bacula-dir 5.0.1 
(24Feb10): 19-Sep-2012 23:05:19
  Build OS:               i486-pc-linux-gnu ubuntu 10.04
  JobId:                  13962
  Job:                    nina_systemstate.2012-09-19_21.50.01_31
  Backup Level:           Incremental, since=2012-09-18 23:07:02
  Client:                 "nina-fd" 5.2.3 (16Dec11) Microsoft Windows Home 
ServerStandard Edition Service Pack 1 (build 3790),Cross-compile,Win32
  FileSet:                "nina_systemstate-set" 2012-04-30 21:50:01
  Pool:                   "DailyLocal" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "Local" (From Pool resource)
  Scheduled time:         19-Sep-2012 21:50:01
  Start time:             19-Sep-2012 23:03:37
  End time:               19-Sep-2012 23:05:19
  Elapsed time:           1 min 42 secs
  Priority:               10
  FD Files Written:       4
  SD Files Written:       0
  FD Bytes Written:       203,957,393 (203.9 MB)
  SD Bytes Written:       0 (0 B)
  Rate:                   1999.6 KB/s
  Software Compression:   59.0 %
  VSS:                    no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         DailyLocal-0117
  Volume Session Id:      674
  Volume Session Time:    1346772020
  Last Volume Bytes:      32,675,913,343 (32.67 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Running
  Termination:            *** Backup Error ***

and

2012-09-24 23:50:02   bacula-dir JobId 4017: Start Sicherung JobId 4017, 
Job=BuHaSrv1_datev.2012-09-24_23.50.00_33
2012-09-24 23:50:02   bacula-dir JobId 4017: There are no more Jobs associated 
with Volume "DailyLocal-0005". Marking it purged.
2012-09-24 23:50:02   bacula-dir JobId 4017: All records pruned from Volume 
"DailyLocal-0005"; marking it "Purged"
2012-09-24 23:50:02   bacula-dir JobId 4017: Recycled volume "DailyLocal-0005"
2012-09-24 23:50:02   bacula-dir JobId 4017: Using Device "FileStorageLocal"
2012-09-24 23:50:03   buhasrv1-fd JobId 4017: shell command: run 
ClientRunBeforeJob "c:/backup/stopsql.cmd"
2012-09-24 23:50:03   buhasrv1-fd JobId 4017: ClientRunBeforeJob:
2012-09-24 23:50:03   buhasrv1-fd JobId 4017: ClientRunBeforeJob: 
C:\Windows\system32>net stop MSSQL$DATEV_SV_SE01
2012-09-24 23:50:08   buhasrv1-fd JobId 4017: ClientRunBeforeJob: SQL Server 
(DATEV_SV_SE01) wird beendet..
2012-09-24 23:50:08   buhasrv1-fd JobId 4017: ClientRunBeforeJob: SQL Server 
(DATEV_SV_SE01) wurde erfolgreich beendet.
2012-09-24 23:50:08   buhasrv1-fd JobId 4017: ClientRunBeforeJob:
2012-09-24 23:50:08   bacula-dir JobId 4017: Sending Accurate information.
2012-09-24 23:50:10   bacula-sd JobId 4017: Recycled volume "DailyLocal-0005" 
on device "FileStorageLocal" (/var/spool/bacula/local), all previous data lost.
2012-09-24 23:50:11   buhasrv1-fd JobId 4017: Generate VSS snapshots. 
Driver="Win64 VSS", Drive(s)="F"
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"Task Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): "VSS 
Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"System Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): "ASR 
Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"SqlServerWriter", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): "WMI 
Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"TermServLicensing", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
"Registry Writer", State: 0x1 (VSS_WS_STABLE)
2012-09-24 23:55:22   buhasrv1-fd JobId 4017: shell command: run ClientAfterJob 
"c:/backup/startsql.cmd"
2012-09-24 23:55:22   bacula-sd JobId 4017: Job write elapsed time = 00:05:12, 
Transfer rate = 41.31 M Bytes/second
2012-09-24 23:55:22   buhasrv1-fd JobId 4017: ClientAfterJob:
2012-09-24 23:55:22   buhasrv1-fd JobId 4017: ClientAfterJob: 
C:\Windows\system32>net start MSSQL$DATEV_SV_SE01
2012-09-24 23:55:26   buhasrv1-fd JobId 4017: ClientAfterJob: SQL Server 
(DATEV_SV_SE01) wird gestartet..
2012-09-24 23:55:26   buhasrv1-fd JobId 4017: ClientAfterJob: SQL Server 
(DATEV_SV_SE01) wurde erfolgreich gestartet.
2012-09-24 23:55:26   buhasrv1-fd JobId 4017: ClientAfterJob:
2012-09-24 23:55:26   bacula-dir JobId 4017: Fatal error: Netzwerkfehler mit FD 
bei Backup: ERR=Die Verbindung wurde vom Kommunikationspartner zurückgesetzt 

2012-09-24 23:55:26   bacula-dir JobId 4017: Error: Bacula bacula-dir 5.0.1 
(24Feb10): 24-Sep-2012 23:55:26
  Build OS:               i486-pc-linux-gnu ubuntu 10.04
  JobId:                  4017
  Job:                    BuHaSrv1_datev.2012-09-24_23.50.00_33
  Backup Level:           Incremental, seit=2012-09-22 19:10:06
  Client:                 "BuHaSrv1-fd" 5.2.6 (21Feb12) Microsoft Windows 
Server 2008 R2 Standard Edition Service Pack 1 (build 7601), 
64-bit,Cross-compile,Win64
  FileSet:                "BuHaSrv1_datev-set" 2011-11-30 21:50:00
  Pool:                   "DailyLocal" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "Local" (From Pool Ressource)
  Scheduled time:         24-Sep-2012 23:50:00
  Start time:             24-Sep-2012 23:50:08
  End time:               24-Sep-2012 23:55:26
  Elapsed time:           5 mins 18 secs
  Priority:               5
  FD Files Written:       141
  SD Files Written:       141
  FD Bytes Written:       12,888,814,964 (12.88 GB)
  SD Bytes Written:       12,888,841,033 (12.88 GB)
  Rate:                   40530.9 KB/s
  Software Compression:   None
  VSS:                    yes
  Encryption:             no
  Accurate:               yes
  Volume name(s):         DailyLocal-0005
  Volume Session Id:      274
  Volume Session Time:    1346153192
  Last Volume Bytes:      12,898,403,662 (12.89 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  OK
  Termination:            *** Sicherungsfehler ***
 

 
NovaNet GmbH
Kupferstr. 65
44532 Lünen
Telefon: 02306/202100
FAX: 02306/202109
WEB: www.novanetgmbh.de
Firmensitz: Lünen
Amtsgericht Dortmund HRB 17273
USt-ID DE 124793480, St.-Nr. 316/5759/0318
Geschäftsführerin: Dipl. Informatikerin (FH) Desiree Wünsche

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users