Bacula-users

Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer

2012-09-26 09:44:37
Subject: Re: [Bacula-users] Network error with FD during Backup: ERR=Connection reset by peer
From: Josh Fisher <jfisher AT pvct DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Wed, 26 Sep 2012 09:41:40 -0400
On 9/26/2012 7:45 AM, Michael Neuendorf wrote:
> Hello there,
>
> I have a problem while backing up two windows servers in two different 
> installations. The scenarios are almost equal:
>
> - Bacula-dir (v5.0.1) on Ubuntu 10.04.3 virtualized with VMware vSphere 5 
> Hypervisor
> - Bacula-sd (v5.0.1) on same server with file storage on a NAS, mounted via 
> iSCSI.
> - Windows Server with problem in installation 1: Windows 2003 32bit SP1 
> virtulized on different VMware vSphere 5 Hypervisor, Bacula-fd 5.2.3
> - Windows Server with problem in installation 2: Windows 2008R2 SP1 64bit 
> virtulized on same hypervisor, Bacula-fd 5.2.6
> - Many other Servers (virtualized and physical), Windos or Linux, without any 
> problems.
>
> In both installations occur the errors shown below two to three times a week 
> (not in all backups). I have three jobs per server and it is not always the 
> same job, but always the same server.
>
> What I have done:
> - Use a newer FD version
> - Just one concurrent job per server ("Maximum Concurrent Jobs = 1" on all 
> FDs)
> - Set "Heartbeat Interval = 60" on FD, SD and Dir
>
> I hope, someone has a clue for me or a hint, where to troubleshoot further.

The transfer rate is very low in the first server log. Perhaps there 
truly is a problem with a NIC in this hypervisor, or cable, switch, etc.

>
> Best regards
>
> Michael Neuendorf
>
>
> The logs of the two server:
> 2012-09-19 22:58:45   bacula-dir JobId 13962: Start Backup JobId 13962, 
> Job=nina_systemstate.2012-09-19_21.50.01_31
> 2012-09-19 22:58:46   bacula-dir JobId 13962: Using Device "FileStorageLocal"
> 2012-09-19 23:02:41   nina-fd JobId 13962: DIR and FD clocks differ by 233 
> seconds, FD automatically compensating.
> 2012-09-19 23:02:41   nina-fd JobId 13962: DIR and FD clocks differ by 233 
> seconds, FD automatically compensating.
> 2012-09-19 23:02:45   nina-fd JobId 13962: shell command: run 
> ClientRunBeforeJob "C:/backup/bacula/systemstate.cmd"
> 2012-09-19 23:02:45   nina-fd JobId 13962: shell command: run 
> ClientRunBeforeJob "C:/backup/bacula/systemstate.cmd"
> 2012-09-19 23:03:40   bacula-dir JobId 13962: Sending Accurate information.
> 2012-09-19 23:05:12   bacula-dir-sd JobId 13962: Job write elapsed time = 
> 00:01:21, Transfer rate = 2.517 M Bytes/second
> 2012-09-19 23:09:06   nina-fd JobId 13962: shell command: run ClientAfterJob 
> "C:/backup/bacula/systemstate.cmd cleanup"
> 2012-09-19 23:09:06   nina-fd JobId 13962: shell command: run ClientAfterJob 
> "C:/backup/bacula/systemstate.cmd cleanup"
> 2012-09-19 23:05:17   bacula-dir JobId 13962: Fatal error: Network error with 
> FD during Backup: ERR=Connection reset by peer
>
> 2012-09-19 23:05:18   bacula-dir-sd JobId 13962: JobId=13962 
> Job="nina_systemstate.2012-09-19_21.50.01_31" marked to be canceled.
> 2012-09-19 23:05:19   bacula-dir JobId 13962: Error: Bacula bacula-dir 5.0.1 
> (24Feb10): 19-Sep-2012 23:05:19
>    Build OS:               i486-pc-linux-gnu ubuntu 10.04
>    JobId:                  13962
>    Job:                    nina_systemstate.2012-09-19_21.50.01_31
>    Backup Level:           Incremental, since=2012-09-18 23:07:02
>    Client:                 "nina-fd" 5.2.3 (16Dec11) Microsoft Windows Home 
> ServerStandard Edition Service Pack 1 (build 3790),Cross-compile,Win32
>    FileSet:                "nina_systemstate-set" 2012-04-30 21:50:01
>    Pool:                   "DailyLocal" (From Job IncPool override)
>    Catalog:                "MyCatalog" (From Client resource)
>    Storage:                "Local" (From Pool resource)
>    Scheduled time:         19-Sep-2012 21:50:01
>    Start time:             19-Sep-2012 23:03:37
>    End time:               19-Sep-2012 23:05:19
>    Elapsed time:           1 min 42 secs
>    Priority:               10
>    FD Files Written:       4
>    SD Files Written:       0
>    FD Bytes Written:       203,957,393 (203.9 MB)
>    SD Bytes Written:       0 (0 B)
>    Rate:                   1999.6 KB/s
>    Software Compression:   59.0 %
>    VSS:                    no
>    Encryption:             no
>    Accurate:               yes
>    Volume name(s):         DailyLocal-0117
>    Volume Session Id:      674
>    Volume Session Time:    1346772020
>    Last Volume Bytes:      32,675,913,343 (32.67 GB)
>    Non-fatal FD errors:    0
>    SD Errors:              0
>    FD termination status:  Error
>    SD termination status:  Running
>    Termination:            *** Backup Error ***
>
> and
>
> 2012-09-24 23:50:02   bacula-dir JobId 4017: Start Sicherung JobId 4017, 
> Job=BuHaSrv1_datev.2012-09-24_23.50.00_33
> 2012-09-24 23:50:02   bacula-dir JobId 4017: There are no more Jobs 
> associated with Volume "DailyLocal-0005". Marking it purged.
> 2012-09-24 23:50:02   bacula-dir JobId 4017: All records pruned from Volume 
> "DailyLocal-0005"; marking it "Purged"
> 2012-09-24 23:50:02   bacula-dir JobId 4017: Recycled volume "DailyLocal-0005"
> 2012-09-24 23:50:02   bacula-dir JobId 4017: Using Device "FileStorageLocal"
> 2012-09-24 23:50:03   buhasrv1-fd JobId 4017: shell command: run 
> ClientRunBeforeJob "c:/backup/stopsql.cmd"
> 2012-09-24 23:50:03   buhasrv1-fd JobId 4017: ClientRunBeforeJob:
> 2012-09-24 23:50:03   buhasrv1-fd JobId 4017: ClientRunBeforeJob: 
> C:\Windows\system32>net stop MSSQL$DATEV_SV_SE01
> 2012-09-24 23:50:08   buhasrv1-fd JobId 4017: ClientRunBeforeJob: SQL Server 
> (DATEV_SV_SE01) wird beendet..
> 2012-09-24 23:50:08   buhasrv1-fd JobId 4017: ClientRunBeforeJob: SQL Server 
> (DATEV_SV_SE01) wurde erfolgreich beendet.
> 2012-09-24 23:50:08   buhasrv1-fd JobId 4017: ClientRunBeforeJob:
> 2012-09-24 23:50:08   bacula-dir JobId 4017: Sending Accurate information.
> 2012-09-24 23:50:10   bacula-sd JobId 4017: Recycled volume "DailyLocal-0005" 
> on device "FileStorageLocal" (/var/spool/bacula/local), all previous data 
> lost.
> 2012-09-24 23:50:11   buhasrv1-fd JobId 4017: Generate VSS snapshots. 
> Driver="Win64 VSS", Drive(s)="F"
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "Task Scheduler Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "VSS Metadata Store Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "Performance Counters Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "System Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "ASR Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "SqlServerWriter", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "WMI Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "TermServLicensing", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:21   buhasrv1-fd JobId 4017: VSS Writer (BackupComplete): 
> "Registry Writer", State: 0x1 (VSS_WS_STABLE)
> 2012-09-24 23:55:22   buhasrv1-fd JobId 4017: shell command: run 
> ClientAfterJob "c:/backup/startsql.cmd"
> 2012-09-24 23:55:22   bacula-sd JobId 4017: Job write elapsed time = 
> 00:05:12, Transfer rate = 41.31 M Bytes/second
> 2012-09-24 23:55:22   buhasrv1-fd JobId 4017: ClientAfterJob:
> 2012-09-24 23:55:22   buhasrv1-fd JobId 4017: ClientAfterJob: 
> C:\Windows\system32>net start MSSQL$DATEV_SV_SE01
> 2012-09-24 23:55:26   buhasrv1-fd JobId 4017: ClientAfterJob: SQL Server 
> (DATEV_SV_SE01) wird gestartet..
> 2012-09-24 23:55:26   buhasrv1-fd JobId 4017: ClientAfterJob: SQL Server 
> (DATEV_SV_SE01) wurde erfolgreich gestartet.
> 2012-09-24 23:55:26   buhasrv1-fd JobId 4017: ClientAfterJob:
> 2012-09-24 23:55:26   bacula-dir JobId 4017: Fatal error: Netzwerkfehler mit 
> FD bei Backup: ERR=Die Verbindung wurde vom Kommunikationspartner 
> zurückgesetzt
>
> 2012-09-24 23:55:26   bacula-dir JobId 4017: Error: Bacula bacula-dir 5.0.1 
> (24Feb10): 24-Sep-2012 23:55:26
>    Build OS:               i486-pc-linux-gnu ubuntu 10.04
>    JobId:                  4017
>    Job:                    BuHaSrv1_datev.2012-09-24_23.50.00_33
>    Backup Level:           Incremental, seit=2012-09-22 19:10:06
>    Client:                 "BuHaSrv1-fd" 5.2.6 (21Feb12) Microsoft Windows 
> Server 2008 R2 Standard Edition Service Pack 1 (build 7601), 
> 64-bit,Cross-compile,Win64
>    FileSet:                "BuHaSrv1_datev-set" 2011-11-30 21:50:00
>    Pool:                   "DailyLocal" (From Job IncPool override)
>    Catalog:                "MyCatalog" (From Client resource)
>    Storage:                "Local" (From Pool Ressource)
>    Scheduled time:         24-Sep-2012 23:50:00
>    Start time:             24-Sep-2012 23:50:08
>    End time:               24-Sep-2012 23:55:26
>    Elapsed time:           5 mins 18 secs
>    Priority:               5
>    FD Files Written:       141
>    SD Files Written:       141
>    FD Bytes Written:       12,888,814,964 (12.88 GB)
>    SD Bytes Written:       12,888,841,033 (12.88 GB)
>    Rate:                   40530.9 KB/s
>    Software Compression:   None
>    VSS:                    yes
>    Encryption:             no
>    Accurate:               yes
>    Volume name(s):         DailyLocal-0005
>    Volume Session Id:      274
>    Volume Session Time:    1346153192
>    Last Volume Bytes:      12,898,403,662 (12.89 GB)
>    Non-fatal FD errors:    0
>    SD Errors:              0
>    FD termination status:  Error
>    SD termination status:  OK
>    Termination:            *** Sicherungsfehler ***
>   
>
>   
> NovaNet GmbH
> Kupferstr. 65
> 44532 Lünen
> Telefon: 02306/202100
> FAX: 02306/202109
> WEB: www.novanetgmbh.de
> Firmensitz: Lünen
> Amtsgericht Dortmund HRB 17273
> USt-ID DE 124793480, St.-Nr. 316/5759/0318
> Geschäftsführerin: Dipl. Informatikerin (FH) Desiree Wünsche
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users