Bacula-users

Re: [Bacula-users] Backup of Vista is not very robust...

2010-05-24 12:26:19
Subject: Re: [Bacula-users] Backup of Vista is not very robust...
From: Josh Fisher <jfisher AT pvct DOT com>
To: marc AT marcchamberlin DOT com
Date: Mon, 24 May 2010 12:24:05 -0400

On 5/23/2010 1:58 PM, Marc Chamberlin wrote:
> I am having a lot of difficulty getting a full backup of Vista systems; most of the time it fails with some kind of network error, usually after several gigabytes/hours have elapsed. I do not get this when doing a full backup of Linux systems (even from the same system when it is dual booted between Linux and Vista, so not likely a hardware failure). It is frustrating because doing a full backup of most systems takes well over a day to complete (several gigabytes...) and can result in extra delays in getting good backups of other systems. In all cases of a failure with Vista, I see the following sort of error message:
>
> 23-May 09:43 stephslaptop-fd JobId 160: Fatal error: /tmp/bacula/bacula/src/filed/backup.c:1019 Network send error to SD. ERR=Input/output error

I have seen this a few times on XP and Vista clients. In all cases it was a NIC driver "bug", or at least I consider it a bug. What I think happens is that the NIC driver's power management implementation gets a little too green and cuts power to the physical interface, even though the Bacula FD still has a TCP socket open and connected. This causes the connection to be dropped. The Bacula FD's still-open socket is then no longer really connected, and a read or write causes an I/O error, which in turn fails the job. To the Bacula FD, it looks like the Ethernet cable has been cut.

I think we see this happen when the FD is compressing a large file or otherwise doing something that causes packet transmissions to be delayed long enough to trigger the NIC driver's power management. So it is just more likely to happen when doing a full backup than when doing an incremental.
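As a rough illustration of the failure mode described above, here is a minimal Python sketch of how a write on a socket that is still open locally, but whose connection is gone, surfaces as an error to the application. It is not Bacula code: the function name is made up for this example, and closing the peer end of a local socket pair is only a stand-in for the NIC dropping the link. With a real dropped link the error would typically show up later, after TCP gives up.

```python
import socket


def write_after_connection_loss():
    """Return the OSError raised when writing to a socket whose peer is gone.

    Hypothetical helper for illustration only; closing the peer is an
    assumption standing in for the NIC cutting power to the interface.
    """
    local_end, remote_end = socket.socketpair()
    try:
        remote_end.close()  # stand-in for the connection being dropped
        try:
            # The local socket object is still open, but writes can no
            # longer be delivered, so the OS reports an error.
            for _ in range(5):
                local_end.sendall(b"backup data block")
        except OSError as exc:
            return exc
        return None
    finally:
        local_end.close()
```

An application that treats such an error as fatal, as the FD does in the log above, ends the job at that point.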


> Is this a known issue with Vista? Is there a workaround? Does the Bacula File Daemon actually try several times to send the data across the network before giving up, or does it simply fail on the first try? In other words, is the File Daemon very robust? If this is an issue, couldn't a Resume backup of some kind be implemented so that we don't lose all the time starting over when the next backup is scheduled to run?

Many people are using Bacula with Vista machines, so it isn't likely a problem with either Vista or Bacula. My guess is the Windows NIC driver. If possible, try a different model of NIC in the Windows client. If the problem goes away, then the original NIC or its driver is the cause. Perhaps there is an updated driver for the problematic NIC, or perhaps the NIC driver allows the power management features to be turned off.

Also, some switches can apparently cause the same issues in much the same way.

On one Windows laptop (a ThinkPad), full backups ran without issue as long as I kept Outlook running and checking for new mail every 20 seconds. Anything that keeps the NIC from going to sleep should work. For that ThinkPad, a connected socket was not enough; packets actually had to be transmitted, which is why I consider it a NIC driver bug.
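If running a mail client just to generate traffic is not appealing, the same idea can be sketched as a small script that transmits a packet at a fixed interval. This is only a sketch under assumptions: the function name, the payload, and the choice of UDP are made up for illustration, the 20-second default simply mirrors the Outlook polling interval above, and whether this kind of traffic is enough to keep a particular NIC driver awake is untested.

```python
import socket
import time


def send_keepalive_packets(host, port, interval_seconds=20.0, count=None):
    """Send a small UDP datagram every interval_seconds.

    Hypothetical helper, not part of Bacula. Returns the number of
    datagrams sent. With count=None it runs until interrupted.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while count is None or sent < count:
            # Any outbound packet will do; the payload is arbitrary.
            sock.sendto(b"keepalive", (host, port))
            sent += 1
            # Skip the final sleep when a fixed count has been reached.
            if count is None or sent < count:
                time.sleep(interval_seconds)
    return sent
```

It would be started on the Windows client before the backup begins, for example as `send_keepalive_packets("192.0.2.10", 9)`, where the address is a placeholder for some host on the local network, and stopped once the job finishes.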


> Marc...
>
> Here is the output reported via the emailed results, in case it is useful to see this in the context of how the backup ran.

> 22-May 15:41 stephslaptop-fd JobId 160: Generate VSS snapshots. Driver="VSS Vista", Drive(s)="C"
> 22-May 15:44 stephslaptop-fd JobId 160:      C:/Documents and Settings is a different filesystem. Will not descend from C:/ into C:/Documents and Settings
> 22-May 20:36 stephslaptop-fd JobId 160:      C:/ProgramData/Application Data is a different filesystem. Will not descend from C:/ into C:/ProgramData/Application Data
> 22-May 20:36 stephslaptop-fd JobId 160:      C:/ProgramData/Desktop is a different filesystem. Will not descend from C:/ into C:/ProgramData/Desktop
> 22-May 20:36 stephslaptop-fd JobId 160:      C:/ProgramData/Documents is a different filesystem. Will not descend from C:/ into C:/ProgramData/Documents
> 22-May 20:36 stephslaptop-fd JobId 160:      C:/ProgramData/Favorites is a different filesystem. Will not descend from C:/ into C:/ProgramData/Favorites
> 22-May 20:40 stephslaptop-fd JobId 160:      C:/ProgramData/Start Menu is a different filesystem. Will not descend from C:/ into C:/ProgramData/Start Menu
> 22-May 20:40 stephslaptop-fd JobId 160:      C:/ProgramData/Templates is a different filesystem. Will not descend from C:/ into C:/ProgramData/Templates
> 23-May 09:43 stephslaptop-fd JobId 160: Fatal error: /tmp/bacula/bacula/src/filed/backup.c:1019 Network send error to SD. ERR=Input/output error
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "MSSearch Service Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "ASR Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "BITS Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "Registry Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "Shadow Copy Optimization Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 stephslaptop-fd JobId 160: VSS Writer (BackupComplete): "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE)
> 23-May 09:46 bigbang-sd JobId 160: JobId=160 Job="stephslaptop.2010-05-22_15.35.55_03" marked to be canceled.
> 23-May 09:46 bigbang-dir JobId 160: Error: Bacula bigbang-dir 5.0.1 (24Feb10): 23-May-2010 09:46:06
>   Build OS:               x86_64-unknown-linux-gnu suse 11.2
>   JobId:                  160
>   Job:                    stephslaptop.2010-05-22_15.35.55_03
>   Backup Level:           Full (upgraded from Incremental)
>   Client:                 "stephslaptop-fd" 5.0.2 (28Apr10) Linux,Cross-compile,Win32
>   FileSet:                "Stephslaptop Set" 2010-05-21 15:05:47
>   Pool:                   "StephsLaptopPool" (From Job resource)
>   Catalog:                "MyCatalog" (From Client resource)
>   Storage:                "File" (From Job resource)
>   Scheduled time:         22-May-2010 15:35:53
>   Start time:             22-May-2010 15:35:58
>   End time:               23-May-2010 09:46:06
>   Elapsed time:           18 hours 10 mins 8 secs
>   Priority:               10
>   FD Files Written:       80,038
>   SD Files Written:       0
>   FD Bytes Written:       26,375,717,733 (26.37 GB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   403.2 KB/s
>   Software Compression:   None
>   VSS:                    yes
>   Encryption:             no
>   Accurate:               no
>   Volume name(s):         StephsLaptopVol0018
>   Volume Session Id:      1
>   Volume Session Time:    1274567735
>   Last Volume Bytes:      25,998,335,932 (25.99 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
>

------------------------------------------------------------------------------
_______________________________________________ Bacula-users mailing list Bacula-users AT lists.sourceforge DOT net https://lists.sourceforge.net/lists/listinfo/bacula-users
------------------------------------------------------------------------------
