2013-11-15 06:39:47
Subject: Re: [Bacula-users] Problem with Bacula 5.2.5 and Windows client 5.2.10.
From: Matias Banchoff <matiasb AT cespi.unlp.edu DOT ar>
To: Yann Cézard <yann.cezard AT univ-pau DOT fr>
Date: Fri, 15 Nov 2013 08:37:40 -0300
On 15/11/13 07:14, Yann Cézard wrote:
> On 14/11/2013 23:48, Matias Banchoff wrote:
>> Hi!
>>     We are having a problem between a Bacula server version 5.2.5 (SD and
>> Dir) and a Windows client running Bacula-fd 5.2.10.
>>
>>     The problem is that we cannot get a full backup to complete
>> successfully. Backup jobs start without any problem, but they end with an
>> error after a few GB have been transferred (sometimes after 1 GB, 2 GB, or
>> 4 GB; it does not appear to be a traffic-shaping problem).
>>     When the volume of data to back up is small, the backup job works
>> fine. The problem arises when we back up all the data.
>>     Nobody has complained about connectivity problems with that Windows
>> Server (it's a database server).
>>
>>     We have tried the following, but none worked:
>> - enable and disable VSS.
>> - configure Heartbeat Interval... everywhere! (SD, Dir, FD) :-)
>> - lower the TCP keepalive time on our Bacula server (the one with the SD
>> and Dir installed) (sysctl net.ipv4.tcp_keepalive_time).
>> - disable Checksum Offload in the NIC config (as someone suggested in
>> http://www.bacula.org/manuals/en/install/install/Client_Fi_daemon_Configura.html).
>> Our Windows server has an HP NIC.
>> - back up from the Windows FD to an SD on the same network as the Windows
>> server, in an attempt to put the SD and FD "closer".
>>
>>     The next step is to downgrade Bacula-fd from 5.2.10 to 5.2.5 and see
>> if that resolves the problem. Any ideas or clues?
>>
>> Notes:
>> - Backups are stored on storage mounted over NFS.
>> - The data we are backing up is a single 50GB file.
>>
>> Thanks!!!!
>>
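For reference on the keepalive item in the list above: on Linux, net.ipv4.tcp_keepalive_time is only the system-wide default, and it only takes effect on sockets that have SO_KEEPALIVE enabled; a program can also override the timers per socket. The Python sketch below shows that per-socket form. The timer values are hypothetical, chosen for illustration, and are not Bacula defaults; the TCP_KEEP* options are platform-dependent, so the sketch skips any the platform does not expose. Keep in mind that keepalive probes are only sent after a connection has been idle, so they would not be expected to help while a backup stream is actively transferring data.

```python
import socket

# Hypothetical values for illustration only; these are NOT Bacula defaults.
KEEPIDLE_S = 60   # idle seconds before the first probe (per-socket analogue of net.ipv4.tcp_keepalive_time)
KEEPINTVL_S = 10  # seconds between unanswered probes (analogue of net.ipv4.tcp_keepalive_intvl)
KEEPCNT = 5       # unanswered probes before the connection is dropped (analogue of net.ipv4.tcp_keepalive_probes)


def enable_keepalive(sock: socket.socket,
                     idle: int = KEEPIDLE_S,
                     interval: int = KEEPINTVL_S,
                     count: int = KEEPCNT) -> dict:
    """Enable TCP keepalive on one socket and override the system-wide timers.

    The sysctl defaults only apply to sockets with SO_KEEPALIVE set. The
    TCP_KEEP* constants are platform-specific, so missing ones are skipped.
    Returns the values read back from the socket.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)}
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enable_keepalive(s))
```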
> Hi Matias,
>
> You didn't say what the error message was...
>
> But anyway, this problem reminds me of one I had some years ago.
> I had a Windows client that suddenly started failing on every Full
> backup, most of the time after around 4 GB transferred. There was no
> problem at all with Incrementals, no problem with the Linux clients on
> the same network, and no other application was having network problems
> on that Windows server...
> So I tried upgrading both sides, tuning the Windows network parameters
> (like checksum offload), and playing with heartbeats and keepalives...
> no success.
>
> I finally tested with a simple FTP transfer between the Windows client
> and my Linux SD => it failed at approximately 4 GB too.
> So I ended up capturing the network traffic on both sides and comparing
> the captures in Wireshark, and found that at some point packets were no
> longer being received and were retransmitted until the connection
> finally failed.
> With that evidence I asked the network team to analyse all the network
> devices between the FD and the SD, and they found that a switch
> interface had a lot of errors.
> After changing it: problem solved!
>
> As it turned out, only the Windows client had the problem, and only
> Bacula triggered it; my guess is that the Linux TCP/IP stack was a
> little more robust than the Windows one.
> And it always happened on Full backups because Full backups generally
> put more load on the network.
>
> /mysysadminlife
>
> So my advice here:
> 1) Test the network connection between the two hosts with a high-rate
> transfer protocol like FTP (or iperf?).
>     I don't know how iperf handles lost connections, but FTP clients will
> show you an error and try to reconnect, so you can't miss it if it
> happens.
> 2) Analyse all the network elements along the route between your two
> hosts.
>
> Just my 2 cents!
>
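Yann's first suggestion boils down to pushing a large amount of data over a single TCP connection and noting how far it gets before a reset. As a rough stand-in for FTP or iperf (it is neither), here is a minimal Python sketch; the function names, chunk size and timeout are hypothetical choices for illustration. It runs over loopback so that it is self-contained, and it counts bytes handed to the local TCP stack on the sending side, which is not the same as bytes acknowledged by the peer.

```python
import socket
import threading

CHUNK = 64 * 1024  # bytes per send/recv call (illustrative choice)


def receive_all(server: socket.socket, result: dict) -> None:
    """Accept one connection and count bytes until the peer closes or the link fails."""
    conn, _addr = server.accept()
    total = 0
    with conn:
        try:
            while True:
                data = conn.recv(CHUNK)
                if not data:  # orderly close by the sender
                    break
                total += len(data)
        except OSError as exc:  # e.g. ConnectionResetError ("Connection reset by peer")
            result["error"] = repr(exc)
    result["received"] = total


def send_bulk(host: str, port: int, total_bytes: int, timeout: float = 30.0):
    """Stream total_bytes of zeros to host:port.

    Returns (bytes_sent, error); error is None on success. bytes_sent counts
    data handed to the local TCP stack, not data acknowledged by the peer.
    """
    payload = bytes(CHUNK)
    sent = 0
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            while sent < total_bytes:
                n = min(CHUNK, total_bytes - sent)
                sock.sendall(payload[:n])
                sent += n
    except OSError as exc:  # reset, broken pipe, or timeout
        return sent, repr(exc)
    return sent, None


def loopback_demo(total_bytes: int):
    """Run sender and receiver on 127.0.0.1 so the sketch is self-contained."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        result: dict = {}
        receiver = threading.Thread(target=receive_all, args=(server, result))
        receiver.start()
        sent, error = send_bulk("127.0.0.1", port, total_bytes)
        receiver.join()
    return sent, error, result


if __name__ == "__main__":
    sent, error, result = loopback_demo(64 * 1024 * 1024)
    print(f"sent={sent} received={result.get('received')} error={error or result.get('error')}")
```

To exercise a real link you would run receive_all on the SD host, bound to an address the client can reach, and send_bulk on the client with a size in the tens of GB. On a healthy path both counters should match the requested size; a reset partway through, reported by either side, would point at the network path rather than at Bacula, which is what Yann found with FTP.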

Hi!
   Sorry for not saying what the error was; it is below.
   And thanks for the response. I'll test the link with a big file 
transfer and analyse it.

Bye and thanks!

14-Nov 23:20 bacula-dir JobId 781: Start Backup JobId 781, Job=backup_server.2013-11-14_23.00.00_30
14-Nov 23:20 bacula-dir JobId 781: There are no more Jobs associated with Volume "Server-Inc-0191". Marking it purged.
14-Nov 23:20 bacula-dir JobId 781: All records pruned from Volume "Server-Inc-0191"; marking it "Purged"
14-Nov 23:20 bacula-dir JobId 781: Recycled volume "Server-Inc-0191"
14-Nov 23:20 bacula-dir JobId 781: Using Device "Server-dev"
14-Nov 23:20 bacula-sd JobId 781: Recycled volume "Server-Inc-0191" on device "Server-dev" (/backups), all previous data lost.
14-Nov 23:20 bacula-dir JobId 781: Max Volume jobs=1 exceeded. Marking Volume "Server-Inc-0191" as Used.
15-Nov 01:31 bacula-sd JobId 781: Fatal error: append.c:245 Network error reading from FD. ERR=Connection reset by peer
15-Nov 01:31 bacula-sd JobId 781: Job write elapsed time = 02:10:55, Transfer rate = 975.6 K Bytes/second
15-Nov 01:32 bacula-dir JobId 781: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
15-Nov 01:32 bacula-dir JobId 781: Fatal error: No Job status returned from FD.
15-Nov 01:32 bacula-dir JobId 781: Error: Bacula bacula-dir 5.2.5 (26Jan12):
   Build OS:               i686-pc-linux-gnu ubuntu 12.04
   JobId:                  781
   Job:                    backup_server.2013-11-14_23.00.00_30
   Backup Level:           Full (upgraded from Incremental)
   Client:                 "server-fd" 5.2.10 (28Jun12) Microsoft Windows Home ServerEnterprise Edition Service Pack 2 (build 3790),Cross-compile,Win32
   FileSet:                "server-fs" 2013-11-11 18:23:28
   Pool:                   "server-inc" (From Run pool override)
   Catalog:                "MyCatalog" (From Client resource)
   Storage:                "Server-sg" (From run override)
   Scheduled time:         14-Nov-2013 23:00:00
   Start time:             14-Nov-2013 23:20:23
   End time:               15-Nov-2013 01:32:13
   Elapsed time:           2 hours 11 mins 50 secs
   Priority:               10
   FD Files Written:       0
   SD Files Written:       1
   FD Bytes Written:       0 (0 B)
   SD Bytes Written:       7,663,439,437 (7.663 GB)
   Rate:                   0.0 KB/s
   Software Compression:   None
   VSS:                    no
   Encryption:             no
   Accurate:               no
   Volume name(s):         Server-Inc-0191
   Volume Session Id:      13
   Volume Session Time:    1384460636
   Last Volume Bytes:      7,674,927,696 (7.674 GB)
   Non-fatal FD errors:    1
   SD Errors:              1
   FD termination status:  Error
   SD termination status:  Error
   Termination:            *** Backup Error ***
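As a sanity check on the report, the SD's "Transfer rate" is consistent with SD Bytes Written divided by the SD's "Job write elapsed time", taking K as 1000: 7,663,439,437 bytes / 7,855 s is about 975,613 bytes/s, i.e. 975.6 K Bytes/second. The short Python sketch below redoes that arithmetic from the figures above (the helper names are illustrative) and also confirms that the connection dropped well past 4 GiB (2^32 bytes), which fits the varying failure points described in the original message better than a fixed size limit would.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' duration, as printed by the SD, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def rate_k_per_s(byte_count: int, elapsed: str) -> float:
    """Average rate in K bytes/second, taking K = 1000 (this matches the report)."""
    return byte_count / parse_hms(elapsed) / 1000


SD_BYTES = 7_663_439_437   # "SD Bytes Written" in the job report
ELAPSED = "02:10:55"       # "Job write elapsed time" from bacula-sd

print(parse_hms(ELAPSED))                        # 7855
print(f"{rate_k_per_s(SD_BYTES, ELAPSED):.1f}")  # 975.6
print(SD_BYTES > 2**32)                          # True
```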



-----
CeSPI 
Centro Superior para el Procesamiento de la Información

Universidad Nacional de La Plata
-------------------------------------------------------------------------------
Protect the environment. Do not print this email unless absolutely necessary.

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users