[Bacula-users] Network error when running a Full backup

2008-11-04 12:47:01
Subject: [Bacula-users] Network error when running a Full backup
From: Matias Banchoff <matiasb AT cespi.unlp.edu DOT ar>
To: Bacula-users AT lists.sourceforge DOT net
Date: Tue, 04 Nov 2008 15:44:06 -0200

Hello,
    I've been having a problem with a Windows 2003 server for some weeks.
The problem appears to happen randomly when I run Full backups (some
Full backup jobs end successfully, while others don't). This is an
example of the error:

------------------------------------------------------------------------------------------------------------------------------------------
[..........]
02-Nov 00:19 server2003-fd: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
02-Nov 02:15 silicio-dir JobId 13134: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
02-Nov 02:15 silicio-sd JobId 13134: Job backup_server2003.2008-11-02_00.15.53 marked to be canceled.
02-Nov 02:15 silicio-sd JobId 13134: Fatal error: append.c:259 Network error on data channel. ERR=Connection reset by peer
02-Nov 02:15 silicio-sd JobId 13134: Job write elapsed time = 01:58:36, Transfer rate = 438.0 K bytes/second
02-Nov 02:15 silicio-sd JobId 13134: Error: bsock.c:444 Read error from client:xxxxxxxxxxxxxx:36643: ERR=Connection reset by peer
02-Nov 02:15 silicio-dir JobId 13134: Fatal error: No Job status returned from FD.
02-Nov 02:15 silicio-dir JobId 13134: Error: Bacula silicio-dir 2.4.2 (26Jul08): 02-Nov-2008 02:15:03
[..........]
------------------------------------------------------------------------------------------------------------------------------------------

I have been backing up this machine for a year and a half, and the
error started only recently. The data volume has stayed roughly
constant during this period, between 40 and 50 GB.
- It cannot be the Ethernet link, because it has always been the same;
we haven't changed it.
- The client did not have the Heartbeat Interval option set, so I set
it, but the problem persists.
- There have been no changes in the firewall (besides, there is another
Windows machine behind the firewall that also uses Bacula, and it
doesn't have any problem).
- The amount of data transferred cannot be the problem, because until
recently (when the full jobs still ended correctly :-) ), the data
transferred was also between 40 and 50 GB.
- Might it be a problem between the Bacula versions? Can you suggest
any test that I could run?

My setup is the following:

Windows 2003 FD <----> Linux machine doing NAT <-----> Linux machine running Bacula SD and Director

Versions, according to the output of "status storage", "status
director" and "status client":
- Bacula dir:  Version: 2.4.2 (26 July 2008)
- Bacula sd:  Version: 2.2.8 (26 January 2008) i486-pc-linux-gnu debian lenny/sid
- Bacula fd version:
server2003-fd Version: 2.0.0 (04 January 2007)  VSS Linux Cross-compile Win32
Daemon started 01-Oct-08 13:35, 70 Jobs run since started.
 Heap: bytes=101,724 max_bytes=295,310 bufs=100 max_bufs=230
 Sizeof: boffset_t=8 size_t=4 debug=0 trace=1

I set the Heartbeat Interval option to one hour ("Heartbeat
Interval = 3600"), so my FD config file looks like this:

---------------------------------------------------------------------------------------------------
FileDaemon {
  Name = server-fd
  FDport = 9102
  FDaddress = an_ip
  WorkingDirectory = /var/lib/bacula
  Pid Directory = /var/run/bacula
  Maximum Concurrent Jobs = 20

  Heartbeat Interval = 3600        # every hour
}
---------------------------------------------------------------------------------------------------

Here are two failed backups as examples (note: xxxxxxxxxxxxxx
is the public IP of the firewall doing NAT):

----------------------------------------------------------------------------------------------------------------------------------------------------
02-Nov 00:15 silicio-dir JobId 13134: Start Backup JobId 13134, Job=backup_server2003.2008-11-02_00.15.53
02-Nov 00:15 silicio-dir JobId 13134: There are no more Jobs associated with Volume "Server2003-Full-0001". Marking it purged.
02-Nov 00:15 silicio-dir JobId 13134: All records pruned from Volume "Server2003-Full-0001"; marking it "Purged"
02-Nov 00:15 silicio-dir JobId 13134: Recycled volume "Server2003-Full-0001"
02-Nov 00:15 silicio-dir JobId 13134: Using Device "FileStorage"
02-Nov 00:16 silicio-sd JobId 13134: Recycled volume "Server2003-Full-0001" on device "FileStorage" (/var/cache/raid/backups/bacula), all previous data lost.
02-Nov 00:16 silicio-dir JobId 13134: Max Volume jobs exceeded. Marking Volume "Server2003-Full-0001" as Used.
02-Nov 00:19 server2003-fd: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
02-Nov 02:15 silicio-dir JobId 13134: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
02-Nov 02:15 silicio-sd JobId 13134: Job backup_server2003.2008-11-02_00.15.53 marked to be canceled.
02-Nov 02:15 silicio-sd JobId 13134: Fatal error: append.c:259 Network error on data channel. ERR=Connection reset by peer
02-Nov 02:15 silicio-sd JobId 13134: Job write elapsed time = 01:58:36, Transfer rate = 438.0 K bytes/second
02-Nov 02:15 silicio-sd JobId 13134: Error: bsock.c:444 Read error from client:xxxxxxxxxxxxxx:36643: ERR=Connection reset by peer
02-Nov 02:15 silicio-dir JobId 13134: Fatal error: No Job status returned from FD.
02-Nov 02:15 silicio-dir JobId 13134: Error: Bacula silicio-dir 2.4.2 (26Jul08): 02-Nov-2008 02:15:03
  Build OS:               i486-pc-linux-gnu debian lenny/sid
  JobId:                  13134
  Job:                    backup_server2003.2008-11-02_00.15.53
  Backup Level:           Full
  Client:                 "server2003-fd" 2.0.0 (04Jan07) Linux,Cross-compile,Win32
  FileSet:                "server2003-fs" 2007-06-05 15:16:21
  Pool:                   "server2003-full" (From Run pool override)
  Storage:                "silicio-sd-disco" (From run override)
  Scheduled time:         02-Nov-2008 00:15:00
  Start time:             02-Nov-2008 00:15:03
  End time:               02-Nov-2008 02:15:03
  Elapsed time:           2 hours
  Priority:               10
  FD Files Written:       0
  SD Files Written:       9,398
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       3,117,219,483 (3.117 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         Server2003-Full-0001
  Volume Session Id:      711
  Volume Session Time:    1222865008
  Last Volume Bytes:      3,120,018,452 (3.120 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Canceled
  Termination:            *** Backup Error ***

----------------------------------------------------------------------------------------------------------------------------------------------------

[........]
02-Nov 22:10 server2003-fd: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
03-Nov 00:06 silicio-dir JobId 13166: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
03-Nov 00:06 silicio-sd JobId 13166: Job backup_server2003_HISTORICO.2008-11-02_22.06.33 marked to be canceled.
03-Nov 00:06 silicio-sd JobId 13166: Fatal error: append.c:259 Network error on data channel. ERR=Connection reset by peer
03-Nov 00:06 silicio-sd JobId 13166: Job write elapsed time = 01:59:16, Transfer rate = 3.795 M bytes/second
03-Nov 00:06 silicio-sd JobId 13166: Error: bsock.c:444 Read error from client:xxxxxxxxxxxxxxx:36643: ERR=Connection reset by peer
03-Nov 00:06 silicio-dir JobId 13166: Fatal error: No Job status returned from FD.
03-Nov 00:06 silicio-dir JobId 13166: Error: Bacula silicio-dir 2.4.2 (26Jul08): 03-Nov-2008 00:06:54
  Build OS:               i486-pc-linux-gnu debian lenny/sid
  JobId:                  13166
  Job:                    backup_server2003_HISTORICO.2008-11-02_22.06.33
  Backup Level:           Full
  Client:                 "server2003-fd" 2.0.0 (04Jan07) Linux,Cross-compile,Win32
  FileSet:                "server2003-fs-historico" 2007-06-08 11:36:41
  Pool:                   "server2003-historico" (From Job resource)
  Storage:                "silicio-sd-disco" (From Job resource)
  Scheduled time:         02-Nov-2008 22:06:31
  Start time:             02-Nov-2008 22:06:53
  End time:               03-Nov-2008 00:06:54
  Elapsed time:           2 hours 1 sec
  Priority:               40
  FD Files Written:       0
  SD Files Written:       8,230
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       27,162,744,355 (27.16 GB)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Storage Encryption:     no
  Volume name(s):         Server2003-Historico-0001
  Volume Session Id:      736
  Volume Session Time:    1222865008
  Last Volume Bytes:      27,183,300,572 (27.18 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Canceled
  Termination:            *** Backup Error ***

----------------------------------------------------------------------------------------------------------------------------------------------------
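
The Start time and End time fields of the two reports can be compared to see how long each job ran before the connection was reset. Below is a minimal Python sketch of that arithmetic. The timestamps are copied from the two reports; the helper names (parse_bacula_time, elapsed_seconds) are hypothetical and not part of Bacula.

```python
from datetime import datetime

# Month abbreviations as they appear in the job reports.
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# (Start time, End time) copied from the reports of the two failed jobs.
JOBS = {
    13134: ("02-Nov-2008 00:15:03", "02-Nov-2008 02:15:03"),
    13166: ("02-Nov-2008 22:06:53", "03-Nov-2008 00:06:54"),
}


def parse_bacula_time(text):
    """Parse a timestamp such as '02-Nov-2008 00:15:03' (hypothetical helper)."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def elapsed_seconds(start, end):
    """Return the number of whole seconds between two report timestamps."""
    delta = parse_bacula_time(end) - parse_bacula_time(start)
    return int(delta.total_seconds())


if __name__ == "__main__":
    for job_id, (start, end) in JOBS.items():
        print(job_id, elapsed_seconds(start, end))
```

For these two reports it gives 7200 and 7201 seconds, so both jobs were cut off almost exactly two hours after they started.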


Thank you

Matias

