[Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon
2011-05-20 04:45:59
Hi everyone,
For a few weeks now, I have been facing a really strange problem with
my win32 bacula-fd.
The problem seems to have started when I upgraded my SD + DIR
to 5.0.X (I was still using 2.4.4 until then).
Almost all my win32-fd Full backups now fail (this is only observable
on Full backups of several GB) with the following errors:
- with a 2.4.4 client :
12-May 22:22 msadpau-fd JobId 2538: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
13-May 00:00 msadpau-fd JobId 2538: Fatal error: ../../filed/backup.c:892 Network send error to SD. ERR=Input/output error
13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
[...]
13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "NTDS", State: 0x1 (VSS_WS_STABLE)
13-May 00:02 backuppa-sd JobId 2538: JobId=2538 Job="msad-stockage-pau.2011-05-12_22.00.00_37" marked to be canceled.
13-May 00:02 backuppa-sd JobId 2538: Job write elapsed time = 01:39:44, Transfer rate = 6.296 M Bytes/second
13-May 00:02 backuppa-sd JobId 2538: Error: bsock.c:518 Read error from client:10.1.2.17:36643: ERR=Connection reset by peer
13-May 00:02 backuppa-dir JobId 2538: Error: Bacula backuppa-dir 5.0.2 (28Apr10): 13-May-2011 00:02:15
Build OS: x86_64-pc-linux-gnu debian squeeze/sid
Backup Level: Full
Client: "BLABLABLA" 2.4.4 (28Dec08) Linux,Cross-compile,Win32
- after upgrading the client to 5.0.3 (the error message is more
verbose, but the problem is still there):
16-May 17:22 msadpau-fd JobId 2552: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
16-May 20:14 msadpau-fd JobId 2552: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65562 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
16-May 20:14 msadpau-fd JobId 2552: Fatal error: /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. ERR=Input/output error
16-May 20:15 msadpau-fd JobId 2552: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to Storage daemon:backuppa:9103
16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
[...]
16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
16-May 20:16 backuppa-sd JobId 2552: JobId=2552 Job="msad-stockage-pau.2011-05-16_17.22.21_58" marked to be canceled.
16-May 20:16 backuppa-sd JobId 2552: Job write elapsed time = 02:53:20, Transfer rate = 6.234 M Bytes/second
16-May 20:16 backuppa-sd JobId 2552: Error: bsock.c:518 Read error from client:10.1.2.17:36643: ERR=Connection reset by peer
16-May 20:16 backuppa-dir JobId 2552: Error: Bacula backuppa-dir 5.0.2 (28Apr10): 16-May-2011 20:16:02
Build OS: x86_64-pc-linux-gnu debian squeeze/sid
Backup Level: Full
Client: "BLABLABLA" 5.0.3 (04Aug10) Linux,Cross-compile,Win32
- after upgrading the DIR/SD from 5.0.2 to 5.0.3 (Debian squeeze => wheezy):
19-mai 10:07 msadpau-fd JobId 2565: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="E"
19-mai 10:09 msadpau-fd JobId 2565: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65536 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
19-mai 10:09 msadpau-fd JobId 2565: Fatal error: /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. ERR=Input/output error
19-mai 10:11 msadpau-fd JobId 2565: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to Storage daemon:backuppa:9103
19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE)
[...]
19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE)
19-mai 10:11 backuppa-sd JobId 2565: JobId=2565 Job="msad-stockage-pau.2011-05-19_10.03.34_03" marked to be canceled.
19-mai 10:11 backuppa-sd JobId 2565: Error: bsock.c:537 Read error from client:10.1.2.17:36643: ERR=Connexion ré-initialisée par le correspondant
19-mai 10:11 backuppa-sd JobId 2565: Job write elapsed time = 00:04:12, Transfer rate = 11.61 M Bytes/second
19-mai 10:11 backuppa-dir JobId 2565: Error: Bacula backuppa-dir 5.0.3 (04Aug10): 19-mai-2011 10:11:35
Build OS: x86_64-pc-linux-gnu debian wheezy/sid
Backup Level: Full
Client: "Serveur MSAD Pau" 5.0.3 (04Aug10) Linux,Cross-compile,Win32
After a lot of searching through the list and "googling", I found
various reports of similar problems, but no real solution.
What I tried :
- turn off the anti-virus software : no change
- Heartbeat Interval in Storage and Client : no change
- reducing keepalive on SD : no change
- changing the Network Buffer Size : no change :
19-mai 17:29 msadpau-fd JobId 2577: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 32768 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
19-mai 17:36 msadpau-fd JobId 2579: Error: /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 131072 bytes to Storage daemon:backuppa:9103: ERR=Input/output error
- I also turned on debugging/tracing on the FD and SD sides, which
didn't help much.
Here is the FD side:
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send data to SD len=65536
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=0 stop=0
/// Here, the SD side shows that it is doing a read, and both sides seem to wait for about 2 minutes
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=0 stop=0
/// and then the connection is closed.
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=0 stop=0
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:91-0 Got BNET_SIG 0 from SD
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0 wait_intr=1 stop=1
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:211-0 end blast_data ok=0
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:1660-0 Error in blast_data.
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:282-0 Quit command loop. Canceled=1
msadpau-fd: /home/kern/bacula/k/bacula/src/lib/runscript.c:110-0 runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=f
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:309-0 End FD msg: 2800 End Job TermCode=102 JobFiles=21762 ReadBytes=2921610796 JobBytes=2921545260 Errors=2 VSS=1 Encrypt=0
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:388-0 Calling term_find_files
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:391-0 Done with term_find_files
msadpau-fd: /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:210-0 Enter wchar_win32_path
msadpau-fd: /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:394-0 Leave wchar_win32_path=\
msadpau-fd: /home/kern/bacula/k/bacula/src/lib/jcr.c:181-0 write_last_jobs seek to 192
msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0 garbage collect memory pool
msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:393-0 Done with free_jcr
msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0 garbage collect memory pool
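The counters in the "End Job" message of this trace are easier to compare
between runs once they are pulled out of the text. Below is a minimal Python
sketch of that, assuming the wrapped trace lines have first been joined onto
one line. The helper name parse_end_job is hypothetical and not part of
Bacula; the message text is copied from the trace.

```python
import re

# End-of-job message copied from the FD trace (joined onto one line).
END_JOB_MSG = (
    "End FD msg: 2800 End Job TermCode=102 JobFiles=21762 "
    "ReadBytes=2921610796 JobBytes=2921545260 Errors=2 VSS=1 Encrypt=0"
)


def parse_end_job(msg):
    """Return every KEY=<integer> counter found in an End Job message."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", msg)}


counters = parse_end_job(END_JOB_MSG)
gib = counters["JobBytes"] / 2**30  # bytes -> GiB
print(f"JobBytes = {counters['JobBytes']} ({gib:.2f} GiB), "
      f"Errors = {counters['Errors']}")
```

Running the same helper on the End Job message of each failed run makes it
quick to see whether JobBytes really is identical from one failure to the next.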
But the strangest thing is this: in order to make the job fail faster
during my tests, I turned off compression. The result is that the job
failed "a lot" faster, but not just because compression no longer slowed
it down. With compression, the job sometimes failed at 40GB written,
sometimes at 100GB. Without compression, it failed at 2GB, always at
exactly the same number of bytes written.
So I decided to test in the other direction and raised the compression
level (it was GZIP2 at the start, I set it to GZIP4): the job ran well!
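The SD summary lines quoted earlier in this message give a rough figure for
how much data each job had written when it failed: elapsed time multiplied by
the reported transfer rate. The Python sketch below does that arithmetic. It
assumes that "M Bytes" means 10**6 bytes, which is a guess about the SD's
units, and estimate_gb_written is a hypothetical helper name.

```python
import re

# SD summary lines copied from the three failed jobs quoted earlier.
SD_LINES = [
    "Job write elapsed time = 01:39:44, Transfer rate = 6.296 M Bytes/second",
    "Job write elapsed time = 02:53:20, Transfer rate = 6.234 M Bytes/second",
    "Job write elapsed time = 00:04:12, Transfer rate = 11.61 M Bytes/second",
]

PATTERN = re.compile(
    r"elapsed time = (\d+):(\d+):(\d+), Transfer rate = ([\d.]+) M Bytes/second"
)


def estimate_gb_written(line):
    """Estimate GB written as elapsed seconds times the reported rate.

    Assumption: "M Bytes" is taken to mean 10**6 bytes.
    """
    hours, minutes, seconds, rate = PATTERN.search(line).groups()
    elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return elapsed * float(rate) * 1e6 / 1e9


for line in SD_LINES:
    print(f"{estimate_gb_written(line):6.1f} GB  <- {line}")
```

For the three jobs quoted above this gives roughly 37.7 GB, 64.8 GB and
2.9 GB. These are estimates only, since the rate in the log is rounded.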
So now I am wondering what I can do to find out where exactly the
problem is.
Is it on the FD side? On the SD side? Does it happen when the data is
sent too fast, or when there is too much load on the client side?
Other facts:
- the problem is observed on different kinds of hardware
- the problem is observed with clients on the same network as the
backup server, as well as with clients on another network
- no problem at all on Linux clients with bigger jobs
- there is no heavy load on the DIR/SD/MySQL side
- the servers that have the problem don't show any network
connectivity problem
Any clue? Is there anything I can do to trace the problem?
Regards
--
Yann Cézard - infrastructures - administrateur systèmes serveurs
Centre de ressources informatiques - http://cri.univ-pau.fr
Université de Pau et des pays de l'Adour - http://www.univ-pau.fr