Bacula-users

Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon

2011-06-11 05:18:54
Subject: Re: [Bacula-users] Win32 FD / Write error sending N bytes to Storage daemon
From: Bruno Friedmann <bruno AT ioda-net DOT ch>
To: bacula-users AT lists.sourceforge DOT net
Date: Sat, 11 Jun 2011 11:15:30 +0200
On 05/20/2011 10:22 AM, Yann Cézard wrote:
> Hi everyone,
> 
> Since a few weeks, I am facing a really strange problem with
> my win32 bacula-fd.
> 
> It seems that the problem started when I upgraded my SD + DIR
> to the 5.0.X (I was still using the 2.4.4 until that time).
> The problem is that almost (this is only observable on Full backups
> of several GB) all my win32-fd Full backups now fails with the following
> errors :
> - with a 2.4.4 client :
> 
> 12-May 22:22 msadpau-fd JobId 2538: Generate VSS snapshots. Driver="VSS Win 
> 2003", Drive(s)="E"
> 13-May 00:00 msadpau-fd JobId 2538: Fatal error: ../../filed/backup.c:892 
> Network send error to SD. ERR=Input/output error
> 13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 13-May 00:02 msadpau-fd JobId 2538: VSS Writer (BackupComplete): "NTDS", 
> State: 0x1 (VSS_WS_STABLE)
> 13-May 00:02 backuppa-sd JobId 2538: JobId=2538 
> Job="msad-stockage-pau.2011-05-12_22.00.00_37" marked to be canceled.
> 13-May 00:02 backuppa-sd JobId 2538: Job write elapsed time = 01:39:44, 
> Transfer rate = 6.296 M Bytes/second
> 13-May 00:02 backuppa-sd JobId 2538: Error: bsock.c:518 Read error from 
> client:10.1.2.17:36643: ERR=Connection reset by peer
> 13-May 00:02 backuppa-dir JobId 2538: Error: Bacula backuppa-dir 5.0.2 
> (28Apr10): 13-May-2011 00:02:15
>   Build OS:               x86_64-pc-linux-gnu debian squeeze/sid
>   Backup Level:           Full
>   Client:                 "BLABLABLA" 2.4.4 (28Dec08) 
> Linux,Cross-compile,Win32
> 
> - after upgrading the client to 5.0.3 (error message is more verbose, but
> the problem is still there) :
> 
> 16-May 17:22 msadpau-fd JobId 2552: Generate VSS snapshots. Driver="VSS Win 
> 2003", Drive(s)="E"
> 16-May 20:14 msadpau-fd JobId 2552: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65562 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 16-May 20:14 msadpau-fd JobId 2552: Fatal error: 
> /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. 
> ERR=Input/output error
> 16-May 20:15 msadpau-fd JobId 2552: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to 
> Storage daemon:backuppa:9103
> 16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 16-May 20:16 msadpau-fd JobId 2552: VSS Writer (BackupComplete): "WMI 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 16-May 20:16 backuppa-sd JobId 2552: JobId=2552 
> Job="msad-stockage-pau.2011-05-16_17.22.21_58" marked to be canceled.
> 16-May 20:16 backuppa-sd JobId 2552: Job write elapsed time = 02:53:20, 
> Transfer rate = 6.234 M Bytes/second
> 16-May 20:16 backuppa-sd JobId 2552: Error: bsock.c:518 Read error from 
> client:10.1.2.17:36643: ERR=Connection reset by peer
> 16-May 20:16 backuppa-dir JobId 2552: Error: Bacula backuppa-dir 5.0.2 
> (28Apr10): 16-May-2011 20:16:02
>   Build OS:               x86_64-pc-linux-gnu debian squeeze/sid
>   Backup Level:           Full
>   Client:                 "BLABLABLA" 5.0.3 (04Aug10) 
> Linux,Cross-compile,Win32
> 
> - after upgrading the DIR/SD from 5.0.2 to 5.0.3 (Debian squeeze =>
> wheezy) :
> 
> 19-mai 10:07 msadpau-fd JobId 2565: Generate VSS snapshots. Driver="VSS Win 
> 2003", Drive(s)="E"
> 19-mai 10:09 msadpau-fd JobId 2565: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 65536 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 19-mai 10:09 msadpau-fd JobId 2565: Fatal error: 
> /home/kern/bacula/k/bacula/src/filed/backup.c:1024 Network send error to SD. 
> ERR=Input/output error
> 19-mai 10:11 msadpau-fd JobId 2565: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:339 Socket has errors=1 on call to 
> Storage daemon:backuppa:9103
> 19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "System 
> Writer", State: 0x1 (VSS_WS_STABLE)
> [...]
> 19-mai 10:11 msadpau-fd JobId 2565: VSS Writer (BackupComplete): "WMI 
> Writer", State: 0x1 (VSS_WS_STABLE)
> 19-mai 10:11 backuppa-sd JobId 2565: JobId=2565 
> Job="msad-stockage-pau.2011-05-19_10.03.34_03" marked to be canceled.
> 19-mai 10:11 backuppa-sd JobId 2565: Error: bsock.c:537 Read error from 
> client:10.1.2.17:36643: ERR=Connexion ré-initialisée par le correspondant
> 19-mai 10:11 backuppa-sd JobId 2565: Job write elapsed time = 00:04:12, 
> Transfer rate = 11.61 M Bytes/second
> 19-mai 10:11 backuppa-dir JobId 2565: Error: Bacula backuppa-dir 5.0.3 
> (04Aug10): 19-mai-2011 10:11:35
>   Build OS:               x86_64-pc-linux-gnu debian wheezy/sid
>   Backup Level:           Full
>   Client:                 "Serveur MSAD Pau" 5.0.3 (04Aug10) 
> Linux,Cross-compile,Win32
> 
> After doing a lot of search through the list, and "googling", I found
> differents
> testimonies of similar problems, but no real solution.
> 
> What I tried :
> - turn off the anti-virus software : no change
> - Heartbeat Level in Storage and Client : no change
> - reducing keepalive on SD : no change
> - changing the Network Buffer Size : no change :
> 
> 19-mai 17:29 msadpau-fd JobId 2577: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 32768 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 
> 19-mai 17:36 msadpau-fd JobId 2579: Error: 
> /home/kern/bacula/k/bacula/src/lib/bsock.c:393 Write error sending 131072 
> bytes to Storage daemon:backuppa:9103: ERR=Input/output error
> 
> - I also turn on debuging/tracing on FD and SD size, didn't help so much.
>   Here is the FD side :
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:1028-0 Send
> data to SD len=65536
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=0 stop=0
> /// Here, the SD side is showing that it's doing a read and both side
> seems to wait for about 2 minutes
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=0 stop=0
> /// and then the connexion is closed.
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=0 stop=0
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:91-0 Got
> BNET_SIG 0 from SD
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/heartbeat.c:96-0
> wait_intr=1 stop=1
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/backup.c:211-0 end
> blast_data ok=0
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:1660-0 Error in
> blast_data.
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:282-0 Quit
> command loop. Canceled=1
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/runscript.c:110-0
> runscript: running all RUNSCRIPT object (ClientAfterJob) JobStatus=f
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:309-0 End FD msg:
> 2800 End Job TermCode=102 JobFiles=21762 ReadBytes=2921610796
> JobBytes=2921545260 Errors=2 VSS=1 Encrypt=0
> 
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:388-0 Calling
> term_find_files
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:391-0 Done with
> term_find_files
> msadpau-fd: /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:210-0
> Enter wchar_win32_path
> msadpau-fd: /home/kern/bacula/k/bacula/src/win32/compat/compat.cpp:394-0
> Leave wchar_win32_path=\
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/jcr.c:181-0
> write_last_jobs seek to 192
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0 garbage
> collect memory pool
> msadpau-fd: /home/kern/bacula/k/bacula/src/filed/job.c:393-0 Done with
> free_jcr
> msadpau-fd: /home/kern/bacula/k/bacula/src/lib/mem_pool.c:369-0 garbage
> collect memory pool
> 
> 
> But now the strangiest thing is that, in order to have the job fails faster
> during my tests, I turn off the compression feature. And the result is
> that the
> job failed "a lot" faster, but not due to compression. In fact, with
> compression,
> sometime the job failed at 40GB written, sometime at 100GB. Without
> compression,
> it failed at 2GB, always at the exactly same number of bytes written.
> So i decided to make a test in the other way, and set the compression up
> (was GZIP2 at start, set it at GZIP4) : the job ran well !
> 
> So now I am wondering what can I do to found where the problem is exactly ?
> Is it on FD side ? on SD side ? It seems to happen when the data is send
> too fast,
> or too much load on the client side ?
> 
> Other facts :
> - problem is observed on different kind of hardware
> - problem is observed with clients on the same network than the
>   the backup server, as well as with client on another network.
> - no problem at all on linux client with bigger sized jobs.
> - there is no heavy load on the DIR/SD/MySQL side
> - the different servers which have the problem doesn't show
>   any problem of network connectivity.
> 
> Any clue ? anything that I can do to try to trace the problem ?
> 
> Regards
> 
> 

Why not using the 5.0.3 win32 client, which is the recommended way.
Compatibility between dir/sd/fd is not always preserved with mixed version.


-- 

Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch

openSUSE Member & Ambassador
GPG KEY : D5C9B751C4653227
irc: tigerfoot

------------------------------------------------------------------------------
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users