Hi,
I hope you are all well and happily on holiday. I am working today :)
We have a new ESX 5.5.2 host with a VM on it: a backup server running
Bacula, an open source backup solution.
The VM used to run on another ESX system with vCenter and so on, and it did
not have this problem there.
The backup fails for only one particular host (the op5master server). Its
backup file is much bigger than the others, and the scripts need a lot of
time ON the remote host to collect the backup data and prepare a 3.5 GB
file.
12:55 The backup job starts.
12:55 Bacula logs into the machine to be backed up, which starts collecting
and tarring files.
I wait a long time until the tarring is finished. In the web interface I see
that the job is running, and the duration counts up to approx. 31 minutes.
13:26 The Bacula backup job finishes with state successful on the machine to
be backed up. The backup is ready to be fetched by the Bacula server, which
never happens.
13:29 In the web interface I see that the duration of the job is reset to 0
minutes and counts up again to approx. 15 minutes. The log is already written
and states that the backup was successful, which means the data has been
collected on the backed-up server but not yet transferred to the Bacula
server and the storage. That is correct, I checked.
I don't know what happens at this point: I don't see any logs progressing.
Even with Bacula console debug turned on, nothing happens in the log files.
I think something is waiting for a timeout, which seems to be 15 minutes.
13:44 I have the following in my log files:
Baculaserver / messages:
Dec 23 13:44:22 suorva bacula-dir: 23-Dec 13:44 Message delivery ERROR: Mail
program terminated in error.#012CMD=/usr/sbin/bsmtp -h localhost -f "(Bacula)
<root@localhost>" -s "Bacula: Backup Fatal Error of op5master.x.x.x
Incremental" root@localhost#012ERR=Child exited with code 1
Baculaserver / log file:
2014-12-23 13:28:35 op5master.br.arn.se JobId 30253: ClientRunBeforeJob: 2014-12-23 12:28:35 INFO - Backup was successfully created
2014-12-23 13:44:42 op5master.br.arn.se JobId 30253: Fatal error: Bad response from stored to open command
2014-12-23 13:44:22 suorvadirector JobId 30253: Error: Director's comm line to SD dropped.
2014-12-23 13:44:22 suorvadirector JobId 30253: Error: Bacula suorvadirector 5.2.13 (19Jan13):
Build OS: x86_64-redhat-linux-gnu redhat Enterprise release
(.......)
Scheduled time: 23-Dec-2014 12:55:25
Start time: 23-Dec-2014 13:28:14
End time: 23-Dec-2014 13:44:22
Elapsed time: 16 mins 8 secs
Priority: 10
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Accurate: no
Volume name(s):
Volume Session Id: 735
Volume Session Time: 1418474844
Last Volume Bytes: 1 (1 B)
Non-fatal FD errors: 1
SD Errors: 0
FD termination status: Error
SD termination status: Error
Termination: *** Backup Error ***
Baculaserver / bacula-sd.trace:
rsgsuorvasd: dircmd.c:220-0 <dird: cancel
Job=op5master.x.x.x.2014-12-23_12.55.25_05
rsgsuorvasd: dircmd.c:234-0 Do command: cancel
rsgsuorvasd: pythonlib.c:225-0 No startup module.
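
To double-check the timing, here is a minimal Python sketch that computes the gaps between the times in the job report above. It is only a helper for the arithmetic and not part of Bacula; the names `parse` and `describe` are made up for illustration, and it assumes an English locale so that `%b` matches "Dec".

```python
from datetime import datetime

# Date format used in the job report, e.g. "23-Dec-2014 13:28:14".
# Assumption: English month abbreviations (default C locale).
FMT = "%d-%b-%Y %H:%M:%S"


def parse(ts):
    """Parse one timestamp copied from the job report."""
    return datetime.strptime(ts, FMT)


def describe(delta):
    """Format a timedelta like the 'Elapsed time' line of the report."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes} mins {seconds} secs"


# Values copied verbatim from the report.
scheduled = parse("23-Dec-2014 12:55:25")
started = parse("23-Dec-2014 13:28:14")
ended = parse("23-Dec-2014 13:44:22")

wait_before_start = started - scheduled  # time spent before the job start is logged
elapsed = ended - started                # time between logged start and the error

print("scheduled -> start:", describe(wait_before_start))
print("start -> end:      ", describe(elapsed))
```

The second gap comes out as 16 mins 8 secs, the same as the "Elapsed time" line, and the first one as 32 mins 49 secs, roughly the time the tarring takes on the client. The second gap is close to the approx. 15 minutes I see in the web interface, which is why I suspect a timeout.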
+----------------------------------------------------------------------
|This was sent by adam.b.szabo AT ericsson DOT com via Backup Central.
+----------------------------------------------------------------------