Yes, the database is large. The File table has 370+ million records and Bacula is pruning. I'll see what we can do with mysqltuner. I have a feeling I'm occasionally getting the timeout error because there are so many records. If I can't find a cure for the situation, I'm thinking of splitting up our backups using two catalog servers -- one for production and one for test/dev. Do you think this is wise? Currently, our DBAs don't support Postgres so it may not be an option. Would Oracle be an option instead?
No. Even the Enterprise version only has support for the Catalog in MySQL, PostgreSQL and SQLite databases. If you do not have success with tuning, or a migration to PostgreSQL is not feasible, then yes, maybe splitting the catalog is a good idea. Perhaps you can also prune more Files (keeping Jobs and Volumes, but not Files). The File table grows rapidly with a large number of files if they are not pruned properly.
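If pruning Files more aggressively is an option, it can be driven by the Client resource's retention settings, or done by hand from bconsole. A minimal sketch with hypothetical names and illustrative (not recommended) retention periods:

```
# bacula-dir.conf: keep Job/Volume history, but expire File records sooner
Client {
  Name = backup-client-fd      # hypothetical client name
  File Retention = 30 days     # File table rows eligible for pruning after 30 days
  Job Retention = 6 months     # Job records kept much longer
  AutoPrune = yes              # prune automatically at end of job
}
```

The same pruning can be triggered manually in bconsole with "prune files client=backup-client-fd" once the File Retention period has passed.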
I'll have to check on the "--enable-batch-insert" option. Is there a Bacula command to see what options Bacula was built with? I did not set up our installation and am new to Bacula.
If you still have the source installation directory, you will find this information in the bacula-7.0.5/config.out file, on the "Batch insert enabled" line. I'm quite sure this defaults to yes, so I think your Bacula was built with batch insert enabled.
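A quick way to check that line, assuming the source tree is still at bacula-7.0.5/ (the path above; adjust if yours differs):

```
# Show the batch-insert line from the saved configure summary
grep -i "batch insert" bacula-7.0.5/config.out
```

If nothing is printed, the summary may word the line slightly differently in your version; searching for just "batch" should still find it.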
Would you have any idea why a new duplicate job is starting when one was rescheduled? I looked over the configs I know about, but could not find an option that starts a new job under any circumstances. At least the job will have two more chances to run if I can prevent the new duplicate job from starting when the lock is detected. I know that won't cure the database problem, but I have a feeling the rescheduled tries might complete successfully (thinking the lock will be gone by then) until the DB can be tuned.
It seems that JobId 140080 was scheduled to start, but this was not possible because JobId 139901 had not finished (with or without error) due to the MySQL error. So I suppose (I don't know what your schedule is) that the second job was started not because of the reschedule, but because it was in your schedule. Also, a job will only be rescheduled if it fails, and that does not seem to be the case for JobId 139901: it was still running with that MySQL error. Could you confirm this, please, along with your schedule definitions?
You're welcome! I hope I can give you some tips here. It seems you have a database tuning issue, because of "Lock wait timeout exceeded; try restarting transaction". I'm not sure about your database size, but based on your JobIds, it seems large. You can try http://mysqltuner.com/ if you're using MySQL. If you have a really large database, you can consider migrating to PostgreSQL if tuning MySQL does not solve this problem.
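Since the error in question is "Lock wait timeout exceeded", two InnoDB settings are worth a look alongside mysqltuner's suggestions. A my.cnf sketch; the values here are illustrative, not tuned recommendations for your workload:

```
[mysqld]
# Hold more of the File table's data and indexes in memory
# (commonly sized to a large fraction of RAM on a dedicated DB host)
innodb_buffer_pool_size = 4G

# Seconds a transaction waits for a row lock before raising
# "Lock wait timeout exceeded; try restarting transaction" (default 50)
innodb_lock_wait_timeout = 120
```

Raising the timeout only buys the batch insert more time to wait out the lock; the buffer pool size is usually the bigger lever on a 370-million-row File table.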
Have you built your Bacula with the "--enable-batch-insert" option? This is a good idea when dealing with a large number of files.
2015-09-10 23:47:22 bacula-dir JobId 139901: Fatal error: sql_create.c:870 Fill File table Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId, batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction
This happens when a database lock times out during a backup and the job is rescheduled. For some reason, a new job seems to start up as soon as the error is detected. I posted about this issue earlier and someone mentioned it is happening because I configured Bacula to do that (or at least that's the impression I got from the post). Would anyone know which config setting would start up a new job for the client backup when an error like a lock timeout is detected? So far, I've only found settings for rescheduling, not restarting, such as the ones below:
Reschedule Interval = 1 hour
Reschedule Times = 3
Cancel Lower Level Duplicates = yes
Allow Duplicate Jobs = no
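For reference, those directives live in the Job resource, and the two reschedule lines only take effect when "Reschedule On Error" is set. A sketch with hypothetical resource names:

```
Job {
  Name = "backup-client-job"           # hypothetical job name
  Client = backup-client-fd            # hypothetical client
  Reschedule On Error = yes            # required for the two lines below to apply
  Reschedule Interval = 1 hour
  Reschedule Times = 3
  Allow Duplicate Jobs = no
  Cancel Lower Level Duplicates = yes
}
```

None of these directives start a new job on their own; they only control retries of, and duplicate handling for, jobs that the Director already queued.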
Obviously, the backup is getting canceled because of the last two settings above. But what setting is causing a new job to be created when a lock timeout error is detected that says it has rescheduled the job for 3600 seconds later?
I realize it appears I may need to do some database fixing/tuning. But my immediate question is why a new job is being created when one has already been rescheduled.