
Re: [Bacula-users] File jobs purged when they should not

2012-11-22 12:24:41
Subject: Re: [Bacula-users] File jobs purged when they should not
From: Felip Moll <lipixx AT gmail DOT com>
Date: Thu, 22 Nov 2012 18:19:35 +0100
2012/11/20 Dan Langille <dan AT langille DOT org> wrote:
> Please respond to the bottom of the message, or inline.  This makes it easier to follow the discussion.


Well, that is a question of how your mail client displays messages, or of how you're used to reading them. :P
I'll respond inline.

 

Right. I understood these concepts perfectly well before I ran into my problem. Maybe I didn't explain myself well, or used the wrong words. I'll try to do better here.

My question is exactly about the "strength and the weakness of the catalog" you mention.

Simplifying it:

I have a client with several TiB of data, and I want to be able to:

- Restore files from any point in the last 7 days, browsing the list of files and jobs. -> performed every day
- Restore files from any point in the last month, browsing the list of files and jobs. -> performed every Sunday
- Restore files from any point in the last 6 months, browsing the list of jobs. -> performed on the 1st of every month
- Restore files from any point in the last 5 years (legal requirement), browsing the list of jobs. -> performed every 1 January

Currently each of these backups goes to a different pool, each pool has N tapes (volumes), and of course each pool has a different Volume Retention period.

Each backup cycle consists of one Full backup at the start of the period followed by Incremental backups, except for the 5-year backups, which are always Full.

For example, for the 1-month requirement I run backups on Sunday the 4th (Full), the 11th (Inc), the 18th (Inc), the 25th (Inc), then the 2nd (Full), and so on. I set Volume Retention to double this period (2 months), as recommended by the Bacula guide.
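The monthly rotation just described (Full on the first Sunday of the month, Incremental on the remaining Sundays) is a simple date rule. A minimal Python sketch, assuming exactly that rule; it only reproduces the dates in the example and says nothing about how Bacula's own scheduler works. The function name `sunday_levels` is hypothetical.

```python
from datetime import date, timedelta


def sunday_levels(start: date, end: date) -> list[tuple[date, str]]:
    """List every Sunday in [start, end] with the level the monthly
    scheme would use: Full on the first Sunday of a month (day 1-7),
    Incremental on every other Sunday."""
    # Advance to the first Sunday on or after `start` (weekday() == 6 is Sunday).
    day = start + timedelta(days=(6 - start.weekday()) % 7)
    plan = []
    while day <= end:
        level = "Full" if day.day <= 7 else "Incremental"
        plan.append((day, level))
        day += timedelta(days=7)
    return plan


# November 2012, the month of this thread.
for d, level in sunday_levels(date(2012, 11, 1), date(2012, 12, 2)):
    print(d.isoformat(), level)
```

For November 2012 this yields Full on 4 November, Incremental on the 11th, 18th and 25th, and Full again on 2 December, matching the sequence above.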

Let's continue ...
 
> Also note that the lesser of all the retention periods applies when it comes to recycling.
>
> From http://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html#SECTION0022130000000000000000 :
>
> Under 'The Client Resource', File Retention = time-period-specification
>
> "File records may actually be retained for a shorter period than you specify on this directive if you specify either a shorter Job Retention or a shorter Volume Retention period. The shortest retention period of the three takes precedence."
>
> Please read through all that again, and make sure.  I hope I cannot be misunderstood.  :)
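The precedence rule quoted from the manual boils down to taking a minimum. A minimal Python sketch, purely as an illustration of that rule (it is not Bacula code; the function name and the example values are hypothetical):

```python
from datetime import timedelta


def effective_file_retention(file_retention: timedelta,
                             job_retention: timedelta,
                             volume_retention: timedelta) -> timedelta:
    """File records may be kept for less time than File Retention if
    Job Retention or Volume Retention is shorter: the shortest of the
    three periods takes precedence."""
    return min(file_retention, job_retention, volume_retention)


# Hypothetical values: 7-day File Retention, 180-day Job Retention,
# 60-day Volume Retention on the pool the job was written to.
print(effective_file_retention(timedelta(days=7),
                               timedelta(days=180),
                               timedelta(days=60)))  # prints "7 days, 0:00:00"
```

So a long Volume Retention on a pool cannot by itself keep file records alive longer than a short File Retention allows.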


Yes, I understood all of that correctly.


 
>> So, probably, the pool defined with a 7-day retention is pruning the file records of the backups that run in other pools, e.g. the 4-week, 1-month and 5-year pools, "messing up" those jobs.

> Yes.  It sounds like Bacula is doing as instructed.  Or more precisely, as permitted by the specified Retention periods.


OK, here is the problem.

If I follow the scheme I described above, I will not be able to keep the list of files in the catalog for the 1-month jobs: with a single Client, the 7-day File Retention also applies to them.


 
>> I have some clients with TiB of data, and I would like to have a list of files available for restores within a 7-day window, pruning the oldest jobs and files every 7 days. I also want these clients to back up their data to the 1-month and 5-year pools, but without keeping the file records... so, from what you said, I must define multiple clients for every "real client".

> Multiple clients are not required.
>
> I was unable to follow your requirements.  Perhaps you could express them in terms of:
>
> * I want to be able to restore an individual file that was backed up X days ago.
> * I want to keep all full backups for X days.


The only way I can see is to define one Client to perform the 7-day backups and another Client (for the same host) to perform all the others. That way I can keep both the 7-day and the 1-month file lists in the catalog, and restore from either by selecting individual files instead of restoring the whole backup.
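The effect of this two-Client workaround can be modelled with the same "shortest period wins" rule from the manual excerpt. A hedged Python sketch, not Bacula code: the client names (`host-fd`, `host-daily-fd`) and every retention value are hypothetical, chosen only to mirror the 7-day and 1-month tiers described above, and only the three retention periods named in the manual are modelled.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Tier:
    name: str
    client: str                  # Client resource whose jobs feed this tier
    volume_retention: timedelta  # the tier's Pool Volume Retention


def file_list_lifetime(tier: Tier, file_ret: dict, job_ret: dict) -> timedelta:
    """Shortest of the client's File/Job Retention and the pool's
    Volume Retention: how long the tier's file records can survive."""
    return min(file_ret[tier.client], job_ret[tier.client], tier.volume_retention)


DAYS = timedelta(days=1)

# Case 1: one Client resource for the host.
single = [Tier("7-day", "host-fd", 14 * DAYS),
          Tier("1-month", "host-fd", 60 * DAYS)]
file_ret_1 = {"host-fd": 7 * DAYS}
job_ret_1 = {"host-fd": 1825 * DAYS}

# Case 2: a second Client resource for the same host runs the daily jobs.
split = [Tier("7-day", "host-daily-fd", 14 * DAYS),
         Tier("1-month", "host-fd", 60 * DAYS)]
file_ret_2 = {"host-daily-fd": 7 * DAYS, "host-fd": 60 * DAYS}
job_ret_2 = {"host-daily-fd": 1825 * DAYS, "host-fd": 1825 * DAYS}

for label, tiers, fr, jr in [("one client", single, file_ret_1, job_ret_1),
                             ("two clients", split, file_ret_2, job_ret_2)]:
    for t in tiers:
        print(label, t.name, file_list_lifetime(t, fr, jr).days, "days")
```

With a single Client, the 1-month tier's file records last only 7 days; with the split, they last 60 days while the daily tier still keeps 7.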

Why would I want to do this?

On the one hand, because TiBs of data spread over millions of files mean a huge catalog. Every day the backups insert a lot of entries into the catalog, making it bigger and bigger. A huge catalog means slow database queries, and therefore slow tree generation for restores (once MySQL took more than a day... I switched to PostgreSQL).
I've talked about one server, but I have about 30 servers; not all of them have TiBs of data, of course, but each one adds to the overall load.

On the other hand, because I don't want to restore an entire multi-TiB job from only a month ago just to recover a single file.



> Keep in mind: database records are cheap.  A few GB of disk space is a cheap price to pay to be able to restore any file backed up in the past 3 years.
>
>> Is that correct?
>
> Now that we know the problem, I think we can move on to defining your retention periods based on the sample statements I suggested above.
>
> :)



As you can imagine, this data is sensitive, important and valuable to the business, and I've been told to keep it backed up at all these frequencies. So I take your point, and I agree that for this, database records are cheap.

I hope I've explained my requirements better this time.

Thanks a lot for all your help; I'm glad to see efforts like yours (and those of the others who responded) to help the community.


Felip Moll
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users