Bacula-users

Re: [Bacula-users] Bacula BLOCKED waiting for mount

2015-08-03 16:37:34
Subject: Re: [Bacula-users] Bacula BLOCKED waiting for mount
From: Michael Schwager <mschwager AT mochotrading DOT com>
To: Dimitri Maziuk <dmaziuk AT bmrb.wisc DOT edu>
Date: Mon, 3 Aug 2015 15:35:39 -0500
Well, to answer my own question, here's the algorithm (use The Source, Luke!):

CLUE1: During the backup, after the previous tape filled, my log file says, "There are no more Jobs associated with Volume "EPW681L3". Marking it purged."

That text comes from ua_purge.c, which runs the following SQL command in the is_volume_purged() function:
SELECT 1 FROM JobMedia WHERE MediaId=%s LIMIT 1

The MediaId comes from an argument to the command. So somehow Bacula is looking over the volumes. How? And somehow it skipped over the volume I thought it would choose, EPW680L3. Why?

According to autoprune.c, the aforementioned message string comes from prune_volumes() in auto_prune.c. But before I dig into that, I need to back up. prune_volumes() was called by find_next_volume_for_append(), which does this:
       *  1. Look for volume with "Append" status.
...The query is:
SELECT MediaId,VolumeName,VolJobs,VolFiles,VolBlocks,"
         "VolBytes,VolMounts,VolErrors,VolWrites,MaxVolBytes,VolCapacityBytes,"
         "MediaType,VolStatus,PoolId,VolRetention,VolUseDuration,MaxVolJobs,"
         "MaxVolFiles,Recycle,Slot,FirstWritten,LastWritten,InChanger,"
         "EndFile,EndBlock,VolParts,LabelType,LabelDate,StorageId,"
         "Enabled,LocationId,RecycleCount,InitialWrite,"
         "ScratchPoolId,RecyclePoolId,VolReadTime,VolWriteTime,ActionOnPurge "
         "FROM Media WHERE PoolId=%s AND MediaType='%s' AND Enabled=1 "
         "AND VolStatus='%s' AND InChanger=1 AND StorageId=%s
with
      if (strcmp(mr->VolStatus, "Recycle") == 0 ||
          strcmp(mr->VolStatus, "Purged") == 0) {
         order = "AND Recycle=1 ORDER BY LastWritten ASC,MediaId";  /* take oldest that can be recycled */
      } else {
         order = sql_media_order_most_recently_written[db_get_type_index(mdb)];    /* take most recently written */
      }

Now here's what happens if it does not find a volume with "Append" status:
      if (!ok) {
       * No volume found, apply algorithm
Here is the argorithm:
       * 2. Try finding a recycled volume
      if (!ok) {
       * 3. Try recycling any purged volume
      if (!ok) {
       * 4. Try pruning Volumes
      if (!ok && create) {
       * 5. Try pulling a volume from the Scratch pool
       * If we are using an Autochanger and have not found
       * a volume, retry looking for any volume.
       if (!ok && InChanger) {
         InChanger = false;
         continue;           /* retry again accepting any volume */
       }

So- there is an algorithm! Quite elaborate, at that.

Now back to prune_volumes(). If it cannot find a volume, it will prune one. prune_volumes() says,
 *   Prune at least one Volume in current Pool. This is called from
 *   catreq.c => next_vol.c when the Storage daemon is asking for another
 *   volume and no appendable volumes are available.
prune_volumes first looks in the Scratch pool somehow. Mine is empty so I'll skip right over that part.

Next, it performs a query on the database:
   /*
    * Get the List of all media ids in the current Pool or whose
    *  RecyclePoolId is the current pool or the scratch pool
    */
   const char *select = "SELECT DISTINCT MediaId,LastWritten FROM Media WHERE "
        "(PoolId=%s OR RecyclePoolId IN (%s)) AND MediaType='%s' %s"
        "ORDER BY LastWritten ASC,MediaId";

the PoolId comes from "mr->PoolId", where mr is of type MEDIA_DBR which is a class that is a reflection of a row in the Media table of the databes.

"ua->jcr" which I presume is a "Job Control Record". The PoolId's can be found in the Pool table. In my case, my Pool name is "Tape" so my PoolId is 5. I don't have a RecyclePoolId but that seems pretty straightforward. In any event, in my case it's probably the same as my Pool id: 5. MediaType comes from mr->MediaType which again comes straight from the bacula-dir.conf file; my MediaType is 'LTO-3'.  The last string (the fourth %s) is I believe either empty or "AND InChanger=1 AND StorageId IN (%s)", where StorageId comes from the Storage table. For me, I have a Storage record that specifies a name of LTO-3 so StorageId found in the database is simply 3.

So my backup is trying to write to a tape in my 'Tape' Pool, on my 'LTO-3' Storage device. Thus the full query should be:

*select = "SELECT DISTINCT MediaId,LastWritten FROM Media WHERE "
        "(PoolId=5 OR RecyclePoolId IN (5)) AND MediaType='LTO-3' "
        "ORDER BY LastWritten ASC,MediaId";


At this point, it goes through each volume and prunes them. When it finds a volume that's then "purged", it will use it. My volume EPW680L3 is not purged, so it doesn't use it.

How is "purged" defined? This is found in is_volume_purged() in ua_purge.c:
  • If FirstWritten or LastWritten are 0, the volume is not purged (I believe "0" is the same as "1970-01-01 00:00:00")
  • If the Volume Status is "Purged" then the volume is purged.
  • If there are no Jobs associated with the MediaId in the JobMedia database table, then the volume is purged.

In my case, there are Jobs associated with the MediaId in the JobMedia database table, but this JobId does not exist in the Job table. Therefore I believe something happened- maybe I purged this job in the past, in my ignorance- and thus the tape cannot be pruned and thus will not be purged, so Bacula won't use it. I need to investigate a little more.



- Mike Schwager
  Linux Network Engineer, Mocho Trading LLC
  312-646-4783 Phone    312-637-0011 Cell    312-957-9804 Fax


On Mon, Aug 3, 2015 at 9:32 AM, Dimitri Maziuk <dmaziuk AT bmrb.wisc DOT edu> wrote:
On 2015-08-03 08:45, Michael Schwager wrote:

> Under v. 5.x, I discovered that Bacula's tape usage algorithm went
> something like this: "Use an eligible tape from the library." I
> discovered (I think) that I could use any tape that met Bacula's
> rewritable criteria. Now, I don't know what the algorithm is.

I had 5.x go from 001_010 to 002_010, then 003_010, etc. even though
there were 5 more blank tapes in magazine 001 and 15 blank tapes in each
002, 003, etc. So I'm pretty sure Bacula's tape selection algorithm is
unpredictable -- at least to an end-user who isn't Kern. You just don't
get hit by it very often.

Dima



------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users


This message is for the named person(s) use only. It may contain confidential proprietary or legally privileged information. No confidentiality or privilege is waived or lost by any mistransmission. If you receive this message in error, please immediately delete it and all copies of it from your system, destroy any hard copies of it and notify the sender. You must not, directly or indirectly use, disclose, distribute, print, or copy any part of this message if you are not the intended recipient. Mocho Trading LLC reserves the right to monitor all e-mail communications through its networks. Any views expressed in this message are those of the individual sender, except where the message states otherwise and the sender is authorized to state them to be the views of any such entity.
------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users