[Bacula-users] Copy jobs between two different SDs uses wrong source SD

2009-06-16 17:41:05
From: Phil Stracchino <alaric AT metrocast DOT net>
To: bacula-users <bacula-users AT lists.sourceforge DOT net>
Date: Tue, 16 Jun 2009 17:37:24 -0400
I'm trying to set up my first Copy job, and running into a problem.  I
don't know whether it's a configuration issue, a documentation
shortfall, a Bacula limitation, or a combination of the three.


I have two SDs on two different machines.  Altogether, four pools exist:
three disk pools on one machine and one tape pool on the other.  The disk
SD is on a NAS box with a multi-terabyte SAS/SATA array, and "owns"
three disk pools on the array.  The tape SD is on a separate machine,
and "owns" a single pool and an LTO1 drive.  The disk array cannot be
connected to the tape SD because the tape SD's machine has no SAS
controllers.  The tape drive cannot be connected to the disk SD's
machine because the disk machine has no SCSI controllers and is in an
insufficiently controlled environment for the tape drive.

Backups have been running to the disk pools without incident for about
two months, and I've verified that I can run backup jobs directly to the
tape drive.

The relevant config sections are as follows:

Storage {
  Name = babylon4-sd
  Address = babylon4.babcom.com
  Maximum Concurrent Jobs = 20
  SDPort = 9103
  Password = "XXXXXXXXXXX"
  Device = FileStorage
  Media Type = File
}

Storage {
  Name = babylon5-sd
  Address = babylon5.babcom.com
  SDPort = 9103
  Password = "XXXXXXXXXXX"
  Device = Ultrium-LTO1
  Media Type = LTO1
  Maximum Concurrent Jobs = 10
}

Pool {
  Name = Full-Disk
  Storage = babylon4-sd
  Pool Type = Backup
  Next Pool = Full-Tape
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 6 months
  Maximum Volume Jobs = 0
  Volume Use Duration = 23h
  Label Format = "FULL-$Year${Month:p/2/0/r}${Day:p/2/0/r}-${Hour:p/2/0/r}:${Minute:p/2/0/r}"
  RecyclePool = Scratch
}

Pool {
  Name = Full-Tape
  Storage = babylon5-sd
  Pool Type = Backup
  Recycle = yes
  Autoprune = yes
  Volume Retention = 365d
  Recycle Oldest Volume = yes
  Recycle Current Volume = yes
  Label Format = "ARCH-"
  Maximum Volumes = 9
}


# Dummy client and fileset for the copy job

Client {
  Name = ALL
  Address = localhost
  Password = NONE
  Catalog = Catalog
}

Fileset {
  Name = DUMMY
  Include {
    Options {
      signature = MD5
    }
  }
}

JobDefs {
  Name = TapeArchive
  Type = Copy
  Pool = Full-Tape
  Level = Full
  Client = ALL
  Fileset = DUMMY
  Selection Type = PoolUncopiedJobs
  Selection Pattern = "Babylon5.*"   # this appears to be ignored
  SpoolData = no
  Allow Duplicate Jobs = no
  Schedule = "MonthlyCopy"
  Messages = Daemon
  Priority = 20
}


Job {
  Name = "CopyToTape"
  Enabled = Yes
  Pool = Full-Disk
  JobDefs = TapeArchive
  Storage = babylon4-sd
}



When I go to run the Copy master job, I get this output:

Select Job resource (1-9): 1
Run Copy job
JobName:       CopyToTape
Bootstrap:     *None*
Client:        ALL
FileSet:       DUMMY
Pool:          Full-Disk (From Job resource)
Read Storage:  babylon4-sd (From Pool resource)
Write Storage: babylon5-sd (From Storage from Pool's NextPool resource)
JobId:         *None*
When:          2009-06-16 16:34:24
Catalog:       Catalog
Priority:      20
OK to run? (yes/mod/no):


The read and write storage appear to be correct here.

More or less the correct jobIDs get queued, except that the selection
pattern is being ignored:

Job queued. JobId=134
16-Jun 17:00 babylon4-dir JobId 134: The following 9 JobIds were chosen
to be copied: 1,4,3,2,5,6,92,93,94

The selection pattern above should theoretically have matched only jobs
92, 93 and 94, which are small test jobs.
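
To make that expectation concrete, here is a minimal Python sketch of the
filtering the pattern was meant to perform.  It is not Bacula code.  It
assumes the pattern is applied as an unanchored regular-expression search
against job names.  Only the name of job 94 (Babylon5_Backup) appears in
the output further down; the names given to jobs 92 and 93 are assumed to
be the same, and the names for jobs 1 and 4 are purely hypothetical.

```python
import re

# Pattern copied from the Selection Pattern directive in the JobDefs above.
SELECTION_PATTERN = r"Babylon5.*"


def select_jobs(jobs, pattern=SELECTION_PATTERN):
    """Return the JobIds whose job name matches the pattern (unanchored search)."""
    regex = re.compile(pattern)
    return [jobid for jobid, name in jobs if regex.search(name)]


# Illustrative catalog contents: (JobId, job name).
jobs = [
    (1, "Minbar_Backup"),     # hypothetical name
    (4, "Centauri_Backup"),   # hypothetical name
    (92, "Babylon5_Backup"),  # assumed to share the name of job 94
    (93, "Babylon5_Backup"),  # assumed to share the name of job 94
    (94, "Babylon5_Backup"),  # name taken from the job report
]

print(select_jobs(jobs))
```

Under those assumptions only 92, 93 and 94 would be selected, rather than
all nine jobs in the pool.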


Here's what happens when one of those queued jobs actually tries to
execute, though:


Copying JobId 134, Job=CopyToTape.2009-06-16_17.00.57_41
16-Jun 17:01 babylon5-sd JobId 134: Failed command:
16-Jun 17:01 babylon5-sd JobId 134: Fatal error:
     Device "FileStorage" with MediaType "File" requested by DIR not
found in SD Device resources.
16-Jun 17:01 babylon4-dir JobId 134: Fatal error:
     Storage daemon didn't accept Device "FileStorage" because:
     3924 Device "FileStorage" not in SD Device resources.
16-Jun 17:01 babylon4-dir JobId 134: Error: Bacula babylon4-dir 3.0.1
(30Apr09): 16-Jun-2009 17:01:04
  Build OS:               i386-pc-solaris2.10 solaris 5.10
  Prev Backup JobId:      94
  Prev Backup Job:        Babylon5_Backup.2009-06-16_14.57.25_03
  New Backup JobId:       151
  Current JobId:          134
  Current Job:            CopyToTape.2009-06-16_17.00.57_41
  Backup Level:           Full
  Client:                 ALL
  FileSet:                "DUMMY" 2009-06-16 16:16:31
  Read Pool:              "Full-Disk" (From Job resource)
  Read Storage:           "babylon4-sd" (From Pool resource)
  Write Pool:             "Full-Tape" (From Job Pool's NextPool resource)
  Write Storage:          "babylon5-sd" (From Storage from Pool's
NextPool resource)
  Catalog:                "Catalog" (From Client resource)
  Start time:             16-Jun-2009 17:01:04
  End time:               16-Jun-2009 17:01:04
  Elapsed time:           0 secs
  Priority:               20
  SD Files Written:       0
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Volume name(s):
  Volume Session Id:      29
  Volume Session Time:    1245177530
  Last Volume Bytes:      0 (0 B)
  SD Errors:              0
  SD termination status:
  Termination:            *** Copying Error ***


So:
Job preparation seems to say the source and destination devices and
pools are correct.
The post-job output also seems to say Bacula thinks the source and
destination devices and pools are correct.

But ... the *MESSAGES* appear to say that despite Bacula being
apparently quite clear that the copy source is the Full-Disk pool on the
FileStorage device on babylon4-sd, when it tries to actually run the
job, it is trying to access the FileStorage device (which is on
babylon4) via babylon5-sd.
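
The daemon that emitted each message can be read off the prefix of the
message line.  As a rough sketch, and assuming the line layout seen in the
excerpt above ("<date> <time> <daemon> JobId <n>: <text>"), a few lines of
Python can pull out which daemons reported a fatal error.  The regular
expression and the function name are illustrative only, not part of
Bacula.

```python
import re

# Layout assumed from the excerpt above:
#   "<date> <time> <daemon> JobId <n>: <text>"
LINE_RE = re.compile(r"^(\d{2}-\w{3}) (\d{2}:\d{2}) (\S+) JobId (\d+): (.*)$")


def fatal_error_sources(lines):
    """Return (daemon, jobid) pairs for every line that reports a fatal error."""
    sources = []
    for line in lines:
        m = LINE_RE.match(line)
        # Indented continuation lines do not match the prefix and are skipped.
        if m and m.group(5).startswith("Fatal error"):
            sources.append((m.group(3), int(m.group(4))))
    return sources


log = [
    "16-Jun 17:01 babylon5-sd JobId 134: Failed command:",
    "16-Jun 17:01 babylon5-sd JobId 134: Fatal error:",
    '     Device "FileStorage" with MediaType "File" requested by DIR not',
    "16-Jun 17:01 babylon4-dir JobId 134: Fatal error:",
]

print(fatal_error_sources(log))
```

Run against the lines above, this lists babylon5-sd first: the request
for the FileStorage device was rejected by the tape SD, not by
babylon4-sd.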


Can anyone shed light on this?  It looks to me as though Bacula is
basically setting everything up correctly, then contacting the wrong SD
for the source pool.


Additionally, it appears the selection pattern is being completely
ignored.  If this directive is not actually valid for a copy job, that
fact is not documented anywhere I could readily find.



-- 
  Phil Stracchino, CDK#2     DoD#299792458     ICBM: 43.5607, -71.355
  alaric AT caerllewys DOT net   alaric AT metrocast DOT net   phil AT co.ordinate DOT org
         Renaissance Man, Unix ronin, Perl hacker, Free Stater
                 It's not the years, it's the mileage.
