Re: [Bacula-users] query for file sizes in a job

2011-10-10 23:08:05
Subject: Re: [Bacula-users] query for file sizes in a job
From: ganiuszka <ganiuszka AT gmail DOT com>
To: bacula-users AT lists.sourceforge DOT net
Date: Tue, 11 Oct 2011 05:05:21 +0200
2011/10/7 Jeff Shanholtz <jeffsubs AT shanholtz DOT com>:
> Thanks guys. I'm pretty sure I'm using sqlite (having a hard time
> determining that definitively, but I don't think I did anything from an
> installation point of view beyond just installing bacula). I assume this
> script is postgresql specific. Looks like the fastest option for me is going
> to be to simply search the drives of my 3 client systems for large files and
> then check to see if any of those files are being backed up when they don't
> need to be.
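If you are unsure which catalog backend Bacula is using, one quick check is to look at the catalog file itself. Every SQLite 3 database file begins with the 16-byte string "SQLite format 3" followed by a NUL byte. The sketch below reads that header in PHP. The path is a hypothetical example, because the catalog location depends on how Bacula was installed (the script further down assumes /tmp/bacula/var/bacula/working/bacula.db).

```php
<?php
// Sketch: report whether a file looks like an SQLite 3 database.
// Every SQLite 3 database file starts with the 16-byte header
// "SQLite format 3\0". The catalog path below is a hypothetical example.

function looks_like_sqlite_db(string $path): bool
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false; // missing or unreadable file
    }
    $header = fread($fh, 16);
    fclose($fh);
    return $header === "SQLite format 3\0";
}

$candidate = '/var/lib/bacula/bacula.db'; // hypothetical location
echo $candidate, ': ',
    looks_like_sqlite_db($candidate) ? 'SQLite database' : 'not an SQLite database (or not found)',
    "\n";
```

A negative answer only means the file is not SQLite or could not be read; it does not tell you which other backend is in use.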


Hi,

I wrote a PHP script that lists the size and path of every file backed up
by a given job, selected by its JobId. It can also sort the files by size
and show the sizes in a chosen unit (bytes, kilobytes, megabytes, etc.).
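One way to combine unit conversion with sorting is to keep the exact byte count for comparisons and convert only for display, so files that round to the same displayed value still sort correctly. This is a standalone sketch, not the script's own code: the script uses BC Math, while this uses floating-point division with two decimals. The function names are hypothetical, and the sample rows reuse sizes and paths from the listing quoted at the end of this message.

```php
<?php
// Sketch: show sizes in a chosen unit while sorting on the exact byte count.
// The unit table mirrors the script's $units array; the sample rows reuse
// sizes and paths from the listing quoted at the end of this message.

const UNITS = ['b' => 1, 'k' => 1024, 'm' => 1048576, 'g' => 1073741824, 't' => 1099511627776];

// Format a byte count in the given unit with two decimals, so that files
// smaller than one unit do not all collapse to 0.
function to_unit(int $bytes, string $unit): string
{
    return number_format($bytes / UNITS[$unit], 2, '.', '');
}

// Sort rows by their raw 'bytes' value ('asc' or 'desc').
function sort_by_size(array $rows, string $order): array
{
    usort($rows, fn(array $a, array $b): int => $order === 'desc'
        ? $b['bytes'] <=> $a['bytes']
        : $a['bytes'] <=> $b['bytes']);
    return $rows;
}

$rows = [
    ['bytes' => 1365, 'path' => 'c:/Local/bacula/data/pg_global.sql'],
    ['bytes' => 39553788, 'path' => '/var/spool/bacula/bacula.dmp'],
    ['bytes' => 39656, 'path' => '/srv/backup/files-sdb1.txt'],
];

foreach (sort_by_size($rows, 'desc') as $row) {
    echo to_unit($row['bytes'], 'm'), ' | ', $row['path'], "\n";
}
```

With megabytes and descending order this prints `37.72 | /var/spool/bacula/bacula.dmp`, then `0.04 | /srv/backup/files-sdb1.txt`, then `0.00 | c:/Local/bacula/data/pg_global.sql`.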

I think this script may be useful in your case.

I know that SQLite lets you define your own functions, and implementing
the whole algorithm from my script that way might be a good approach.
Nevertheless, I chose to write a PHP script.
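As mentioned above, SQLite can call user-defined functions. From PHP, `PDO::sqliteCreateFunction()` registers a PHP callback as an SQL function, so the LStat decoding can happen inside the query instead of in a PHP loop. This is a minimal sketch, not part of the script below. It assumes the pdo_sqlite driver, PHP 7.4 or newer (for arrow functions), and the same File/Path/Filename tables the script queries. It uses plain PHP integers, which is enough for file sizes on 64-bit PHP. All function names here are hypothetical.

```php
<?php
// Sketch: expose LStat decoding to SQL by registering PHP callbacks as SQLite
// functions through PDO. The names lstat_field, lstat_size, lstat_mtime and
// largest_files are hypothetical, chosen for this example only.

const BACULA_B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode field $index of an LStat string: base 64, most significant digit
// first, optional leading '-'. Missing fields and unknown characters count as 0.
function lstat_field(string $lstat, int $index): int
{
    $fields = explode(' ', trim($lstat));
    $val = $fields[$index] ?? '';
    $neg = $val !== '' && $val[0] === '-';
    $result = 0;
    for ($i = $neg ? 1 : 0, $n = strlen($val); $i < $n; $i++) {
        $result = $result * 64 + (int) strpos(BACULA_B64, $val[$i]);
    }
    return $neg ? -$result : $result;
}

function register_lstat_functions(PDO $db): void
{
    // Same field order as the script's list(): size is index 7, mtime index 11.
    $db->sqliteCreateFunction('lstat_size', fn($lstat) => lstat_field((string) $lstat, 7), 1);
    $db->sqliteCreateFunction('lstat_mtime', fn($lstat) => lstat_field((string) $lstat, 11), 1);
}

// Largest files of one job, assuming the same File/Path/Filename catalog
// tables that the script below queries.
function largest_files(PDO $db, int $jobid, int $limit = 10): array
{
    $stmt = $db->prepare(
        'SELECT lstat_size(File.LStat) AS size, Path.Path || Filename.Name AS path
           FROM File
           JOIN Path ON Path.PathId = File.PathId
           JOIN Filename ON Filename.FilenameId = File.FilenameId
          WHERE File.JobId = :jobid
          ORDER BY size DESC
          LIMIT :limit'
    );
    $stmt->bindValue(':jobid', $jobid, PDO::PARAM_INT);
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$db = new PDO('sqlite::memory:');
register_lstat_functions($db);
echo $db->query("SELECT lstat_size('A A A A A A A BAA A A A A A A A A')")->fetchColumn(), "\n"; // prints 4096
```

The LStat string in the demo is constructed by hand for illustration (16 fields, size field "BAA"). With these functions registered, a listing similar to the PostgreSQL view quoted at the end of this message could also show times, for example with SQLite's `datetime(lstat_mtime(File.LStat), 'unixepoch')`.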

Yes, I remember the thread on this mailing list about decoding the LStat
value. At that time I wrote a PHP implementation of the Bacula LStat
decoder, and I have reused it in the script below to decode LStat values.
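For reference, the decoding loop in `decode_bacula_lstat` below treats each space-separated LStat field as an integer written in base 64, most significant digit first, with an optional leading "-". Each character's digit value $d_i$ is its position in the alphabet A-Z, a-z, 0-9, +, / (so A = 0, a = 26, 0 = 52, / = 63). For a field with $n$ digits after the optional sign:

```latex
v = s \sum_{i=0}^{n-1} d_i \, 64^{\,n-1-i},
\qquad
s = \begin{cases} -1 & \text{if the field starts with ``-''} \\ +1 & \text{otherwise} \end{cases}

\texttt{BAA} = 1 \cdot 64^2 + 0 \cdot 64 + 0 = 4096,
\qquad
\texttt{Po} = 15 \cdot 64 + 40 = 1000
```

The script's loop computes the same sum incrementally, multiplying the running result by 64 before adding each digit.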

I hope it will be useful. Suggestions and bug reports are welcome.

I have not tested this script thoroughly; I did only basic tests.

Before using the script, edit the constant named 'DB_FILE' so that it
contains the location of your SQLite database file.

Regards.
gani

<?php

/**
 * Script for listing the sizes and paths of the files backed up by a given
 * JobId, read from an SQLite Bacula catalog database.
 *
 * @author gani <redakcja AT bacula DOT pl>
 *
 * Requirements
 *     PHP with modules:
 *       - BC Math
 *       - PDO for SQLite
 *
 * Example of use for jobid: 23, sort: asc, unit: megabytes:
 *
 * php bacula-size-and-path-by-jobid.php 23 asc m
 *
 * Example output:
 * 12 | /etc/aaa.conf
 * 14 | /etc/bbb.conf
 * 21 | /etc/ccc.conf
 */

// NOTE! Before running this script, set DB_FILE to the location of your SQLite catalog file.
const DB_FILE = '/tmp/bacula/var/bacula/working/bacula.db';

class BaculaJobDetails {

        private $jobid;
        private $order;
        private $unit;

        private $order_size_values = array('asc', 'desc');

        private $units = array('b' => 1, 'k' => 1024, 'm' => 1048576,
                'g' => 1073741824, 't' => 1099511627776);

        public function __construct($jobid, $order, $unit) {
                $this->jobid = intval($jobid);
                $this->order = $this->validate_order($order);
                $this->unit = $this->validate_unit($unit);
        }

        private function validate_order($order) {
                $order = strtolower($order);
                if(!in_array($order, $this->order_size_values)) {
                        die('You entered an invalid sort order. Available sort orders are: asc or desc');
                }
                return $order;
        }

        private function validate_unit($unit) {
                $unit = strtolower($unit);
                if(!array_key_exists($unit, $this->units)) {
                        die('You entered an invalid unit. Available units are: b, k, m, g, t');
                }
                return $unit;
        }

        private function decode_bacula_lstat($lstat) {
                // Bacula writes each stat field as a base-64 number with this alphabet
                $base64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
                $lstat_fields = explode(' ', trim($lstat));

                if(count($lstat_fields) !== 16) {
                        die('Error! Invalid LStat values count for ' . $lstat);
                }

                $field_names = array('dev', 'inode', 'mode', 'nlink', 'uid', 'gid', 'rdev',
                        'size', 'blocksize', 'blocks', 'atime', 'mtime', 'ctime', 'linkfi',
                        'flags', 'data');
                $encoded_values = array_combine($field_names, $lstat_fields);

                $ret = array();
                foreach($encoded_values as $key => $val) {
                        $is_minus = (substr($val, 0, 1) === '-');
                        $result = '0';
                        // Most significant digit first: result = result * 64 + digit.
                        // Everything stays in BC Math strings so large values remain exact.
                        for($i = $is_minus ? 1 : 0; $i < strlen($val); $i++) {
                                $digit = strpos($base64, $val[$i]);
                                if($digit === false) {
                                        die('Error! Invalid LStat character in ' . $lstat);
                                }
                                $result = bcadd(bcmul($result, '64'), (string) $digit);
                        }
                        $ret[$key] = ($is_minus && $result !== '0') ? '-' . $result : $result;
                }
                return $ret;
        }

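        private function get_db() {
                $dsn = 'sqlite:' . DB_FILE;
                try {
                        $db = new PDO($dsn);
                        // Throw exceptions on SQL errors instead of silently returning false
                        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
                } catch (PDOException $e) {
                        die('Connection to database failed: ' . $e->getMessage());
                }
                return $db;
        }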

        private function get_job_details() {
                $job_details = array();

                // Filter by JobId in the WHERE clause and bind it as a parameter
                $sql = 'SELECT File.LStat AS lstat, Path.Path AS file_path, Filename.Name AS file_name
                        FROM File
                        JOIN Path ON Path.PathId = File.PathId
                        JOIN Filename ON Filename.FilenameId = File.FilenameId
                        WHERE File.JobId = :jobid';
                $query = $this->get_db()->prepare($sql);
                $query->execute(array(':jobid' => $this->jobid));
                $rows = $query->fetchAll(PDO::FETCH_ASSOC);

                if(count($rows) === 0) {
                        die('JobId ' . $this->jobid . ' does not exist or has no files.');
                }

                foreach($rows as $row) {
                        $lstat = $this->decode_bacula_lstat($row['lstat']);
                        $job_details[] = array(
                                'bytes' => $lstat['size'], // exact size, used for sorting
                                'size' => $this->setUnitForSize($lstat['size']),
                                'path' => $row['file_path'] . $row['file_name'],
                        );
                }

                $this->sort_job_details($job_details);
                return $job_details;
        }

        private function setUnitForSize($size) {
                // Integer division: sizes below one unit are shown as 0
                return bcdiv((string) $size, (string) $this->units[$this->unit]);
        }

        private function sort_job_details(&$job_details) {
                // Sort on the exact byte count, so files that show the same
                // rounded size in the chosen unit still come out in order
                $bytes = array_column($job_details, 'bytes');
                $direction = ($this->order === 'desc') ? SORT_DESC : SORT_ASC;
                array_multisort($bytes, $direction, SORT_NUMERIC, $job_details);
        }

        public function print_job_details() {
                $job_details = $this->get_job_details();
                foreach($job_details as $val) {
                        print $val['size'] . ' | ' . $val['path'] . "\n";
                }
        }
}

function usage() {
        echo '
' . basename(__FILE__)  . ' <jobid> <sort> <unit>
jobid - Job identifier
sort  - sort order by filesize (asc or desc)
unit  - display unit (b - bytes, k - kilo, m - mega, g - giga, t - tera)
';
}

if(count($argv) === 4) {
        list($script, $jobid, $order, $unit) = $argv;
        $obj = new BaculaJobDetails($jobid, $order, $unit);
        $obj->print_job_details();
} else {
        usage();
}

?>


>
> -----Original Message-----
> From: Stuart McGraw [mailto:smcg4191 AT frii DOT com]
> Sent: Friday, October 07, 2011 10:30 AM
> To: Bacula-users AT lists.sourceforge DOT net
> Subject: Re: [Bacula-users] query for file sizes in a job
>
> On 10/06/2011 12:36 PM, Jeff Shanholtz wrote:
>> I'm currently tuning my exclude rules and one of the things I want to
>> do is make sure I'm not backing up any massive files that don't need
>> to be backed up. Is there any way to get bacula to list file sizes
>> along with the file names since llist doesn't do this?
>
> The filesize and other file attributes are stored in
> (pseudo?-)base-64 encoded form in the lstat field of the 'file' table of the
> catalog database.
>
> I ran into the same problem and, since I'm using Postgresql for my catalogs,
> wrote a little pg extension function in C that is called with an lstat value
> and the index number of the stat field wanted.  This is used as a base to
> define some one-line convenience functions like lstat_size(text),
> lstat_mtime(text), etc, which then allows one to define views like:
>
>   CREATE VIEW v_files AS (
>        SELECT f.fileid,
>               f.jobid,
>               CASE fileindex WHEN 0 THEN 'X' ELSE ' ' END AS del,
>               lstat_size (lstat) AS size,
>               TIMESTAMP WITH TIME ZONE 'epoch' + lstat_mtime (lstat) * INTERVAL '1 second' AS mtime,
>               p.path||n.name AS filename
>        FROM file f
>        JOIN path p ON p.pathid=f.pathid
>        JOIN filename n ON n.filenameid=f.filenameid);
>
> which generates results like:
>
> SELECT * FROM v_files WHERE ...whatever...;
>
>  fileid  | jobid | del |   size   |         mtime          | filename
> ---------+-------+-----+----------+------------------------+-----------------------------------------------------------------
>  2155605 |  1750 |     |    39656 | 2011-10-06 21:18:17-06 | /srv/backup/files-sdb1.txt
>  2155606 |  1750 |     |     4096 | 2011-10-06 21:18:35-06 | /srv/backup/
>  2155607 |  1750 | X   |        0 | 2011-10-05 19:59:34-06 | /home/stuart/Maildir/new/1317866374.V803I580003M622752.soga.home
>  2155571 |  1749 |     | 39553788 | 2011-10-05 21:24:16-06 | /var/spool/bacula/bacula.dmp
>  2155565 |  1748 |     |    39424 | 2011-10-05 20:24:49-06 | c:/stuart/pmt.xls
>  2155566 |  1748 |     |     1365 | 2011-10-05 21:22:42-06 | c:/Local/bacula/data/pg_global.sql
>  2155567 |  1748 |     | 45197314 | 2011-10-05 21:23:07-06 | c:/Local/bacula/data/pg_jmdict.dmp
>
> I've found it very convenient and will be happy to pass it on to anyone
> interested, but I have to add a disclaimer: this was the first time I've
> used C in 20 years, the first time I ever wrote a PG extension function and
> the first time I ever looked at the Bacula source code, so be warned. :-)



-- 
"No one has greater love than this: when someone lays down his life
for his friends." Jesus Christ

_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users