Bacula-users

Re: [Bacula-users] query for file sizes in a job

2011-10-07 14:33:06
Subject: Re: [Bacula-users] query for file sizes in a job
From: Jeff Shanholtz <jeffsubs AT shanholtz DOT com>
To: <Bacula-users AT lists.sourceforge DOT net>
Date: Fri, 7 Oct 2011 11:31:00 -0700
Thanks guys. I'm pretty sure I'm using SQLite (I'm having a hard time
determining that definitively, but I don't think I did anything at
installation beyond just installing Bacula). I assume this script is
PostgreSQL-specific. It looks like the fastest option for me is going to be
to simply search the drives of my 3 client systems for large files and then
check whether any of those files are being backed up when they don't need
to be.

-----Original Message-----
From: Stuart McGraw [mailto:smcg4191 AT frii DOT com] 
Sent: Friday, October 07, 2011 10:30 AM
To: Bacula-users AT lists.sourceforge DOT net
Subject: Re: [Bacula-users] query for file sizes in a job

On 10/06/2011 12:36 PM, Jeff Shanholtz wrote:
> I'm currently tuning my exclude rules and one of the things I want to 
> do is make sure I'm not backing up any massive files that don't need 
> to be backed up. Is there any way to get bacula to list file sizes 
> along with the file names since llist doesn't do this?

The file size and other file attributes are stored in
(pseudo?-)base-64 encoded form in the lstat field of the 'file' table of the
catalog database.
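
For anyone who wants to see what that encoding looks like without touching
the database, here is a minimal Python sketch of a decoder. It rests on
assumptions that are not stated in this thread and should be checked against
the Bacula source: that each space-separated token is an integer written in
base-64 digits (alphabet A-Z, a-z, 0-9, +, /), most significant digit first,
with an optional leading '-', and that the fields follow the order listed in
LSTAT_FIELDS. The function names and the sample string are made up for
illustration.

```python
# Sketch only: the digit alphabet and the field order are assumptions taken
# from memory of Bacula's stat encoding, not from this thread.
B64_DIGITS = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/")
DIGIT_VALUE = {ch: i for i, ch in enumerate(B64_DIGITS)}

# Assumed order of the leading fields in an lstat string.
LSTAT_FIELDS = ("dev", "ino", "mode", "nlink", "uid", "gid", "rdev",
                "size", "blksize", "blocks", "atime", "mtime", "ctime")


def decode_b64_int(token):
    """Decode one token: base-64 digits, most significant digit first."""
    negative = token.startswith("-")
    digits = token[1:] if negative else token
    value = 0
    for ch in digits:
        value = value * 64 + DIGIT_VALUE[ch]
    return -value if negative else value


def decode_lstat(lstat):
    """Return a dict of the named stat fields found in an lstat string."""
    tokens = lstat.split()
    # zip() stops at the shorter sequence, so trailing tokens are ignored.
    return {name: decode_b64_int(tok)
            for name, tok in zip(LSTAT_FIELDS, tokens)}


if __name__ == "__main__":
    # Hand-made sample value, not taken from a real catalog.
    sample = "P0A BAGh IGk B A A A Bm BAA I BMDDfb BMDDfb BMDDfb A A C"
    info = decode_lstat(sample)
    print(info["size"], oct(info["mode"]))
```

With the hand-made sample above, the size token "Bm" decodes to 1*64 + 38 =
102 bytes and the mode token "IGk" to octal 100644.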

I ran into the same problem and, since I'm using PostgreSQL for my catalogs,
wrote a little PG extension function in C that is called with an lstat value
and the index number of the stat field wanted.  This is used as a base to
define some one-line convenience functions like lstat_size(text),
lstat_mtime(text), etc., which then allow one to define views like:

   CREATE VIEW v_files AS (
        SELECT f.fileid,
               f.jobid,
               CASE fileindex WHEN 0 THEN 'X' ELSE ' ' END AS del,
               lstat_size (lstat) AS size,
               TIMESTAMP WITH TIME ZONE 'epoch'
                   + lstat_mtime (lstat) * INTERVAL '1 second' AS mtime,
               p.path||n.name AS filename
        FROM file f
        JOIN path p ON p.pathid=f.pathid
        JOIN filename n ON n.filenameid=f.filenameid);

which generates results like:

SELECT * FROM v_files WHERE ...whatever...;

 fileid  | jobid | del |   size   |         mtime          | filename
---------+-------+-----+----------+------------------------+------------------------------------------------------------------
 2155605 |  1750 |     |    39656 | 2011-10-06 21:18:17-06 | /srv/backup/files-sdb1.txt
 2155606 |  1750 |     |     4096 | 2011-10-06 21:18:35-06 | /srv/backup/
 2155607 |  1750 | X   |        0 | 2011-10-05 19:59:34-06 | /home/stuart/Maildir/new/1317866374.V803I580003M622752.soga.home
 2155571 |  1749 |     | 39553788 | 2011-10-05 21:24:16-06 | /var/spool/bacula/bacula.dmp
 2155565 |  1748 |     |    39424 | 2011-10-05 20:24:49-06 | c:/stuart/pmt.xls
 2155566 |  1748 |     |     1365 | 2011-10-05 21:22:42-06 | c:/Local/bacula/data/pg_global.sql
 2155567 |  1748 |     | 45197314 | 2011-10-05 21:23:07-06 | c:/Local/bacula/data/pg_jmdict.dmp
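
For a SQLite catalog, the same idea might be approximated without a C
extension by registering Python functions on the connection. The sketch
below is untested against a real Bacula catalog and makes several
assumptions: that the SQLite catalog uses the same table and column names as
the view above, that size and mtime are fields 7 and 11 (0-based) of the
lstat string, and that each field is an integer in base-64 digits, most
significant digit first. The helper names connect_catalog and largest_files
are made up for illustration; lstat_size and lstat_mtime only mirror the
names used above.

```python
import sqlite3

# Assumed digit alphabet of the lstat encoding (not confirmed in this thread).
B64_DIGITS = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/")
DIGIT_VALUE = {ch: i for i, ch in enumerate(B64_DIGITS)}

SIZE_INDEX = 7    # assumed position of st_size in the lstat string
MTIME_INDEX = 11  # assumed position of st_mtime in the lstat string


def lstat_field(lstat, index):
    """Decode the index-th (0-based) lstat token, or None if it is absent."""
    if lstat is None:
        return None
    tokens = lstat.split()
    if index >= len(tokens):
        return None
    token = tokens[index]
    negative = token.startswith("-")
    value = 0
    for ch in (token[1:] if negative else token):
        value = value * 64 + DIGIT_VALUE[ch]
    return -value if negative else value


def connect_catalog(db_path):
    """Open a catalog file and register lstat_size() and lstat_mtime()."""
    conn = sqlite3.connect(db_path)
    conn.create_function("lstat_size", 1,
                         lambda s: lstat_field(s, SIZE_INDEX))
    conn.create_function("lstat_mtime", 1,
                         lambda s: lstat_field(s, MTIME_INDEX))
    return conn


LARGEST_FILES_SQL = """
SELECT p.path || n.name AS filename,
       lstat_size(f.lstat) AS size
FROM file f
JOIN path p ON p.pathid = f.pathid
JOIN filename n ON n.filenameid = f.filenameid
WHERE f.jobid = ?
ORDER BY size DESC
LIMIT ?
"""


def largest_files(conn, jobid, limit=20):
    """Return (filename, size) rows for one job, largest files first."""
    return conn.execute(LARGEST_FILES_SQL, (jobid, limit)).fetchall()
```

Usage would be along the lines of
largest_files(connect_catalog(path_to_catalog), 1750), where path_to_catalog
is wherever the SQLite catalog file lives on the Director host. Working on a
copy of the catalog file is probably safer than querying it while jobs are
running.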

I've found it very convenient and will be happy to pass it on to anyone
interested, but I have to add the disclaimer that this was the first time
I've used C in 20 years, the first time I've ever written a PG extension
function, and the first time I've ever looked at the Bacula source code, so
be warned. :-)

------------------------------------------------------------------------------
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2dcopy2
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

