On 3/20/2011 6:52 PM, Wouter Verhelst wrote:
> Hi,
>
> At a customer, we've been running bacula for quite some time. It now
> runs on a Debian 'lenny' system that originally was an etch installation
> (with 1.38.11) and has since been upgraded. We will probably upgrade
> once more at some point to squeeze (with 5.0.2), but no concrete plans
> exist for this. It runs against a PostgreSQL 8.3 database (also the
> standard version in Debian lenny).
>
> Originally, bacula ran pretty smoothly. But in recent times, mainly due
> to the volumes having gone through the roof, things don't run as
> smoothly anymore.
>
> I understand that 2.4.4 is probably not under development anymore, and
> that it's likely that none of this is going to be fixed for this branch.
> But if these issues have been fixed long ago, I'd appreciate if people
> could tell me, so I know.
>
> With the original installation, the amount of data that was added and
> then removed again on a weekly basis (we have weekly full backups) was
> quite detrimental to postgresql's autovacuum feature, to the point that
> it stopped working entirely. That is, the fraction of the table that had
> been deleted, and hence the amount of disk space to be released, would
> exceed a particular percentage, which triggered a sanity check in the
> autovacuum daemon and caused it to skip the vacuum. As a result, the
> database files ballooned in size, eventually taking up 70G (while a dump
> of the database was just a few hundred megs). I fixed this by adding an
> explicit 'vacuumdb -f bacula' to the 'delete_catalog_backup' script.
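For reference, the change described above amounts to something like the
following sketch. The script path and database name are assumptions from a
default Debian install; check your own setup before copying this:

```shell
#!/bin/sh
# Sketch of /etc/bacula/scripts/delete_catalog_backup (path is an
# assumption; Debian ships this helper with the bacula-director packages).

# Remove the catalog dump written earlier by make_catalog_backup.
rm -f /var/lib/bacula/bacula.sql

# Reclaim the space freed by the weekly pruning with a full vacuum.
# 'vacuumdb -f' (VACUUM FULL) takes exclusive locks, so this must only
# run when no backup jobs are writing to the catalog.
vacuumdb -f bacula
```

Since this runs as the last step of the catalog backup job, it lands in the
quiet window after the week's jobs have finished, which is the only safe
time for a full vacuum on a busy catalog.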
>
> I had however failed to disable autovacuuming, and with the backup
> now requiring 3 LTO3 tapes and over 48 hours, eventually the autovacuum
> daemon started interfering; when it kicks in, it causes a database-level
> lock, which would sometimes cause the backup to fail in the following
> manner:
>
> 06-feb 10:09 belessnas-dir JobId 4241: Fatal error: sql.c:249 sql.c:249 query
> SELECT count(*) from JobMedia WHERE JobId=4241 failed:
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> (with this in the postgres log at around the same time:)
>
> 2011-02-06 10:09:19 CET LOG: autovacuum launcher started
> 2011-02-06 10:09:19 CET LOG: database system is ready to accept connections
>
> I guess what I'm saying with all this is that it might be nice if bacula
> played a bit more nicely with postgresql's vacuuming process, which is
> fairly essential for the database to keep functioning well.
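Until that happens, one workaround for the lock contention described above
is to disable autovacuum for the large catalog tables and rely on the
explicit post-backup vacuum instead. On 8.3 specifically, per-table
settings live in the `pg_autovacuum` system catalog (replaced by table
storage parameters in 8.4). A sketch, to be checked against the 8.3
documentation before use:

```shell
# Sketch: turn autovacuum off for the File table only, on PostgreSQL 8.3.
# A value of -1 in the threshold columns means "use the global default".
psql bacula <<'EOF'
INSERT INTO pg_autovacuum
  (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
   anl_base_thresh, anl_scale_factor, vac_cost_delay,
   vac_cost_limit, freeze_min_age, freeze_max_age)
VALUES ('file'::regclass, false, -1, -1, -1, -1, -1, -1, -1, -1);
EOF
```

The blunter alternative is `autovacuum = off` in postgresql.conf, but that
affects every database in the cluster, so the per-table route seems safer
if the machine hosts anything besides the bacula catalog.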
>
> That was last February; backups have since been running, sometimes
> okayish, sometimes not (there's also the matter of the tape robot
> sometimes having issues, but this is hardly bacula's fault).
>
> Today, then, bacula failed with the following message:
>
> 20-mrt 22:02 belessnas-dir JobId 4365: Fatal error: Can't fill File table
> Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat,
> MD5)SELECT batch.FileIndex, batch.JobId, Path.PathId,
> Filename.FilenameId,batch.LStat, batch.MD5 FROM batch JOIN Path ON
> (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name):
> ERR=ERROR: integer out of range
>
> This was accurate:
>
> bacula=# SELECT last_value from file_fileid_seq;
> last_value
> ------------
> 2147483652
> (1 row)
>
> Yes, we've been running it for several years now, and apparently we've
> written over 2 billion files to tape. I've run an 'ALTER TABLE File ALTER
> fileid TYPE bigint' to change the fileid field into a 64-bit, rather
> than a 32-bit, column, which should fix this for the foreseeable
> future; however, I have a few questions:
> - Is it okay for me to change the data type of the 'fileid' column like
> that? Note that I've also changed it in other tables which have a
> 'fileid' column. If bacula doesn't internally use the fileid number in
> a 32 bit integer, then that shouldn't be a huge problem, but I don't
> know whether it does.
Yes, I think you're fine.
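For anyone else hitting the same 2^31 limit, the change described above
looks roughly like this. BaseFiles is, as far as I can see, the other stock
table with a FileId column, but that's an assumption; check your own schema
(e.g. with \d in psql) before running anything:

```shell
psql bacula <<'EOF'
-- Widen the overflowed column. The sequence (file_fileid_seq) is
-- already 64-bit; only the integer column hit 2^31.
ALTER TABLE File ALTER COLUMN FileId TYPE bigint;
-- Any other table carrying a FileId must be widened to match.
ALTER TABLE BaseFiles ALTER COLUMN FileId TYPE bigint;
EOF
```

Note that ALTER ... TYPE rewrites the whole table and rebuilds its indexes
under an exclusive lock, so on a File table this size it will take a long
time; it has to run outside the backup window.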
> - Since things haven't really been running smoothly here, every time the
> backup fails, the customer gets less happy with bacula. Are there any
> other people here who run bacula to write fairly large volumes of data
> to tape, and can they give me some pointers on things to avoid? That
> way, I could hopefully avoid common pitfalls before I run into them.
> Obviously if there is some documentation on this somewhere that I
> missed, a simple pointer would be nice.
> - Finally, I realize that many of these issues may be fixed in a more
> recent version of bacula, but I have no way to be sure -- this
> particular customer is the only place where I have bacula running with
> such large data volumes, and obviously just upgrading a particularly
> important server without coordination, on only a vague idea that it
> *might* improve things, isn't really an option. However, if someone
> could authoritatively tell me that these issues have been fixed in a
> more recent version, then an upgrade would probably be a very good
> idea...
Others will report in on your other questions.
--
Dan Langille - http://langille.org/
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users