BackupPC-users

Re: [BackupPC-users] Storing SQL backup in revision control

2008-08-27 11:25:05
From: Adam Goryachev <mailinglists AT websitemanagers.com DOT au>
To: BackupPC User List <BackupPC-users AT lists.sourceforge DOT net>
Date: Thu, 28 Aug 2008 01:24:52 +1000
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kenneth Porter wrote:
> Are these text files? I was toying with the idea of keeping MySQL backups
> in a Subversion revision control system on my Linux box, and the same might 
> be possible with SQL Server. This way the revision control system would 
> store only the delta between backups, and I believe it can store the delta 
> compressed.

Yes, they are text files, much like MySQL dump files would be. Of
course, with MySQL, a dump made with all the MySQL-specific flags
enabled can put a lot of characters on each line, so a simple
line-based diff can still amount to a large amount of data. I imagine
some sort of binary diff could be significantly smaller (although this
also depends in part on what sort of activity your DB sees: lots of
updates would do better with a binary diff, while lots of inserts would
be much the same either way).
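
To make the long-line point concrete, here is a small Python sketch. The
table name, the row data and the 1000-rows-per-INSERT layout are made up
for illustration; the only claim is that a line-based diff has to repeat
the whole line when one value in it changes.

```python
import difflib

# Hypothetical dump: one extended INSERT holding 1000 rows on a single line.
rows_old = [f"({i},'name{i}')" for i in range(1000)]
rows_new = list(rows_old)
rows_new[500] = "(500,'renamed')"  # a single-row update between two backups

old_dump = ["INSERT INTO t VALUES " + ",".join(rows_old) + ";\n"]
new_dump = ["INSERT INTO t VALUES " + ",".join(rows_new) + ";\n"]

# A line-based diff emits the full old line and the full new line.
diff = list(difflib.unified_diff(old_dump, new_dump, "old.sql", "new.sql"))
diff_bytes = sum(len(line) for line in diff)
changed_bytes = len(rows_new[500])

print(f"changed data: {changed_bytes} bytes, line diff: {diff_bytes} bytes")
```

The diff is larger than both versions of the line put together, even
though only a handful of bytes changed.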

Of course, back to on-topic stuff:
It would be even better if BackupPC were able to break files into
blocks/sections/etc., which would solve this 'problem' more
generally... but that is likely to be very challenging to get working
in a reliable and efficient manner!

Imagine that your pool currently holds 1TB of data: if we just broke all
files into 4MB portions, your filesystem would hold a *LOT* more
files/links...
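
A quick back-of-the-envelope check of those numbers, as a Python sketch.
It assumes binary units (1TB = 1024^4 bytes, 4MB = 4 * 1024^2 bytes), and
the list of file sizes is invented purely for illustration.

```python
CHUNK_SIZE = 4 * 1024**2   # 4MB chunk, as in the text (binary units assumed)
POOL_SIZE = 1024**4        # 1TB pool, as in the text (binary units assumed)

def chunk_count(size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed for a file of `size` bytes (ceiling division)."""
    return -(-size // chunk_size)

# Chunk count if 1TB of data were cut into full 4MB pieces.
min_pool_chunks = chunk_count(POOL_SIZE)

# Hypothetical file sizes: 1KB, 4MB, 10MB and 700MB.
MB = 1024**2
sizes = [1024, 4 * MB, 10 * MB, 700 * MB]
per_file = [chunk_count(s) for s in sizes]

print(min_pool_chunks)           # chunks for the whole pool
print(per_file, sum(per_file))   # chunks per file, and in total
```

Small files still cost one entry each, so the growth comes entirely from
files larger than the chunk size.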

BTW, just to throw some ideas out there, not that I know anything about
these things, or have any time to implement them, but here goes:
1) A file is a dir of links
When we back up a new file, we break it into "chunks" of 4MB, and each
chunk is inserted into the pool. We then create a dir ffoobar under the
pc/blah/33 directory. Inside this directory we create a series of
hardlinks named 0 .. 43, which link to the blocks of the file in the
pool. All the rest of BackupPC stays much the same as it is; we just end
up with more files and more links than we had before... making a small
BackupPC system "big" and a "big" system MASSIVE...

2) Use a DB
Pretty much as above, but use some sort of database to record which
chunks belong to a specific file. This removes all our problems with
links etc., but adds a database as a prerequisite for your backup
system... This complicates the backup of the backup server in some ways,
but also makes it a lot simpler in other ways. (There are well-defined
methods of replicating a DB to another host, and we no longer have the
problem of massive numbers of hardlinks to replicate: just copy the pool
and the DB.)
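
A rough Python sketch of idea 2, using SQLite only because it ships with
Python. The table layout, the function names and the SHA-1 digests are
assumptions for the example, and a plain dict stands in for the on-disk
pool. The DB holds only the file-to-chunk mapping, not the chunk data.

```python
import hashlib
import sqlite3

def open_catalog():
    """Create an in-memory catalog mapping (host, backup, path) to chunks."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_chunks ("
        " host TEXT, backup INTEGER, path TEXT, seq INTEGER, digest TEXT,"
        " PRIMARY KEY (host, backup, path, seq))"
    )
    return db

def add_file(db, pool, host, backup, path, data, chunk_size):
    """Pool each chunk once and record the ordered chunk list in the DB."""
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()
        pool.setdefault(digest, chunk)
        db.execute("INSERT INTO file_chunks VALUES (?, ?, ?, ?, ?)",
                   (host, backup, path, seq, digest))

def read_file(db, pool, host, backup, path):
    """Reassemble a file from its chunks in sequence order."""
    rows = db.execute(
        "SELECT digest FROM file_chunks"
        " WHERE host = ? AND backup = ? AND path = ? ORDER BY seq",
        (host, backup, path))
    return b"".join(pool[digest] for (digest,) in rows)

def unreferenced(db, pool):
    """Pool chunks that no remaining backup refers to."""
    live = {d for (d,) in db.execute("SELECT DISTINCT digest FROM file_chunks")}
    return set(pool) - live

# Demo: two backups of one file that share their first chunk.
db, pool = open_catalog(), {}
v33 = b"A" * 10 + b"B" * 10
v34 = b"A" * 10 + b"C" * 10
add_file(db, pool, "blah", 33, "/foobar", v33, chunk_size=10)
add_file(db, pool, "blah", 34, "/foobar", v34, chunk_size=10)
restored = read_file(db, pool, "blah", 34, "/foobar")

# Expiring backup 33 is one DELETE; the orphaned chunk falls out of a query.
db.execute("DELETE FROM file_chunks WHERE backup = 33")
orphans = unreferenced(db, pool)
print(len(pool), len(orphans))
```

Finding removable pool entries becomes a single query rather than
anything to do with link counts.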

3) I was going to suggest that we use a file as a file
Basically, under pc/host/33 we store a file containing binary/ASCII
data that describes which pool files are needed and in which order.
This is pretty useless, because it makes it nearly impossible to
determine which pool files can be removed once they are no longer needed.
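
A rough Python sketch of why idea 3 hurts. The manifest format (one pool
file name per line) and the function names are assumptions for the
example. With plain manifest files, the only way to learn whether a pool
file is still needed is to read every manifest of every backup.

```python
import os
import tempfile

def write_manifest(path, digests):
    """Write a hypothetical manifest: one pool file name per line, in order."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        for digest in digests:
            f.write(digest + "\n")

def unreferenced(pool_dir, pc_dir):
    """Scan ALL manifests under pc_dir to find pool files nobody lists."""
    live = set()
    for root, _dirs, files in os.walk(pc_dir):
        for name in files:
            with open(os.path.join(root, name)) as f:
                live.update(line.strip() for line in f if line.strip())
    return set(os.listdir(pool_dir)) - live

# Demo: three pool files, two backups, one pool file no longer listed.
with tempfile.TemporaryDirectory() as top:
    pool_dir = os.path.join(top, "pool")
    pc_dir = os.path.join(top, "pc")
    os.makedirs(pool_dir)
    for name in ("aaa", "bbb", "ccc"):
        with open(os.path.join(pool_dir, name), "wb") as f:
            f.write(name.encode())
    write_manifest(os.path.join(pc_dir, "host", "33", "ffoobar"), ["aaa", "bbb"])
    write_manifest(os.path.join(pc_dir, "host", "34", "ffoobar"), ["aaa"])
    orphans = unreferenced(pool_dir, pc_dir)

print(orphans)
```

The cleanup cost grows with the total number of backed-up files, not
with the number of pool files being checked.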

BTW, is anyone working on block-level data de-duplication for BackupPC?
I remember some discussion some time ago, but don't recall how far it
progressed, or whether it was even on the roadmap for development...

Regards,
Adam
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFItXHEGyoxogrTyiURAipQAKCQGILvLYi49t6S51CuoRhQbQaXfACdHqE+
az7ButW7isQsJgwgkRwk/hE=
=tZzo
-----END PGP SIGNATURE-----

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/