Re: [BackupPC-users] Recompressing individual files in pool

2011-07-03 20:31:31
Subject: Re: [BackupPC-users] Recompressing individual files in pool
From: "Jeffrey J. Kosowsky" <backuppc AT kosowsky DOT org>
To: "General list for user discussion, questions and support" <backuppc-users AT lists.sourceforge DOT net>
Date: Sun, 03 Jul 2011 20:29:48 -0400
Holger Parplies wrote at about 20:10:20 +0200 on Sunday, July 3, 2011:
 > Hi,
 > 
 > Kelly Sauke wrote on 2011-07-01 09:21:28 -0500 [[BackupPC-users] Recompressing individual files in pool]:
 > > I have a need to modify certain files from backups that I have in 
 > > BackupPC.  My pool is compressed and I've found I can decompress single 
 > > files using BackupPC_zcat.  I can then modify those files as needed, 
 > > however I cannot figure out how to re-compress those modified files to 
 > > be put back into the pool.  Is there a tool available that can do that?  
 > 
 > no. It's not a common requirement to be able to modify files in backups.
 > Normally, a backup is intended to reflect the state a file system was in at
 > the time the backup was taken, not the state the file system *should have*
 > been in or the state *I'd like it* to have been in. I sure hope you have
 > legitimate reasons for doing this.
 > 
 > If you are modifying files, you'll need to think about several things.
 > 
 > * Do you want to modify every occurrence of a specific content (i.e. all
 >   files in all backups linked to one pool file) or only specific files,
 >   while other files continue to contain the unmodified content?

And this may be subtle. You may have other occurrences that you forgot
about or are not aware of (say another machine with the same file or
an earlier backup you had saved). Destructively editing the pool is
not something to do without thinking...

 > * If you are modifying every occurrence of a specific content, you'll either
 >   have to find out which files link to the pool file (hard, with a reasonably
 >   sized pool) or ensure you're updating the content without changing the inode
 >   (i.e. open the file for write, not delete and re-create it). If you do that,
 >   there is not much you can do for failure recovery. Your update had better
 >   succeed.
 > 
 > * Does your update change the partial file md5sum? If so, you'll need to move
 >   the pool file to its new name and location. Presuming the new content
 >   already exists, you should probably create a hash collision. That may be
 >   less efficient than linking to the target pool file, but it should be legal
 >   (when the maximum link count is exceeded, a second pool file with identical
 >   content is created; later on the link count on the first file may drop due
 >   to expiring backups), and it's certainly simpler than finding all the files
 >   linked to your modified pool file and re-linking them to the pre-existing
 >   pool file.
 > 

Yes - unless you are just changing content between the first and last
chunks (keeping the file size the same), the partial file md5sum will
change.
That being said, while it is technically correct and advisable to
rename the file to match its new partial file md5sum (including
adjusting the suffix for potential collisions), it is not strictly
necessary. Indeed, I have had the pleasure of finding several bugs
within BackupPC or its libraries that result in wrong md5sum names
even under normal conditions. The only real downside of not changing
the name is that future backups of files with the modified content
will not be pooled with your edited file; they will be stored
separately under the correct md5sum name. (Note: I am not advising
against renaming, just saying it is not strictly necessary.)
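
To make the point about the first and last chunks concrete, here is a
minimal Python sketch. It is NOT BackupPC's actual digest routine: the
function partial_digest and the 128 KiB chunk size are assumptions made
up for illustration. It only mimics the general idea of hashing the file
length plus the beginning and end of the content.

```python
import hashlib

CHUNK = 128 * 1024  # assumed chunk size, for illustration only


def partial_digest(data: bytes, chunk: int = CHUNK) -> str:
    """Hypothetical partial digest: length plus first and last chunk.

    This is a stand-in, not BackupPC's real algorithm.
    """
    h = hashlib.md5()
    h.update(str(len(data)).encode())
    if len(data) <= 2 * chunk:
        h.update(data)            # small file: hash everything
    else:
        h.update(data[:chunk])    # first chunk
        h.update(data[-chunk:])   # last chunk
    return h.hexdigest()


original = bytes(1024 * 1024)             # 1 MiB of zero bytes

middle_edit = bytearray(original)
middle_edit[500_000:500_004] = b"EDIT"    # same size, change in the middle

head_edit = bytearray(original)
head_edit[0:4] = b"EDIT"                  # same size, change in first chunk

grown = original + b"x"                   # size changes

same_after_middle = partial_digest(original) == partial_digest(bytes(middle_edit))
same_after_head = partial_digest(original) == partial_digest(bytes(head_edit))
same_after_grow = partial_digest(original) == partial_digest(grown)
```

Under these assumptions only the middle edit leaves the digest (and so
the pool file name) unchanged; touching the first or last chunk, or
changing the size, gives a different name.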

Another perhaps more important issue is that you really need to change
the attrib file. While changing the access/mod times may not matter,
adjusting the uncompressed filesize (if it changes) is important since
some routines may/do use that file size, rather than decompressing the
entire file to calculate its size. In any case, even if not critical,
having an inconsistency between the actual file size and the size
noted in the attrib file is not a good idea and might suggest to
anybody or any routine not aware of your monkeying with the file that
there has been some serious data corruption.
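
A minimal Python sketch of the kind of consistency check this implies.
Everything here is a stand-in: plain zlib is used in place of BackupPC's
own compressed file format (which may differ), and recorded_size is a
hypothetical variable representing the size stored in the attrib file,
which this sketch does not parse.

```python
import zlib


def uncompressed_size(compressed: bytes) -> int:
    """Decompress in 64 KiB input pieces and count the output bytes."""
    d = zlib.decompressobj()
    total = 0
    for i in range(0, len(compressed), 65536):
        total += len(d.decompress(compressed[i:i + 65536]))
    total += len(d.flush())
    return total


def size_matches(compressed: bytes, recorded_size: int) -> bool:
    """True if the recorded size agrees with the real decompressed size."""
    return uncompressed_size(compressed) == recorded_size


old = b"password=secret\n" * 100
new = b"password=\n" * 100          # edited content is shorter

recorded_size = len(old)            # what the metadata said before the edit
blob = zlib.compress(new)           # edited content, recompressed

stale_ok = size_matches(blob, recorded_size)   # metadata not updated
fresh_ok = size_matches(blob, len(new))        # metadata updated
```

The stale record no longer matches after the edit, which is exactly the
inconsistency that could look like corruption to a later reader.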

The bottom line is that editing existing files is possible (and indeed
I do a lot more 'messy' things in my BackupPC_deleteFile routine)
*but* you need to think of all the side-effects and end cases to make
sure you won't be messing anything else up.
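
As one example of such a side-effect, Holger's point about updating
content without changing the inode can be sketched in Python. The file
names below are hypothetical and live in a temporary directory; nothing
here touches a real pool. It shows that an in-place write is seen
through every hard link, while delete-and-re-create silently detaches
the pool name from the backup's copy.

```python
import os
import tempfile


def demo(workdir: str) -> dict:
    pool = os.path.join(workdir, "poolfile")      # hypothetical pool entry
    backup = os.path.join(workdir, "backupfile")  # hypothetical pc/ entry
    with open(pool, "wb") as f:
        f.write(b"old content")
    os.link(pool, backup)                  # second name for the same inode
    links_before = os.stat(pool).st_nlink  # how many names share the inode

    # In-place update: open the existing file, overwrite, then truncate.
    with open(pool, "r+b") as f:
        f.write(b"new")
        f.truncate()
    with open(backup, "rb") as f:
        in_place_seen = f.read()
    same_inode = os.stat(pool).st_ino == os.stat(backup).st_ino

    # Delete and re-create: the pool name now points at a new inode.
    os.remove(pool)
    with open(pool, "wb") as f:
        f.write(b"newer")
    with open(backup, "rb") as f:
        recreate_seen = f.read()
    split = os.stat(pool).st_ino != os.stat(backup).st_ino

    return {
        "links_before": links_before,
        "in_place_seen": in_place_seen,
        "same_inode_after_in_place": same_inode,
        "recreate_seen": recreate_seen,
        "different_inode_after_recreate": split,
    }


with tempfile.TemporaryDirectory() as d:
    result = demo(d)
```

Checking st_nlink before editing is also a cheap way to see how many
other names share the pool file's content, though it does not tell you
where they are.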



 > * If you're only changing individual files in a pc/ directory, the matter is
 >   far more simple. You'll need to take some code from the BackupPC sources
 >   for compressing anyway, so you might as well take the part that handles
 >   pooling as well (see BackupPC::PoolWrite and note that you'll be coding in
 >   Perl ;-).
 > 
