BackupPC-users

Re: [BackupPC-users] BackupPC_zipCreate and charset for encoding file names

2008-10-21 04:43:26
Subject: Re: [BackupPC-users] BackupPC_zipCreate and charset for encoding file names
From: Alexander Moisseev <moiseev AT mezonplus DOT ru>
To: Craig Barratt <cbarratt AT users.sourceforge DOT net>
Date: Tue, 21 Oct 2008 12:37:51 +0400
The only documentation I can find is the Application Note on the .ZIP file
format from PKWARE:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT

I realized that:
1. The ZIP format officially supports only ISO 8859-1 file names and does not
include any information about the encoding at all.
2. In practice, most Windows archivers use the OEM encoding.
3. But some (e.g. IZArc, Info-ZIP, Wiz) store file names without recoding them.
4. I am not sure, but it seems that Unix/Linux archivers use the encoding of the
current locale.
5. UTF-8 file name storage appeared in version 6.3.2 of the APPNOTE, about a
year ago.
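As an illustration of point 5, here is a minimal Python sketch using the standard `zipfile` module (not BackupPC code). In the APPNOTE, a UTF-8 name is signalled by bit 11 (0x0800) of the general purpose bit flag; CPython's `zipfile` sets that bit on its own when a name is not plain ASCII. The helper `utf8_flag_set` is hypothetical, made up for this example.

```python
import io
import zipfile

# Bit 11 of the general purpose bit flag: file name is encoded in UTF-8.
UTF8_FLAG = 0x0800

def utf8_flag_set(name: str) -> bool:
    """Write one entry into an in-memory zip, read it back, and report
    whether the UTF-8 bit is set in its general purpose flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"data")
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        info = zf.infolist()[0]
        return bool(info.flag_bits & UTF8_FLAG)

print(utf8_flag_set("report.txt"))   # False: ASCII name, no flag needed
print(utf8_flag_set("Привет.txt"))   # True: Cyrillic name stored as UTF-8
```

An archiver that predates this part of the APPNOTE simply ignores bit 11 and shows the UTF-8 bytes in whatever code page it assumes.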

I have not yet managed to get UTF-8 archives working on Windows. Does anybody
know which Windows archivers currently support UTF-8?

For archivers of type 2, the encoding must be OEM (e.g. CP866 for Russian).
For archivers of type 3, the encoding must be $Conf{ClientCharset} (e.g. CP1251
for Russian).
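To make the difference between types 2 and 3 concrete, a small Python sketch (using only the standard `cp866` and `cp1251` codecs; the file name is just an example) shows the raw bytes each kind of archiver would store for the same Russian name, and the garbling that results when the wrong code page is assumed:

```python
# The same Russian file name in the two code pages discussed above.
name = "Привет.txt"
oem_bytes = name.encode("cp866")     # OEM encoding, as type 2 archivers expect
ansi_bytes = name.encode("cp1251")   # client charset, as type 3 archivers keep it

print(oem_bytes)    # b'\x8f\xe0\xa8\xa2\xa5\xe2.txt'
print(ansi_bytes)   # b'\xcf\xf0\xe8\xe2\xe5\xf2.txt'

# Decoding with the matching code page restores the name...
print(oem_bytes.decode("cp866") == name)    # True
# ...but reading CP1251 bytes as CP866 produces a garbled name.
print(ansi_bytes.decode("cp866") == name)   # False
```

Since the archive carries no charset information, the reading program has no way to tell which of these two byte strings it is looking at.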

A quote from the APPNOTE, on the "version made by" field:
          The upper byte indicates the compatibility of the file
          attribute information.  If the external file attributes 
          are compatible with MS-DOS and can be read by PKZIP for 
          DOS version 2.04g then this value will be zero.  If these 
          attributes are not compatible, then this value will 
          identify the host system on which the attributes are 
          compatible.  Software can use this information to determine
          the line record format for text files etc.
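The field this quote describes is the 2-byte "version made by" value: the lower byte is the ZIP specification version (e.g. 20 for 2.0) and the upper byte is the host system, where 0 is MS-DOS and 3 is UNIX. A minimal Python sketch of how the two bytes combine (the helper `version_made_by` is hypothetical, and 20 is just an example spec version):

```python
import struct

HOST_MSDOS = 0  # MS-DOS and OS/2 (FAT file systems) in the APPNOTE host table
HOST_UNIX = 3   # UNIX

def version_made_by(host: int, spec_version: int = 20) -> int:
    """Combine host system (upper byte) and spec version (lower byte)."""
    return (host << 8) | spec_version

# ZIP stores multi-byte fields little-endian, so the host byte comes second.
print(struct.pack("<H", version_made_by(HOST_UNIX)))    # b'\x14\x03'
print(struct.pack("<H", version_made_by(HOST_MSDOS)))   # b'\x14\x00'
```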

So, if we encode file names the "Windows" way, we must also mark them as MS-DOS
compatible.

I have done some experimenting.
If the zip archive is created with the OEM encoding (CP866 for Russian) and I
change the upper byte of the "version made by" field in the central directory
(central file header signature 0x02014b50) from 3 (UNIX) to 0 (MS-DOS), then
both type 2 and type 3 archivers display the file names perfectly well.
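The experiment above can be reproduced with a short Python sketch that walks the central directory of an in-memory archive and overwrites only the upper byte of "version made by". It is not BackupPC code; the function name `set_host_system` is hypothetical, and it assumes a plain (non-ZIP64) archive whose end-of-central-directory record is the last such signature in the file. It does not recode the names or adjust the external file attributes.

```python
import io
import struct
import zipfile

CDH_SIG = b"PK\x01\x02"   # central file header signature 0x02014b50
EOCD_SIG = b"PK\x05\x06"  # end of central directory signature 0x06054b50

def set_host_system(zip_bytes: bytes, host: int) -> bytes:
    """Return a copy of zip_bytes with the host byte of 'version made by'
    set to `host` (0 = MS-DOS, 3 = UNIX) in every central directory entry."""
    data = bytearray(zip_bytes)
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # EOCD: total entry count at +10 (2 bytes), directory offset at +16 (4 bytes)
    (count,) = struct.unpack_from("<H", data, eocd + 10)
    (pos,) = struct.unpack_from("<I", data, eocd + 16)
    for _ in range(count):
        if data[pos:pos + 4] != CDH_SIG:
            raise ValueError(f"bad central file header at offset {pos}")
        # 'version made by' occupies bytes +4..+5; the host byte is +5
        data[pos + 5] = host
        # file name, extra field and comment lengths are at +28, +30, +32
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(data)

# Usage: build a small archive in memory, then mark its entries as MS-DOS.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
patched = set_host_system(buf.getvalue(), 0)
with zipfile.ZipFile(io.BytesIO(patched)) as zf:
    print(zf.infolist()[0].create_system)   # 0
```

Only the central directory is touched; local file headers have no "version made by" field, so the file data stays readable.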

Craig, is it possible to set the "version made by" field in BackupPC_zipCreate?

Alexander

Craig Barratt wrote:
> I added the command-line argument for charset but didn't implement
> a CGI setting.  The problem I have is I can't find any documentation
> for zip files and any standards around charset encoding for the
> file names in a zip file.  Is utf8 the correct default, or does
> it depend on which platform is trying to unpack the zip file?
> 
> Craig

_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/