Posted to dev@commons.apache.org by Wolfgang Glas <wo...@ev-i.at> on 2009/03/01 22:53:48 UTC

Re: [compress] State of encoding support in ZIP package

Stefan Bodewig wrote:
> On 2009-02-27, Wolfgang Glas <wo...@ev-i.at> wrote:
> 
>> Additionally, my experience with WinZip shows that WinZip writes weird
>> filenames to the single-byte version of the filename when a Unicode field is
>> present.
> 
> Hmm, native encoding I'd guess.

Something like this; it looks like they are writing the LSB of a 2-byte value...
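The guess about the LSB can be made concrete with a small Java sketch. It assumes "the LSB of a 2-byte value" means the low byte of each UTF-16 code unit of the name; this illustrates the hypothesis only, it is not confirmed WinZip behaviour, and the class name is hypothetical.

```java
public class LowByteName {
    // Hypothetical reconstruction of the guessed WinZip behaviour: keep only
    // the least significant byte of each UTF-16 code unit of the name.
    static byte[] lowBytes(String name) {
        byte[] out = new byte[name.length()];
        for (int i = 0; i < name.length(); i++) {
            out[i] = (byte) (name.charAt(i) & 0xFF);
        }
        return out;
    }

    // Formats bytes as space-separated upper-case hex, e.g. "FC AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+00FC (u-umlaut) keeps its ISO-8859-1 value 0xFC,
        // U+20AC (euro sign) collapses to 0xAC.
        System.out.println(hex(lowBytes("\u00FC\u20AC"))); // prints "FC AC"
    }
}
```

If this is what happens, characters up to U+00FF survive as their ISO-8859-1 values, while anything above turns into an unrelated character (0xAC reads as "¬" in ISO-8859-1), which would explain the weird names.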

> Wolfgang, could you do me a favor and please review what I've written
> for the Ant zip task manual page in svn revision 748593
> <http://svn.apache.org/viewvc?view=rev&revision=748593>, in particular
> <http://svn.apache.org/viewvc/ant/core/trunk/docs/manual/CoreTasks/zip.html?r1=748593&r2=748592&pathrev=748593>?

Seems quite OK ;-)

The one thing I'd like to discuss is the semantics of the useEFS flag in
ZipArchiveOutputStream:

My understanding from previous discussion was that we need a mode where file
names not encodable by the chosen encoding are encoded in UTF-8, which is in
turn indicated by setting the EFS flag on the corresponding ZIP entry. (That's the
way 7-zip handles Unicode filenames...)
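A minimal Java sketch of that mode, assuming the encodability check is done with `java.nio.charset.CharsetEncoder.canEncode`; the class and method names are hypothetical and not part of Commons Compress. Bit 11 (0x0800) of the general purpose bit flag is the language encoding flag (EFS) from PKWARE's APPNOTE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FallbackNameEncoder {
    // Bit 11 of the general purpose bit flag: the language encoding flag (EFS).
    static final int EFS_FLAG = 0x0800;

    // Encoded file name plus the general purpose flag bits to write with it.
    static final class EncodedName {
        final byte[] bytes;
        final int generalPurposeFlags;

        EncodedName(byte[] bytes, int generalPurposeFlags) {
            this.bytes = bytes;
            this.generalPurposeFlags = generalPurposeFlags;
        }
    }

    // Use the chosen encoding when it can represent the name; otherwise fall
    // back to UTF-8 and mark the entry with the EFS bit.
    static EncodedName encode(String name, Charset chosen) {
        if (chosen.newEncoder().canEncode(name)) {
            // A name written in UTF-8 anyway is marked as such, too.
            int flags = StandardCharsets.UTF_8.equals(chosen) ? EFS_FLAG : 0;
            return new EncodedName(name.getBytes(chosen), flags);
        }
        return new EncodedName(name.getBytes(StandardCharsets.UTF_8), EFS_FLAG);
    }

    public static void main(String[] args) {
        EncodedName plain = encode("readme.txt", StandardCharsets.ISO_8859_1);
        EncodedName euro = encode("\u20AC.txt", StandardCharsets.ISO_8859_1);
        System.out.printf("flags=0x%04X len=%d%n", plain.generalPurposeFlags, plain.bytes.length); // flags=0x0000 len=10
        System.out.printf("flags=0x%04X len=%d%n", euro.generalPurposeFlags, euro.bytes.length);   // flags=0x0800 len=7
    }
}
```

The per-entry decision keeps names that fit the chosen encoding byte-compatible with older readers and only pays the UTF-8 cost where it is unavoidable.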

The current implementation of the useEFS flag simply allows disabling the
creation of the EFS flag in ZIP entries that are UTF-8. This approach does not
conform to the specifications I've read, and I have not seen a single zip
implementation that is disturbed by the EFS flag.

My opinion would be to simply drop the possibility of inhibiting the EFS flag in
UTF-8 encoded files and to introduce a new flag allowing a switch to UTF-8
fallbacks (7-zip mode...).

What other opinions are out there?

  Wolfgang

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] State of encoding support in ZIP package

Posted by Wolfgang Glas <wo...@ev-i.at>.
Stefan Bodewig wrote:
> On 2009-03-01, Wolfgang Glas <wo...@ev-i.at> wrote:
> 
>> My understanding from previous discussion was that we need a mode
>> where file names not encodable by the chosen encoding are encoded in
>> UTF-8, which is in turn indicated by setting the EFS flag on the
>> corresponding ZIP entry. (That's the way 7-zip handles Unicode
>> filenames...)
> 
> This is different from what we've currently implemented, but may still
> be useful.
> 
>> The current implementation of the useEFS flag simply allows
>> disabling the creation of the EFS flag in ZIP entries that are
>> UTF-8. This approach does not conform to the specifications I've
>> read, and I have not seen a single zip implementation that is
>> disturbed by the EFS flag.
> 
> But if there should be one (say, zlib on z/OS or some other strange
> thing), it will be good to have that option available.

OK, agreed, let's keep this flag ;-)

>> My opinion would be to simply drop the possibility of inhibiting the
>> EFS flag in UTF-8 encoded files and to introduce a new flag allowing
>> a switch to UTF-8 fallbacks (7-zip mode...).
> 
> I'm fine with an additional flag that would encode unencodable file
> names as UTF-8 (not sure about the name of the flag, and I have a
> long-standing history of choosing bad names), but prefer to keep the
> existing option for the completely orthogonal case of whether we set
> the EFS flag at all.

OK, I will introduce an additional flag, let's call it
'setFallbackToUtf8(boolean)'. I will prepare a patch right after you've reviewed
and (possibly) committed my latest encoding refactoring patch.
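The two orthogonal options agreed on here can be sketched together in Java. The class and the setters `setUseEfs` and `setFallbackToUtf8` are hypothetical, modeled on the names used in this thread, and are not the Commons Compress API; the encodability check again assumes `CharsetEncoder.canEncode`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NameEncodingOptions {
    // Bit 11 of the general purpose bit flag: the language encoding flag (EFS).
    static final int EFS_FLAG = 0x0800;

    private final Charset encoding;
    private boolean useEfs = true;          // hypothetical stand-in for the existing useEFS option
    private boolean fallbackToUtf8 = false; // hypothetical stand-in for the proposed fallback flag

    NameEncodingOptions(Charset encoding) {
        this.encoding = encoding;
    }

    void setUseEfs(boolean useEfs) {
        this.useEfs = useEfs;
    }

    void setFallbackToUtf8(boolean fallbackToUtf8) {
        this.fallbackToUtf8 = fallbackToUtf8;
    }

    // Decides which charset an entry name is written in.
    Charset charsetFor(String name) {
        if (fallbackToUtf8 && !encoding.newEncoder().canEncode(name)) {
            return StandardCharsets.UTF_8;
        }
        return encoding;
    }

    // Decides independently whether the EFS bit is set: only for UTF-8 names,
    // and only while useEfs is enabled.
    int flagsFor(String name) {
        boolean utf8 = StandardCharsets.UTF_8.equals(charsetFor(name));
        return (useEfs && utf8) ? EFS_FLAG : 0;
    }

    public static void main(String[] args) {
        NameEncodingOptions opts = new NameEncodingOptions(StandardCharsets.ISO_8859_1);
        opts.setFallbackToUtf8(true);
        for (String name : new String[] {"readme.txt", "\u20AC.txt"}) {
            System.out.printf("charset=%s flags=0x%04X%n", opts.charsetFor(name), opts.flagsFor(name));
        }
    }
}
```

With fallback enabled and the EFS flag disabled, an unencodable name is written as UTF-8 without any marker, so readers have to guess its encoding; keeping the two switches independent leaves that trade-off to the caller.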

  Best regards,

    Wolfgang




Re: [compress] State of encoding support in ZIP package

Posted by Stefan Bodewig <bo...@apache.org>.
On 2009-03-01, Wolfgang Glas <wo...@ev-i.at> wrote:

> My understanding from previous discussion was that we need a mode
> where file names not encodable by the chosen encoding are encoded in
> UTF-8, which is in turn indicated by setting the EFS flag on the
> corresponding ZIP entry. (That's the way 7-zip handles Unicode
> filenames...)

This is different from what we've currently implemented, but may still
be useful.

> The current implementation of the useEFS flag simply allows
> disabling the creation of the EFS flag in ZIP entries that are
> UTF-8. This approach does not conform to the specifications I've
> read, and I have not seen a single zip implementation that is
> disturbed by the EFS flag.

But if there should be one (say, zlib on z/OS or some other strange
thing), it will be good to have that option available.

> My opinion would be to simply drop the possibility of inhibiting the
> EFS flag in UTF-8 encoded files and to introduce a new flag allowing
> a switch to UTF-8 fallbacks (7-zip mode...).

I'm fine with an additional flag that would encode unencodable file
names as UTF-8 (not sure about the name of the flag, and I have a
long-standing history of choosing bad names), but prefer to keep the
existing option for the completely orthogonal case of whether we set
the EFS flag at all.

Stefan
