Posted to issues@commons.apache.org by "Stefan Bodewig (JIRA)" <ji...@apache.org> on 2017/11/15 16:25:00 UTC

[jira] [Commented] (COMPRESS-429) Expose whether ZIP entry name & comment come from Unicode extra field

    [ https://issues.apache.org/jira/browse/COMPRESS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253732#comment-16253732 ] 

Stefan Bodewig commented on COMPRESS-429:
-----------------------------------------

In my experience only WinZip uses the Unicode extra field; all others (apart from Windows Compressed Folders, which doesn't support Unicode at all) have switched to the EFS flag by now. So you may not want to put too much effort into reading the extra field. In addition, given what WinZip does (COMPRESS-427 and COMPRESS-176), it's hard to say one can trust its content.

{{hasUnicodeName()}} would be equivalent to {{getExtraField(UnicodePathExtraField.UPATH_ID) != null}}, and you'd probably want to call {{getExtraField}} anyway if this were true, just in case the {{ZipFile}} or stream has been constructed with {{useUnicodeExtraFields}} set to false.
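
To illustrate what such a presence check boils down to at the byte level, here is a minimal sketch that scans an entry's raw extra data (for example the array returned by {{java.util.zip.ZipEntry#getExtra()}}) for the Info-ZIP Unicode Path (0x7075) or Unicode Comment (0x6375) header ID. The class and method names are hypothetical and not part of Commons Compress; with Commons Compress itself the {{getExtraField}} call above is the way to go. Like that call, the sketch only reports that the field is present, not that its content can be trusted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Commons Compress or the JDK.
final class UnicodeExtraFieldScan {
    // Header IDs of the Info-ZIP Unicode extra fields.
    static final int UPATH_ID = 0x7075; // Unicode Path
    static final int UCOM_ID = 0x6375;  // Unicode Comment

    // Returns true if the raw extra data contains a complete field
    // with the given header ID.
    static boolean hasExtraField(byte[] extra, int headerId) {
        if (extra == null) {
            return false;
        }
        // Each extra field is: header ID (2 bytes, little endian),
        // data size (2 bytes, little endian), then the data itself.
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int id = buf.getShort() & 0xFFFF;
            int size = buf.getShort() & 0xFFFF;
            if (size > buf.remaining()) {
                return false; // truncated field, stop scanning
            }
            if (id == headerId) {
                return true;
            }
            buf.position(buf.position() + size); // skip this field's data
        }
        return false;
    }
}
```

A caller would pass the entry's extra data, e.g. {{UnicodeExtraFieldScan.hasExtraField(entry.getExtra(), UnicodeExtraFieldScan.UPATH_ID)}}, and still decide separately whether to believe the name stored in the field.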

> Expose whether ZIP entry name & comment come from Unicode extra field
> ---------------------------------------------------------------------
>
>                 Key: COMPRESS-429
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-429
>             Project: Commons Compress
>          Issue Type: Improvement
>            Reporter: Damiano Albani
>            Priority: Minor
>              Labels: Unicode, ZIP
>
> It is a known fact that detecting the encoding of the name/comment of ZIP entries is a messy process, and that general purpose bit 11 is often unreliable.
> Only the so-called Unicode extra field (if present) can be trusted to reliably determine a ZIP entry name & comment, as far as I understand.
> But the current API of Commons Compress doesn't (easily) expose which of these situations the ZIP archive reader is in.
> That's why I propose to add a couple of new getter/setter-exposed fields to {{ZipArchiveEntry}}, e.g.:
> {noformat}
> boolean hasUnicodeName
> boolean hasUnicodeComment
> {noformat}
> This way it can be easily determined if the value returned by {{ZipArchiveEntry::getName}} or {{ZipArchiveEntry::getComment}} can be trusted. Or if it needs some "character encoding sniffing" of sorts.
> What do you think?


