Posted to issues@commons.apache.org by "Wurstbrot mit Senf (Created) (JIRA)" <ji...@apache.org> on 2012/02/17 13:05:59 UTC

[jira] [Created] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts
--------------------------------------------------------------------------------

                 Key: COMPRESS-176
                 URL: https://issues.apache.org/jira/browse/COMPRESS-176
             Project: Commons Compress
          Issue Type: Bug
          Components: Archivers
    Affects Versions: 1.3
         Environment: Windows 7
            Reporter: Wurstbrot mit Senf


There is a problem when handling a WinZip-created zip with Umlauts in directories.

I'm using ArchiveInputStream to read a zip file created with WinZip that contains a directory with an umlaut ("ä") in its name. WinZip's Unicode flag was active when the zip file was created.

The following problem occurs when accessing the entries of the zip:
The ArchiveEntry for a directory containing an umlaut is not marked as a directory, and the file names for the directory and all files contained in it use backslashes instead of slashes (i.e. completely different from all other files in directories with no umlaut in their path).

It makes no difference whether I let the ArchiveStreamFactory decide which ArchiveInputStream to create or use the ZipArchiveInputStream constructor with an explicit encoding (I've tried CP437, CP850 and ISO-8859-15, but the problem persisted).

This problem does not occur when using the very same zip file but compressed by 7zip or the built-in Windows 7 zip functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216652#comment-13216652 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

In extract.c of unzip60, at line 1310ff, there is code that replaces backslashes with slashes.  It only replaces them in names that don't contain forward slashes (MBSCHR looks up a character in a character array) and only if "hostnum" indicates a FAT system.

{noformat}
            /* for files from DOS FAT, check for use of backslash instead
             *  of slash as directory separator (bug in some zipper(s); so
             *  far, not a problem in HPFS, NTFS or VFAT systems)
             */
#ifndef SFX
            if (G.pInfo->hostnum == FS_FAT_ && !MBSCHR(G.filename, '/')) {
                char *p=G.filename;

                if (*p) do {
                    if (*p == '\\') {
                        if (!G.reported_backslash) {
                            Info(slide, 0x21, ((char *)slide,
                              LoadFarString(BackslashPathSep), G.zipfn));
                            G.reported_backslash = TRUE;
                            if (!error_in_archive)
                                error_in_archive = PK_WARN;
                        }
                        *p = '/';
                    }
                } while (*PREINCSTR(p));
            }
#endif /* !SFX */
{noformat}

"hostnum" is the upper byte of "version made by" inside the central directory header - this is ZipArchiveEntry's get/setPlatform - and FS_FAT_ is 0 (ZipArchiveEntry#PLATFORM_FAT).  We'd have all pieces together to emulate this.
                
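A minimal Java sketch of how this rule might be emulated, assuming only what is stated above: the platform is the upper byte of "version made by", FAT is 0, and backslashes are replaced only when the name contains no forward slash. The class and method names are made up for illustration and are not the actual Commons Compress code.

```java
final class BackslashNormalizer {
    // Same value as FS_FAT_ in unzip and ZipArchiveEntry#PLATFORM_FAT (0, per the comment above).
    static final int PLATFORM_FAT = 0;

    // "hostnum" is the upper byte of the 16-bit "version made by" field.
    static int platformOf(int versionMadeBy) {
        return (versionMadeBy >> 8) & 0xFF;
    }

    // Replace backslashes only for FAT entries whose name has no forward slash at all.
    static String normalize(String name, int platform) {
        if (platform == PLATFORM_FAT && name.indexOf('/') < 0) {
            return name.replace('\\', '/');
        }
        return name;
    }

    public static void main(String[] args) {
        // Hypothetical entry name in the style of the WinZip archive: "ä\ä.txt"
        System.out.println(normalize("\u00e4\\\u00e4.txt", PLATFORM_FAT));
    }
}
```

Names that already contain a forward slash, or that come from a non-FAT platform, pass through unchanged, which matches the guard in the unzip code.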

[jira] [Updated] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-176:
------------------------------------

    Attachment: test-doublevertical.zip
                MkZip.java

The attached ZIP (created by the trivial attached class) contains a file named ‖.txt and its parent directory ‖ (that's a double vertical bar) using Unicode extra fields (and nothing else) and forward slashes.

Can you please verify WinZIP is able to extract it?
                
> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-176
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-176
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.3
>         Environment: Windows 7
>            Reporter: Wurstbrot mit Senf
>            Assignee: Stefan Bodewig
>             Fix For: 1.4
>
>         Attachments: MkZip.java, test-7zip.zip, test-doublevertical.zip, test-windows.zip, test-winzip.zip, testzap-winzip.zip
>
>

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215704#comment-13215704 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

This is what InfoZIP's zip on Linux says:

{noformat}
stefanb@brick:~$ zip -Tv Desktop/test-winzip.zip 
Archive:  Desktop/test-winzip.zip
    testing: doc.txt.gz               OK
    testing: doc2.txt                 OK
    testing: ??\                      OK
    testing: ??\??zip.zip             OK
    testing: ??\??.txt                OK
No errors detected in compressed data of Desktop/test-winzip.zip.
test of Desktop/test-winzip.zip OK
{noformat}

The entry for the directory contains a Unicode extra field with 0xc3 0xa4 0x5c as UTF-8 encoded name.  This actually is "ä\".

Since directory names in ZIP archives must end with "/", Compress doesn't detect this as a directory.  It may be possible to create a workaround like "if the 'plain' name ends with a / and the Unicode name uses a \ then bend it", but I can't say I'd like that.
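To make the byte sequence concrete, here is a small Java sketch that decodes the three bytes quoted above and applies a trailing-slash test. The class and the `looksLikeDirectory` helper are hypothetical stand-ins for illustration, not the Compress implementation.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeNameDemo {
    // Decode the raw bytes of a Unicode extra field name as UTF-8.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Simplified directory test: ZIP directory names end with a forward slash.
    static boolean looksLikeDirectory(String name) {
        return name.endsWith("/");
    }

    public static void main(String[] args) {
        // 0xc3 0xa4 is "ä" in UTF-8, 0x5c is the backslash.
        byte[] raw = {(byte) 0xc3, (byte) 0xa4, 0x5c};
        String name = decode(raw);
        System.out.println("ends with backslash: " + name.endsWith("\\"));
        System.out.println("looks like directory: " + looksLikeDirectory(name));
    }
}
```

With the backslash as the last character, a trailing-slash test cannot recognise the entry as a directory.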

Java 6 likely works because it doesn't have any idea about Unicode extra fields and simply uses the "plain" name.  You'd get the same behavior from ZipArchiveInputStream by setting useUnicodeExtraFields to false in the constructor.
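The following Java sketch illustrates the "plain name wins" behaviour using only java.util.zip from a current JDK (as a stand-in for Java 6, which is an assumption). It writes one entry whose plain name is "dir/" and attaches a hand-built Unicode path extra field (header ID 0x7075) carrying "ä\", then reads the entry back. The class and helper names are hypothetical, and that java.util.zip ignores this extra field is my understanding of the JDK, not something the thread confirms.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class PlainNameDemo {
    // Build a Unicode path extra field: ID 0x7075, size, version 1, CRC32 of the plain name, UTF-8 name.
    static byte[] unicodePathField(String plainName, String unicodeName) {
        byte[] plain = plainName.getBytes(StandardCharsets.UTF_8);
        byte[] uni = unicodeName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(plain);
        ByteBuffer buf = ByteBuffer.allocate(4 + 5 + uni.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0x7075);            // header ID
        buf.putShort((short) (5 + uni.length));  // data size
        buf.put((byte) 1);                       // field version
        buf.putInt((int) crc.getValue());        // CRC32 of the plain name bytes
        buf.put(uni);                            // UTF-8 encoded Unicode name
        return buf.array();
    }

    // Write a single entry in memory and return the name java.util.zip reports when reading it back.
    static String nameSeenByJavaUtilZip(String plainName, String unicodeName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(bytes)) {
                ZipEntry entry = new ZipEntry(plainName);
                entry.setExtra(unicodePathField(plainName, unicodeName));
                out.putNextEntry(entry);
                out.closeEntry();
            }
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                ZipEntry read = in.getNextEntry();
                return read == null ? null : read.getName();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameSeenByJavaUtilZip("dir/", "\u00e4\\"));
    }
}
```

Because the reported name is the plain one, which ends with a forward slash, the entry is still recognised as a directory on that code path.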
                

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217899#comment-13217899 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

The workaround and tests are in svn revision 1294460.

I'll look into creating a test archive for the opposite direction today.
                

[jira] [Resolved] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig resolved COMPRESS-176.
-------------------------------------

    Resolution: Fixed

Great.

I explicitly told ZipArchiveOutputStream not to use the language encoding flag to ensure WinZIP uses the Unicode extra field.  Otherwise 7Zip would have worked.  Windows Compressed Folders simply doesn't support file names with characters that are not part of the platform's native encoding.
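For readers unfamiliar with the language encoding flag: it is bit 11 of the general purpose flags in each ZIP entry header and marks the name as UTF-8. The Java sketch below uses java.util.zip rather than ZipArchiveOutputStream (an assumption made to keep it dependency-free) and reads the flag word from the first local file header. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class LanguageEncodingFlagDemo {
    // General purpose bit 11: names and comments are UTF-8.
    static final int LANGUAGE_ENCODING_FLAG = 1 << 11;

    // Write one entry with the given charset and return the flag word of its local file header.
    static int firstEntryFlags(Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, charset)) {
            out.putNextEntry(new ZipEntry("dir/"));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] zip = bytes.toByteArray();
        // Local file header: 4-byte signature, 2-byte version needed, then the 2-byte little-endian flag word.
        return (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
    }

    static boolean usesLanguageEncodingFlag(int flags) {
        return (flags & LANGUAGE_ENCODING_FLAG) != 0;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:      " + usesLanguageEncodingFlag(firstEntryFlags(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: " + usesLanguageEncodingFlag(firstEntryFlags(StandardCharsets.ISO_8859_1)));
    }
}
```

An archive written without this flag gives a reader only the Unicode extra field as a source for the non-ASCII name, which is the situation the test archive was meant to force.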

For a more complete discussion see http://commons.apache.org/compress/zip.html#encoding
                

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215664#comment-13215664 ] 

Wurstbrot mit Senf commented on COMPRESS-176:
---------------------------------------------

Btw.: I have no problems handling this jar using java.util.zip (Java 6) for some reason :-(
                

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213223#comment-13213223 ] 

Sebb commented on COMPRESS-176:
-------------------------------

Thanks.

I'm beginning to wonder if Winzip is faulty.
The unicode filename that is stored uses \ whereas the base name uses /.
                

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212641#comment-13212641 ] 

Sebb commented on COMPRESS-176:
-------------------------------

Thanks, but you have not granted the ASF licence to use the file, which means we cannot include it in our test suite.

Please could you delete and reattach it?

Also, we will need the equivalent 7zip and Win7 archives for comparison.
                

[jira] [Updated] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wurstbrot mit Senf updated COMPRESS-176:
----------------------------------------

    Attachment:     (was: test-winzip.zip)
    

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217122#comment-13217122 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

Whether we need forward slashes in Unicode extra fields can only be answered by somebody using WinZIP.  The best approach would be to create a test archive with a directory whose name contains a character that is not part of CP437 and, to be safe, not part of the platform's default encoding either.
                

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216739#comment-13216739 ] 

Sebb commented on COMPRESS-176:
-------------------------------

Excellent!
Since \ and / are not allowed in file or folder names on Windows systems, there should be no case where a \ is incorrectly replaced.
And it would still work if Winzip fixes its implementation to use /, and would also work with other applications that use / for the extra fields.

==

There's still potentially the reverse problem - can Winzip handle / in the Unicode extra field, or does it expect only \ ?
If it expects only \, then I guess we might need to make the generated extra fields configurable to use \.
I don't have the required version of Winzip to check that.
                

[jira] [Updated] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wurstbrot mit Senf updated COMPRESS-176:
----------------------------------------

    Attachment: test-winzip.zip

Minimum test zip attached.
                

       

[jira] [Updated] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated COMPRESS-176:
--------------------------

    Attachment: testzap-winzip.zip

Copy of test-winzip.zip, but with plain file name changed from 3zip.zip to 3zap.zap.

This shows only the plain file name in 7zip and in my copy of Winzip (9.0).

This suggests that neither is processing the unicode extra fields.
                

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211794#comment-13211794 ] 

Sebb commented on COMPRESS-176:
-------------------------------

Could you attach minimal sample archives which show the problem?
                

       

[jira] [Updated] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-176:
------------------------------------

    Fix Version/s: 1.4
    
> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-176
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-176
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.3
>         Environment: Windows 7
>            Reporter: Wurstbrot mit Senf
>            Assignee: Stefan Bodewig
>             Fix For: 1.4
>
>         Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, testzap-winzip.zip

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215732#comment-13215732 ] 

Sebb commented on COMPRESS-176:
-------------------------------

The plain names use / and look OK when using CP437.

For some odd reason, the unicode extra fields use \ instead of /.
I think that may be a Winzip bug - it does not make sense to use a different separator for the extra fields.

To confirm this is a bug, it would be useful to see how other zip tools use the extra fields - are there any?
Apart from Ant or other code based on Commons Compress, of course!

Alternatively, find some documentation as to the correct contents of the field.

My version of Winzip is too old to support the fields; if you have purchased a more recent one perhaps you could e-mail their support desk?

A possible workaround would be to make the \ => / behaviour optional; I agree we should not do this by default.
                

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216647#comment-13216647 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

OK, this means nobody except for Commons Compress and InfoZIP tools seems to read the Unicode extra field.

This is what I get when trying to extract the original ZIP on Linux:

{noformat}
stefan@birdy:~/Desktop$ unzip test-winzip.zip 
Archive:  test-winzip.zip
  inflating: doc.txt.gz              
 extracting: doc2.txt                
warning:  test-winzip.zip appears to use backslashes as path separators
   creating: ??/
  inflating: ??/??zip.zip            
 extracting: ??/??.txt  
{noformat}

and it creates an "ä" directory.  I'll look through InfoZIP's sources to see what it bases its heuristics on; maybe we can use the same in Commons Compress to turn backslashes into slashes.

                

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218158#comment-13218158 ] 

Wurstbrot mit Senf commented on COMPRESS-176:
---------------------------------------------

Seems to be OK. Got a directory ‖ with the file ‖.txt in it.
                
> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-176
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-176
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.3
>         Environment: Windows 7
>            Reporter: Wurstbrot mit Senf
>            Assignee: Stefan Bodewig
>             Fix For: 1.4
>
>         Attachments: MkZip.java, test-7zip.zip, test-doublevertical.zip, test-windows.zip, test-winzip.zip, testzap-winzip.zip

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Sebb (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216468#comment-13216468 ] 

Sebb commented on COMPRESS-176:
-------------------------------

I have 7zip installed, and it reads the archive OK.

However, I don't think that proves anything, since the plain names are correct.

I guess we could look at the 7zip source code to see if it uses the extra fields.

A better test would be to create a zip file using a filename that cannot be represented in CP437, i.e. only the extra field would show the correct name.
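
To pick such a filename, one could check candidates against the CP437 encoder. This is only a sketch: `Cp437Check` is a hypothetical helper, and it assumes the runtime ships the `IBM437` charset (the JDK's name for CP437), which is part of the extended charsets and may be missing from trimmed-down runtimes.

```java
import java.nio.charset.Charset;

/** Hypothetical helper for illustration; not part of Commons Compress. */
final class Cp437Check {

    private Cp437Check() {
    }

    /**
     * Returns true if every character of the name can be encoded in CP437.
     * Throws UnsupportedCharsetException if the runtime lacks IBM437.
     */
    static boolean representableInCp437(String name) {
        // A fresh encoder per call keeps the method free of shared state.
        return Charset.forName("IBM437").newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // "ä" (U+00E4) is part of CP437, so the plain name alone would suffice.
        System.out.println(representableInCp437("\u00e4.txt"));
        // U+2016 (double vertical line) has no CP437 mapping, so only the
        // Unicode extra field could carry this name correctly.
        System.out.println(representableInCp437("\u2016.txt"));
    }
}
```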
                

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218161#comment-13218161 ] 

Wurstbrot mit Senf commented on COMPRESS-176:
---------------------------------------------

But 7Zip and the Windows built-in zip both create a directory named %U2016 with a file named %U2016.txt in it.
                

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Stefan Bodewig (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216446#comment-13216446 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

AFAIK what we have written down based on findings by Wolfgang Glas in http://commons.apache.org/compress/zip.html still stands: WinZIP is the only one using Unicode extra fields, all other implementations have switched to the language encoding flag.  The only exceptions are Windows compressed folders - which don't understand either - and InfoZIP based tools if they are compiled to use the extra fields.
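
For reference, the two mechanisms can be told apart at the byte level. The sketch below assumes the ZIP format's documented layout: the language encoding flag is bit 11 (0x0800) of the general purpose bit flag, stored little-endian at offset 6 of a local file header, and the Unicode path extra field uses header ID 0x7075. `ZipNameEncoding` is a hypothetical class for illustration, not Commons Compress code.

```java
/** Hypothetical helper for illustration; not part of Commons Compress. */
final class ZipNameEncoding {

    /** Bit 11 of the general purpose bit flag: names are UTF-8. */
    static final int LANGUAGE_ENCODING_FLAG = 0x0800;

    /** Header ID of the Unicode path extra field (the WinZip approach). */
    static final int UNICODE_PATH_EXTRA_ID = 0x7075;

    private ZipNameEncoding() {
    }

    /** Reads the 16-bit little-endian flag word at offset 6 of the header. */
    static int generalPurposeFlags(byte[] localFileHeader) {
        if (localFileHeader.length < 8) {
            throw new IllegalArgumentException("header too short");
        }
        // Offsets 0-3 hold the signature, 4-5 the version needed to extract.
        return (localFileHeader[6] & 0xFF) | ((localFileHeader[7] & 0xFF) << 8);
    }

    static boolean usesLanguageEncodingFlag(byte[] localFileHeader) {
        return (generalPurposeFlags(localFileHeader) & LANGUAGE_ENCODING_FLAG) != 0;
    }

    public static void main(String[] args) {
        // First eight bytes of a local file header: "PK\3\4", version 20, flags.
        byte[] utf8Flagged = {0x50, 0x4b, 0x03, 0x04, 20, 0, 0x00, 0x08};
        byte[] unflagged = {0x50, 0x4b, 0x03, 0x04, 20, 0, 0x00, 0x00};
        System.out.println(usesLanguageEncodingFlag(utf8Flagged));
        System.out.println(usesLanguageEncodingFlag(unflagged));
    }
}
```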

A question to the original reporter (I'm German so I know the name's a fake 8-): since you also have an installation of 7zip, what does 7zip think of your WinZIP created archive?
                

       

[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217974#comment-13217974 ] 

Wurstbrot mit Senf commented on COMPRESS-176:
---------------------------------------------

Hi all, sounds promising. Thanks a lot, I'm looking forward to the next release.

And by the way, how could you tell that the name's a fake? ;-)

                

       

[jira] [Updated] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

Posted by "Wurstbrot mit Senf (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wurstbrot mit Senf updated COMPRESS-176:
----------------------------------------

    Attachment: test-winzip.zip
                test-7zip.zip
                test-windows.zip

Re-added the WinZip zip file, plus identical ones packed with 7zip and the Windows built-in zip facility.
                