Posted to issues@commons.apache.org by "Stefan Bodewig (JIRA)" <ji...@apache.org> on 2018/09/27 15:00:00 UTC

[jira] [Commented] (COMPRESS-466) Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile

    [ https://issues.apache.org/jira/browse/COMPRESS-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16630569#comment-16630569 ] 

Stefan Bodewig commented on COMPRESS-466:
-----------------------------------------

Commons Compress parses the extra fields of the local file headers in addition to the extra fields of the central directory, which the java.util.zip version does not.

The less technical description is that java.util.zip.ZipFile may be missing important data for the entries that the Commons Compress version provides. In many if not most cases there will be no difference, though.

Right now there is no way around it, but it would certainly be possible to add a flag to ZipFile's constructor that says "I know that parsing the central directory is enough" and skips this step.

There is at least one thing I'm aware of that won't work if we skip reading the local file headers: reading entry names or comments from Unicode extra fields. See http://commons.apache.org/proper/commons-compress/zip.html#Encoding

The resolveLocalFileHeaderData method does a few additional things that would need to be handled in a different way if it were skipped (making sure we know all entries that share the same name and ensuring we find the proper start of the data stream).
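
To illustrate why the start of an entry's data depends on its local file header, here is a minimal sketch that uses only the JDK. It is not the Commons Compress implementation; the class name, the helper names and the extra field header ID 0x9999 are made up for illustration. The name and extra field lengths stored in the local file header may differ from those in the central directory, so the data offset has to be computed from the local header itself:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class, not part of Commons Compress.
class LocalHeaderOffsetSketch {

    // Local file header signature "PK\3\4", read as a little-endian int.
    static final int LFH_SIGNATURE = 0x04034b50;
    // Size in bytes of the fixed part of a local file header.
    static final int LFH_FIXED_SIZE = 30;

    // Builds a single-entry archive in memory. The entry is STORED
    // (uncompressed) so its bytes appear verbatim in the archive.
    static byte[] buildSampleZip(String name, byte[] data, byte[] extra) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                CRC32 crc = new CRC32();
                crc.update(data);
                ZipEntry entry = new ZipEntry(name);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());
                entry.setExtra(extra);
                zos.putNextEntry(entry);
                zos.write(data);
                zos.closeEntry();
            }
            return bos.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Returns the offset of the entry data that follows the local file
    // header located at localHeaderOffset.
    static long dataOffset(byte[] zip, int localHeaderOffset) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(localHeaderOffset) != LFH_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset " + localHeaderOffset);
        }
        // File name length and extra field length are unsigned 16-bit
        // values at offsets 26 and 28 of the local file header.
        int nameLength = buf.getShort(localHeaderOffset + 26) & 0xFFFF;
        int extraLength = buf.getShort(localHeaderOffset + 28) & 0xFFFF;
        return (long) localHeaderOffset + LFH_FIXED_SIZE + nameLength + extraLength;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        // Extra field with the made-up header ID 0x9999 and a 4-byte payload.
        byte[] extra = {(byte) 0x99, (byte) 0x99, 4, 0, 1, 2, 3, 4};
        byte[] zip = buildSampleZip("a.txt", data, extra);
        int offset = (int) dataOffset(zip, 0);
        System.out.println("data starts at offset " + offset);
        System.out.println(new String(zip, offset, data.length, StandardCharsets.US_ASCII));
    }
}
```

For a real archive the same two length fields would have to be read from disk once per entry, which is one seek and one small read per local file header. That per-entry cost is presumably what adds up when an archive has a very large number of entries.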

> Opening of a very large zip file is extremely slow compared to java.util.zip.ZipFile
> ------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-466
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-466
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.18
>         Environment: Tested both on Linux and OSX 10.13.6.
>            Reporter: Jakob Sultan Ericsson
>            Priority: Major
>
> We have a quite large zip file (35 GB) and are trying to open it with ZipFile.
> {code:java}
>         final long start = System.currentTimeMillis();
>         try (ZipFile zf = new ZipFile(new File("35gb.zip"))) {
>             System.out.println("File opened..." + (System.currentTimeMillis() - start));
>         }
> {code}
> This code takes about 300 000 - 400 000 ms (5-6 minutes).
> If I run this with the JDK's built-in java.util.zip.ZipFile, the same code takes 300 ms (less than a second).
> I'm not totally sure what the problem is, but I did some debugging and basically all of the time is spent in
> {code:java}
>     private void resolveLocalFileHeaderData(final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag)
> {code}
> Anything that can be done to improve this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)