
Posted to issues@commons.apache.org by "Stefan Bodewig (Jira)" <ji...@apache.org> on 2020/09/11 05:53:00 UTC

[jira] [Commented] (COMPRESS-555) ZipArchiveInputStream should allow stored entries with data descriptor by default

    [ https://issues.apache.org/jira/browse/COMPRESS-555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194010#comment-17194010 ] 

Stefan Bodewig commented on COMPRESS-555:
-----------------------------------------

Unfortunately, reading STORED entries that use a data descriptor is unreliable, to say the least. It is easy if you can read the central directory at the end of the archive, which is why ZipFile handles them just fine, but reading the archive as a stream is a very different matter.

By default, the stream will tell you "I don't think I can handle this entry" when you call the {{canReadEntryData}} method. With the non-default option enabled, it reads forward until it finds something that looks like the signature of the next ZIP entry. This breaks down completely if the STORED entry itself contains such a sequence of bytes; ZIPs inside ZIPs are the primary example (think WARs containing JARs). In recent versions we try to verify that the size claimed by what we believe to be the data descriptor matches the length we have read, but then you are faced with an IOException while reading an entry that the stream claimed to be able to handle.
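
To make the failure mode concrete, here is a minimal sketch in plain Java of a naive forward scan. It is not the code Commons Compress uses: the class and method names ({{SignatureScanSketch}}, {{scanForNextSignature}}, {{concat}}) are hypothetical, and for simplicity it only looks for the local file header signature (bytes 50 4B 03 04), whereas a real reader also has to consider the other ZIP record signatures.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SignatureScanSketch {
    // Local file header signature "PK\3\4" as it appears in the byte stream.
    private static final byte[] LFH_SIG = {0x50, 0x4b, 0x03, 0x04};

    /** Returns the index of the first signature at or after 'from', or -1 if none. */
    public static int scanForNextSignature(byte[] data, int from) {
        for (int i = from; i + LFH_SIG.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < LFH_SIG.length; j++) {
                if (data[i + j] != LFH_SIG[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Concatenates byte arrays (helper for building the toy streams). */
    public static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.US_ASCII);
        // A payload that itself contains a ZIP signature, like a JAR inside a WAR.
        byte[] nested = concat("ab".getBytes(StandardCharsets.US_ASCII), LFH_SIG,
                "cd".getBytes(StandardCharsets.US_ASCII));

        // Each toy stream is the STORED payload followed by the next entry's signature.
        int guessPlain = scanForNextSignature(concat(plain, LFH_SIG), 0);
        int guessNested = scanForNextSignature(concat(nested, LFH_SIG), 0);

        // Prints: plain: guessed 5, actual 5
        System.out.println("plain: guessed " + guessPlain + ", actual " + plain.length);
        // Prints: nested: guessed 2, actual 8
        System.out.println("nested: guessed " + guessNested + ", actual " + nested.length);
    }
}
```

For the nested payload the scan stops at the embedded signature, so the entry is cut short after 2 of its 8 bytes and everything after it is misparsed.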

Personally, I believe the option would cause too much confusion if it were enabled by default. I prefer to have users make the deliberate choice and realize what they are signing up for. Even better, they would find a way to make the initial stream seekable and use ZipFile rather than ZipArchiveInputStream.
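
As a rough sketch of the two routes, assuming Commons Compress is on the classpath: the first method opts in explicitly through the four-argument {{ZipArchiveInputStream}} constructor and still checks {{canReadEntryData}}; the second spools the stream to a temporary file so that {{ZipFile}} can use the central directory. The class and method names ({{ZipReadingSketch}}, {{countStreaming}}, {{countViaZipFile}}) are made up for illustration, and spooling to a temp file is only one possible way to get a seekable source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;
import org.apache.commons.compress.archivers.zip.ZipFile;

public class ZipReadingSketch {

    /** Streaming read with the explicit opt-in; returns the number of entries read. */
    public static int countStreaming(InputStream in) throws IOException {
        int count = 0;
        // The last argument is allowStoredEntriesWithDataDescriptor.
        try (ZipArchiveInputStream zin = new ZipArchiveInputStream(in, "UTF-8", true, true)) {
            ArchiveEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!zin.canReadEntryData(entry)) {
                    continue; // the stream says it cannot handle this entry
                }
                // Even with the opt-in, this read may still fail with an IOException
                // if the payload contains bytes that look like a ZIP signature.
                drain(zin);
                count++;
            }
        }
        return count;
    }

    /** Spools the stream to a temp file and reads it through ZipFile instead. */
    public static int countViaZipFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spooled", ".zip");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            int count = 0;
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                Enumeration<ZipArchiveEntry> entries = zip.getEntries();
                while (entries.hasMoreElements()) {
                    try (InputStream data = zip.getInputStream(entries.nextElement())) {
                        drain(data);
                    }
                    count++;
                }
            }
            return count;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    /** Reads and discards the rest of the current entry's data. */
    private static void drain(InputStream data) throws IOException {
        byte[] buf = new byte[8192];
        while (data.read(buf) != -1) {
            // discard
        }
    }
}
```

The second route costs temporary disk space, but sizes and offsets come from the central directory, so no guessing is involved.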

> ZipArchiveInputStream should allow stored entries with data descriptor by default
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-555
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-555
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>    Affects Versions: 1.20
>            Reporter: Trevor Bentley
>            Priority: Major
>             Fix For: 1.21
>
>
> We are currently using tika for text extraction which uses commons-compress for handling zips. Currently some sites are returning zips that have entries with stored data descriptors which fail to extract due to the ZipArchiveInputStream defaulting to false for 'allowStoredEntriesWithDataDescriptor'.
> Allowing the reading of stored entries on Zip archives should be enabled by default.
> PR: https://github.com/apache/commons-compress/pull/137


