Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/19 13:36:00 UTC

[jira] [Resolved] (TIKA-3196) PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor

     [ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-3196.
-------------------------------
    Fix Version/s:     (was: 1.25)
                   1.27
       Resolution: Fixed

> PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3196
>                 URL: https://issues.apache.org/jira/browse/TIKA-3196
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Trevor Bentley
>            Priority: Major
>             Fix For: 2.0.0, 1.27
>
>         Attachments: OOO-107047-0.oxt-145.zip
>
>
> We use Tika for text extraction. Some sites return zip files containing STORED entries with a data descriptor. These fail to extract because ZipArchiveInputStream (in commons-compress) defaults 'allowStoredEntriesWithDataDescriptor' to false.
> Since ZipArchiveInputStream can read such entries when that option is enabled, we should retry reading the zip with it enabled when we get an UnsupportedZipFeatureException for the data descriptor feature.
> Pull Request: [https://github.com/apache/tika/pull/356|https://github.com/apache/tika/pull/355]
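
For readers unfamiliar with the term, a "STORED entry with data descriptor" is a zip entry whose local file header has compression method 0 and general purpose flag bit 3 set, so the sizes and CRC follow the data instead of preceding it. The sketch below is only an illustration of that condition using the JDK; it is not the Tika or commons-compress code, and the class and method names (StoredDescriptorCheck, isStoredWithDataDescriptor, sampleHeader) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Tika or commons-compress.
class StoredDescriptorCheck {
    // Constants from the zip local file header layout.
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;
    static final int FLAG_DATA_DESCRIPTOR = 0x0008; // general purpose bit 3
    static final int METHOD_STORED = 0;

    // Returns true if the bytes start with a local file header that
    // describes a STORED entry whose sizes are deferred to a data descriptor.
    static boolean isStoredWithDataDescriptor(byte[] header) {
        if (header.length < 10) {
            return false; // too short to hold signature, version, flags, method
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            return false;
        }
        int flags = buf.getShort(6) & 0xFFFF;  // offset 6: general purpose flags
        int method = buf.getShort(8) & 0xFFFF; // offset 8: compression method
        return method == METHOD_STORED && (flags & FLAG_DATA_DESCRIPTOR) != 0;
    }

    // Builds a 30-byte local file header with the given flags and method;
    // all remaining fields are left as zero for the purpose of the demo.
    static byte[] sampleHeader(int flags, int method) {
        ByteBuffer buf = ByteBuffer.allocate(30).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(LOCAL_HEADER_SIGNATURE);
        buf.putShort((short) 20); // version needed to extract
        buf.putShort((short) flags);
        buf.putShort((short) method);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(isStoredWithDataDescriptor(sampleHeader(0x0008, 0))); // true
        System.out.println(isStoredWithDataDescriptor(sampleHeader(0x0008, 8))); // false (DEFLATED)
        System.out.println(isStoredWithDataDescriptor(sampleHeader(0x0000, 0))); // false (no descriptor)
    }
}
```

A streaming reader cannot know where such an entry ends without scanning for the descriptor, which is presumably why the option is off by default in commons-compress.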



--
This message was sent by Atlassian Jira
(v8.3.4#803005)