Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/19 13:35:00 UTC
[jira] [Reopened] (TIKA-3196) PackageParser should attempt to parse
entries from zip files with STORED entries with data descriptor
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reopened TIKA-3196:
-------------------------------
Reopen to fix multithreading issue in {{branch_1x}}.
> PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor
> -----------------------------------------------------------------------------------------------------
>
> Key: TIKA-3196
> URL: https://issues.apache.org/jira/browse/TIKA-3196
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Trevor Bentley
> Priority: Major
> Fix For: 2.0.0, 1.25
>
> Attachments: OOO-107047-0.oxt-145.zip
>
>
> We use Tika for text extraction. Some sites return zip files containing STORED entries that use a data descriptor. These fail to extract because ZipArchiveInputStream (in commons-compress) defaults 'allowStoredEntriesWithDataDescriptor' to false.
> Since ZipArchiveInputStream supports reading STORED entries with data descriptors, we should retry with that option enabled when we catch an UnsupportedZipFeatureException for the data descriptor feature.
> Pull Request: [https://github.com/apache/tika/pull/356|https://github.com/apache/tika/pull/355]
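
For readers unfamiliar with the zip detail behind this issue, here is a minimal sketch of how one might recognise the problematic case from the raw bytes. It relies only on the zip format's local file header layout: a 30-byte fixed part, with the general purpose bit flag at offset 6 and the compression method at offset 8. Bit 3 of the flag marks an entry whose sizes and CRC follow the data in a data descriptor, and method 0 means STORED. The class and method names are hypothetical and are not part of Tika or commons-compress. The check inspects only the first entry of the archive.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not Tika or commons-compress code.
final class StoredDataDescriptorCheck {
    private static final int LOCAL_FILE_HEADER_SIGNATURE = 0x04034b50;
    private static final int LOCAL_FILE_HEADER_FIXED_SIZE = 30;
    private static final int DATA_DESCRIPTOR_FLAG = 0x0008; // general purpose bit 3
    private static final int METHOD_STORED = 0;

    private StoredDataDescriptorCheck() {
    }

    /**
     * Returns true if the bytes start with a zip local file header whose
     * entry is STORED and flagged as using a data descriptor.
     */
    static boolean firstEntryIsStoredWithDataDescriptor(byte[] zipBytes) {
        if (zipBytes == null || zipBytes.length < LOCAL_FILE_HEADER_FIXED_SIZE) {
            return false;
        }
        // All multi-byte fields in zip headers are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_FILE_HEADER_SIGNATURE) {
            return false;
        }
        int flags = buf.getShort(6) & 0xFFFF;
        int method = buf.getShort(8) & 0xFFFF;
        return method == METHOD_STORED && (flags & DATA_DESCRIPTOR_FLAG) != 0;
    }
}
```

An entry of this kind is hard to stream because the header does not say where the stored data ends, which is presumably why the option named in the description is off by default.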
--
This message was sent by Atlassian Jira
(v8.3.4#803005)