Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/19 13:36:00 UTC
[jira] [Resolved] (TIKA-3196) PackageParser should attempt to parse
entries from zip files with STORED entries with data descriptor
[ https://issues.apache.org/jira/browse/TIKA-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3196.
-------------------------------
Fix Version/s: 1.27 (was: 1.25)
Resolution: Fixed
> PackageParser should attempt to parse entries from zip files with STORED entries with data descriptor
> -----------------------------------------------------------------------------------------------------
>
> Key: TIKA-3196
> URL: https://issues.apache.org/jira/browse/TIKA-3196
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Trevor Bentley
> Priority: Major
> Fix For: 2.0.0, 1.27
>
> Attachments: OOO-107047-0.oxt-145.zip
>
>
> We are currently using Tika for text extraction. Some sites return zip files containing STORED entries with data descriptors; these fail to extract because ZipArchiveInputStream (in commons-compress) defaults 'allowStoredEntriesWithDataDescriptor' to false.
> Since ZipArchiveInputStream supports reading such entries, we should retry the zip with that option enabled when we get a data descriptor UnsupportedZipFeatureException.
> Pull Request: [https://github.com/apache/tika/pull/356|https://github.com/apache/tika/pull/355]
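The retry described in the issue can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request. In commons-compress the option is the fourth argument of the ZipArchiveInputStream(InputStream, String, boolean, boolean) constructor. To keep the sketch runnable without that dependency, EntryLister and DataDescriptorException are hypothetical stand-ins for "read the zip with a given flag value" and for a data descriptor UnsupportedZipFeatureException. The sketch also assumes the input can be re-read from the start, here by holding it as a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class DataDescriptorRetrySketch {

    /** Hypothetical stand-in for a data descriptor UnsupportedZipFeatureException. */
    static class DataDescriptorException extends IOException {
        DataDescriptorException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical stand-in for code that opens a zip stream with the given
     * flag value and returns the entry names it could read.
     */
    interface EntryLister {
        List<String> list(InputStream in, boolean allowStoredEntriesWithDataDescriptor)
                throws IOException;
    }

    /**
     * First tries the default (flag disabled). Only when the failure is the
     * data descriptor case does it retry with the flag enabled; any other
     * IOException propagates unchanged.
     */
    static List<String> listWithFallback(byte[] zipBytes, EntryLister lister)
            throws IOException {
        try {
            return lister.list(new ByteArrayInputStream(zipBytes), false);
        } catch (DataDescriptorException e) {
            // The first attempt may have consumed part of its stream,
            // so the retry starts from a fresh stream over the same bytes.
            return lister.list(new ByteArrayInputStream(zipBytes), true);
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake lister that fails unless the flag is enabled.
        EntryLister fake = (in, allow) -> {
            if (!allow) {
                throw new DataDescriptorException("STORED entry with data descriptor");
            }
            return List.of("a.txt");
        };
        System.out.println(listWithFallback(new byte[0], fake));
    }
}
```

One reason to retry rather than always enable the flag: a STORED entry has no compressed-stream end marker, so with the flag on the reader has to locate the end of the entry by scanning for the data descriptor, which is presumably why commons-compress leaves it off by default.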
--
This message was sent by Atlassian Jira
(v8.3.4#803005)