Posted to dev@tika.apache.org by "Luís Filipe Nassif (Jira)" <ji...@apache.org> on 2022/03/16 00:28:00 UTC

[jira] [Comment Edited] (TIKA-3701) ZipDetector on a file should back off to streaming detection on failure to open a zipfile

    [ https://issues.apache.org/jira/browse/TIKA-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507273#comment-17507273 ] 

Luís Filipe Nassif edited comment on TIKA-3701 at 3/16/22, 12:27 AM:
---------------------------------------------------------------------

Thank you [~tallison]! (I was driving home...)

I'm attaching my triggering file. It can be used for future tests, since it's from govdocs.


was (Author: lfcnassif):
Thank you [~tallison]! (I was driving home...)

I'm attaching my triggering file. It can be used for tests, since it's from govdocs.

> ZipDetector on a file should back off to streaming detection on failure to open a zipfile
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3701
>                 URL: https://issues.apache.org/jira/browse/TIKA-3701
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 2.3.1
>
>         Attachments: Carved-107429888
>
>
> If a file is passed to Tika wrapped as a TikaInputStream with an underlying file, the DefaultZipDetector tries to open it as a ZipFile.  If the file is truncated, or the ZipFile open fails for any other reason, the DefaultZipDetector effectively gives up.
> Given that there's still a file available, we should fall back to streaming detection by reopening the file as a regular InputStream.
> If we don't do this, we wind up with different detection results for some truncated OOXML files depending on whether the user sends in a file or a stream.
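
The fallback described in the issue can be illustrated with a minimal sketch that uses only JDK classes. This is not Tika's implementation: the class name ZipFallbackSketch and the method firstEntryName are hypothetical, and reading the first entry name merely stands in for whatever the detector actually inspects. The point it shows is that java.util.zip.ZipFile needs the central directory at the end of the file, so it fails on a truncated archive, while java.util.zip.ZipInputStream reads local file headers sequentially and can still see the leading entries.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical illustration only; not the DefaultZipDetector code.
final class ZipFallbackSketch {

    /**
     * Returns the name of the first zip entry, or null if none can be read.
     * Tries random-access ZipFile first, then reopens the file as a stream.
     */
    static String firstEntryName(Path path) throws IOException {
        // Attempt 1: ZipFile reads the central directory at the end of the file.
        try (ZipFile zip = new ZipFile(path.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            return entries.hasMoreElements() ? entries.nextElement().getName() : null;
        } catch (IOException e) {
            // Truncated or otherwise unreadable as a ZipFile: fall through
            // to the streaming attempt instead of giving up.
        }

        // Attempt 2: reopen the same file as a plain InputStream and read
        // local file headers sequentially from the start.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path));
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry = zin.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(firstEntryName(Path.of(args[0])));
        }
    }
}
```

With this shape, a truncated file and the same bytes sent as a stream take the same streaming path, which is the consistency the issue asks for.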



--
This message was sent by Atlassian Jira
(v8.20.1#820001)