Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2020/03/06 16:35:00 UTC

[jira] [Created] (TIKA-3061) Streaming zip container detector stopping short

Tim Allison created TIKA-3061:
---------------------------------

             Summary: Streaming zip container detector stopping short
                 Key: TIKA-3061
                 URL: https://issues.apache.org/jira/browse/TIKA-3061
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


In the recent regression runs in preparation for 1.24, I found a few cases where an OpenOffice document inside a zip was no longer identified as an OpenOffice document, but rather as another zip file.

For an unknown reason, the new {{detectStarOfficeX}} is doing something to the {{ZipArchiveInputStream}} that causes it to silently fail to iterate through all of the entries in the zip file; in short, it stops short.  If we copy the bytes to a byte array and then process them, all is well.
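
A minimal sketch of the byte-array workaround described above, for illustration only. It is not Tika's detector code: the class {{BufferedZipListing}} and its methods are hypothetical names, and it uses the JDK's {{java.util.zip.ZipInputStream}} as a stand-in for Commons Compress' {{ZipArchiveInputStream}}. The idea is simply to buffer the whole stream first, so that the entry iteration runs over an in-memory copy rather than the shared stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of Tika.
class BufferedZipListing {

    // Copy the entire stream into memory so later processing cannot be
    // affected by whatever state the original stream is in.
    static byte[] copyToByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Iterate over every entry of the buffered copy and collect the names
    // in the order they appear in the stream.
    static List<String> listEntryNames(InputStream in) throws IOException {
        byte[] bytes = copyToByteArray(in);
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

Buffering trades memory for isolation, so for large containers a size cap or a spool-to-temp-file fallback would presumably be needed; that part is left out of the sketch.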

