Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2017/03/09 11:22:38 UTC

[jira] [Commented] (TIKA-2294) Tika inconsistently detects ooxml files as zip file sometimes

    [ https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902906#comment-15902906 ] 

Nick Burch commented on TIKA-2294:
----------------------------------

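To detect the OOXML sub-type correctly, you need either the filename, or the full contents plus the container-aware detector from the Tika parsers package. Every OOXML file is a ZIP archive, so with neither of those the detector only sees the ZIP signature and reports the file as zip.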

See also https://wiki.apache.org/tika/Troubleshooting%20Tika#Content_Incorrectly_Detected
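
To illustrate why the first bytes alone are not enough, here is a minimal sketch using only the JDK (Java 9+). It is not Tika's detector: the class name OoxmlSniffer and both method names are made up for this example, and checking for the conventional main part names (word/document.xml and so on) is a simplifying assumption, since a real detector would read the package's content types and relationships instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipFile;

// Hypothetical illustration class, not part of Tika.
class OoxmlSniffer {
    static final String ZIP = "application/zip";
    static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    static final String PPTX =
        "application/vnd.openxmlformats-officedocument.presentationml.presentation";

    // Looks only at the first four bytes, the way a magic-number check does.
    // A docx, xlsx, or pptx file starts with the same "PK\003\004" as any zip.
    public static String detectByMagic(Path file) throws IOException {
        byte[] head = new byte[4];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(head, 0, head.length);
        }
        boolean zip = n == 4
            && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
        return zip ? ZIP : "application/octet-stream";
    }

    // Opens the whole container and picks a sub-type from its entries.
    // Assumption: the main part sits at its conventional location.
    public static String detectByContents(Path file) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            if (zip.getEntry("[Content_Types].xml") == null) {
                return ZIP; // a plain zip, not an OOXML package
            }
            if (zip.getEntry("word/document.xml") != null) return DOCX;
            if (zip.getEntry("xl/workbook.xml") != null) return XLSX;
            if (zip.getEntry("ppt/presentation.xml") != null) return PPTX;
            return ZIP; // a package this sketch does not recognise
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println("magic only:    " + detectByMagic(file));
        System.out.println("full contents: " + detectByContents(file));
    }
}
```

Run against a .docx file, the magic-only check can only ever answer zip, while the second method needs random access to the complete file. That matches the two cases in the report: a caller that passes a truncated stream and no filename gets zip, and a caller that passes the whole file (or at least the name) gets the specific type.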

> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
>                 Key: TIKA-2294
>                 URL: https://issues.apache.org/jira/browse/TIKA-2294
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11
>         Environment: linux
>            Reporter: chanchal
>
> Tika sometimes incorrectly detects an OOXML file as zip and sometimes correctly detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.


