Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/07/04 05:04:00 UTC

[jira] [Created] (OAK-6414) Use Tika config to determine non indexed mimeTypes

Chetan Mehrotra created OAK-6414:
------------------------------------

             Summary: Use Tika config to determine non indexed mimeTypes
                 Key: OAK-6414
                 URL: https://issues.apache.org/jira/browse/OAK-6414
             Project: Jackrabbit Oak
          Issue Type: Technical task
          Components: lucene
            Reporter: Chetan Mehrotra
            Assignee: Chetan Mehrotra
             Fix For: 1.8


OAK-2895 added support for avoiding loading binary content whose mimeType has been excluded from indexing by configuring EmptyParser for it. That approach used a lazy InputStream and relied on Tika not accessing the stream when none of the configured parsers would touch the file.

However, as seen while upgrading to Tika 1.15, Tika now [checks whether the InputStream supports marking|https://github.com/apache/tika/commit/896c46a0c652de436da0e4f25bfa53a7d83ae02f].
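The lazy-stream approach, and why a markSupported() check defeats it, can be sketched as follows. This is a hypothetical illustration, not Oak's actual class: LazyInputStream, the opened flag and the Supplier-based opener are assumptions made for the sketch. The binary is only opened on first use, so an untouched stream costs nothing; but markSupported() can only be answered by asking the underlying stream, which forces the load.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

public class LazyStreamDemo {

    /** Hypothetical lazy stream: opens the real binary only when first needed. */
    static class LazyInputStream extends InputStream {
        private final Supplier<InputStream> opener;
        private InputStream delegate;
        boolean opened; // records whether the binary was ever loaded

        LazyInputStream(Supplier<InputStream> opener) {
            this.opener = opener;
        }

        // Opens the underlying binary on first access and remembers it.
        private InputStream delegate() {
            if (delegate == null) {
                delegate = opener.get();
                opened = true;
            }
            return delegate;
        }

        @Override
        public int read() throws IOException {
            return delegate().read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate().read(b, off, len);
        }

        @Override
        public boolean markSupported() {
            // The answer depends on the underlying stream, so it must be opened here.
            return delegate().markSupported();
        }

        @Override
        public synchronized void mark(int readlimit) {
            delegate().mark(readlimit);
        }

        @Override
        public synchronized void reset() throws IOException {
            delegate().reset();
        }

        @Override
        public void close() throws IOException {
            if (delegate != null) {
                delegate.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Supplier<InputStream> loadBinary =
                () -> new ByteArrayInputStream(new byte[] {1, 2, 3});

        // No parser touches the stream: the binary is never loaded.
        LazyInputStream untouched = new LazyInputStream(loadBinary);
        untouched.close();
        System.out.println("untouched opened=" + untouched.opened);

        // A caller only asks markSupported(): the binary gets loaded anyway.
        LazyInputStream checked = new LazyInputStream(loadBinary);
        checked.markSupported();
        checked.close();
        System.out.println("checked opened=" + checked.opened);
    }
}
```

Running main prints `untouched opened=false` followed by `checked opened=true`, which is why laziness alone no longer prevents the load once Tika inspects the stream up front.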

To handle this change, the logic on the Oak side needs to check explicitly which mimeTypes have been configured with EmptyParser, by reading tika-config.xml, instead of relying on Tika never touching the stream.
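A minimal sketch of such an explicit check, assuming the usual tika-config.xml layout in which EmptyParser is declared as `<parser class="org.apache.tika.parser.EmptyParser">` with one `<mime>` child per excluded type. The class and method names here are hypothetical, not Oak's implementation, and the sketch ignores any precedence rules between multiple parsers registered for the same type. It uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class EmptyParserMimeTypes {

    static final String EMPTY_PARSER = "org.apache.tika.parser.EmptyParser";

    /** Collects the <mime> entries of every <parser> configured with EmptyParser. */
    static Set<String> nonIndexedMimeTypes(InputStream tikaConfig) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(tikaConfig);

        Set<String> result = new TreeSet<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            if (!EMPTY_PARSER.equals(parser.getAttribute("class"))) {
                continue; // only EmptyParser entries mark a mimeType as non-indexed
            }
            NodeList mimes = parser.getElementsByTagName("mime");
            for (int j = 0; j < mimes.getLength(); j++) {
                result.add(mimes.item(j).getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String config = "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
                + "<parser class=\"org.apache.tika.parser.EmptyParser\">"
                + "<mime>application/x-archive</mime>"
                + "<mime>application/zip</mime>"
                + "</parser>"
                + "</parsers></properties>";
        Set<String> skip = nonIndexedMimeTypes(
                new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));

        // Decide up front, without opening the binary, whether to extract text.
        String mimeType = "application/zip";
        System.out.println(skip);
        System.out.println("index " + mimeType + "? " + !skip.contains(mimeType));
    }
}
```

Running main prints `[application/x-archive, application/zip]` and then `index application/zip? false`. With the set computed once at startup, the indexer can skip excluded binaries before any InputStream is created, so Tika's stream checks never come into play for them.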



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)