Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2017/07/04 05:04:00 UTC
[jira] [Created] (OAK-6414) Use Tika config to determine non indexed mimeTypes
Chetan Mehrotra created OAK-6414:
------------------------------------
Summary: Use Tika config to determine non indexed mimeTypes
Key: OAK-6414
URL: https://issues.apache.org/jira/browse/OAK-6414
Project: Jackrabbit Oak
Issue Type: Technical task
Components: lucene
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Fix For: 1.8
With OAK-2895, support was added to avoid loading binary content whose mimeTypes have been excluded from indexing by configuring EmptyParser for them. That approach used a lazyInputStream and relied on the fact that Tika would not access the stream if no parser was going to touch that file.
However, as seen while upgrading to Tika 1.15, Tika now [checks whether the InputStream supports marking|https://github.com/apache/tika/commit/896c46a0c652de436da0e4f25bfa53a7d83ae02f].
To accommodate this change, the logic on the Oak side needs to check explicitly, by reading tika-config.xml, which mimeTypes have been configured with EmptyParser.
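The explicit check could look roughly like the following. This is only a minimal sketch and not the actual Oak change: the class name EmptyParserMimeTypes and the method fromConfig are hypothetical, and it assumes that tika-config.xml lists the excluded types as <mime> children of a <parser class="org.apache.tika.parser.EmptyParser"> element. It uses only the JDK XML parser, so it does not depend on any Tika API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects the mimeTypes mapped to EmptyParser in a tika-config.xml.
final class EmptyParserMimeTypes {
    private static final String EMPTY_PARSER = "org.apache.tika.parser.EmptyParser";

    static Set<String> fromConfig(InputStream tikaConfig) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Apply the JDK's secure-processing limits to the parser.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(tikaConfig);

        Set<String> mimeTypes = new LinkedHashSet<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            if (!EMPTY_PARSER.equals(parser.getAttribute("class"))) {
                continue; // only EmptyParser entries mark a type as not indexed
            }
            // Exact tag match, so <mime-exclude> elements are not picked up here.
            NodeList mimes = parser.getElementsByTagName("mime");
            for (int j = 0; j < mimes.getLength(); j++) {
                mimeTypes.add(mimes.item(j).getTextContent().trim());
            }
        }
        return mimeTypes;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative config; the mimeTypes are examples only.
        String xml = "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
                + "<parser class=\"org.apache.tika.parser.EmptyParser\">"
                + "<mime>application/zip</mime>"
                + "<mime>application/x-archive</mime>"
                + "</parser></parsers></properties>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(fromConfig(in));
    }
}
```

With such a set available, the indexer could compare a binary's mimeType against it and skip the binary before any stream is handed to Tika, instead of relying on Tika never touching the stream.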
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)