Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/05/13 09:32:01 UTC

[jira] [Commented] (OAK-2468) Index binary only if some Tika parser can support the binaries mimeType

    [ https://issues.apache.org/jira/browse/OAK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541522#comment-14541522 ] 

Chetan Mehrotra commented on OAK-2468:
--------------------------------------

Merged to 1.0 branch with http://svn.apache.org/r1679155

> Index binary only if some Tika parser can support the binaries mimeType
> -----------------------------------------------------------------------
>
>                 Key: OAK-2468
>                 URL: https://issues.apache.org/jira/browse/OAK-2468
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.1.8, 1.0.15
>
>
> Currently all binaries are passed to Tika for text extraction. However, Tika can only parse those binaries for which it has a supporting parser. Therefore, the extraction logic should parse a binary only if its mimeType is supported by Tika.
> With this change, {{jcr:mimeType}} would become a mandatory property for a binary to be indexed.
> JR2 had a similar check [1].
> [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/NodeIndexer.java#L932
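
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code merged for OAK-2468: the class name SupportedTypeFilter and its methods are hypothetical, and the set of supported types is passed in by the caller, on the assumption that in a real setup it would be obtained from the configured Tika parser. A missing mimeType is treated as "do not extract", which matches the note that {{jcr:mimeType}} becomes mandatory.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not Oak's actual implementation.
final class SupportedTypeFilter {

    // Base media types (no parameters, lower case) that some parser can handle.
    // Assumption: the caller fills this from the configured Tika parser.
    private final Set<String> supportedTypes = new HashSet<>();

    SupportedTypeFilter(Set<String> types) {
        for (String type : types) {
            supportedTypes.add(normalize(type));
        }
    }

    // Strips parameters such as "; charset=UTF-8" and lower-cases the type.
    static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns true only if the binary declares a mimeType and that type is supported.
    // A binary without jcr:mimeType (null here) is skipped.
    boolean shouldExtract(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        return supportedTypes.contains(normalize(mimeType));
    }
}
```

An indexer would call shouldExtract with the node's {{jcr:mimeType}} value before handing the binary to Tika, and skip text extraction when it returns false.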


