Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/03/15 10:10:38 UTC

[jira] [Resolved] (OAK-2468) Index binary only if some Tika parser can support the binaries mimeType

     [ https://issues.apache.org/jira/browse/OAK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Mehrotra resolved OAK-2468.
----------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 1.2)
                   1.1.8

Done with

Now {{jcr:mimeType}} has to be non-null and supported by Tika for the binary content to be indexed. Note that with this it is now assumed that all binary properties in a given node are of the same mimeType.

JR2 used to restrict indexing to binary content stored under {{jcr:data}}, and only when {{jcr:mimeType}} was specified. With Oak we index binary properties with any name but do enforce that {{jcr:mimeType}} is not null.
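
For illustration, the rule described above could be sketched roughly as follows. This is not the committed Oak code: the class {{BinaryIndexFilter}} and its method {{shouldIndex}} are hypothetical names, and the hard-coded set of type strings is an assumed stand-in for whatever the available Tika parsers report as supported. Stripping parameters such as a charset from the mimeType is also an assumption made for the sketch.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not Oak's actual class: decides whether the binary
// properties of a node should be handed to text extraction.
final class BinaryIndexFilter {

    // Assumed stand-in for the media types supported by the Tika parsers.
    private final Set<String> supportedTypes;

    BinaryIndexFilter(Set<String> supportedTypes) {
        this.supportedTypes = supportedTypes;
    }

    // jcrMimeType is the node's jcr:mimeType value, or null when absent.
    // The same value is applied to every binary property of the node.
    boolean shouldIndex(String jcrMimeType) {
        if (jcrMimeType == null) {
            return false; // jcr:mimeType must be present
        }
        // Assumption: ignore parameters such as "; charset=UTF-8" and case.
        int semicolon = jcrMimeType.indexOf(';');
        String base = semicolon >= 0
                ? jcrMimeType.substring(0, semicolon)
                : jcrMimeType;
        return supportedTypes.contains(base.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        BinaryIndexFilter filter =
                new BinaryIndexFilter(Set.of("application/pdf", "text/plain"));
        System.out.println(filter.shouldIndex("application/pdf"));
        System.out.println(filter.shouldIndex(null));
    }
}
```

Because the decision takes only the node's {{jcr:mimeType}}, a node holding binaries of different types would be treated according to that single value, which matches the assumption noted above.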

> Index binary only if some Tika parser can support the binaries mimeType
> -----------------------------------------------------------------------
>
>                 Key: OAK-2468
>                 URL: https://issues.apache.org/jira/browse/OAK-2468
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: oak-lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.1.8
>
>
> Currently all binaries are passed to Tika for text extraction. However, Tika can only parse those for which it has a supporting parser. Therefore the extraction logic should parse a binary only if its mimeType is supported by Tika.
> With this change {{jcr:mimeType}} would become a mandatory property.
> JR2 had a similar check [1].
> [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/NodeIndexer.java#L932



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)