Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/03/15 10:10:38 UTC

[jira] [Comment Edited] (OAK-2468) Index binary only if some Tika parser can support the binaries mimeType

    [ https://issues.apache.org/jira/browse/OAK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362306#comment-14362306 ] 

Chetan Mehrotra edited comment on OAK-2468 at 3/15/15 9:10 AM:
---------------------------------------------------------------

Done with http://svn.apache.org/r1666787

Now {{jcr:mimeType}} has to be non-null and supported by Tika for the binary content to be indexed. Note that with this change it is now assumed that all binary properties in a given node have the same mimeType.

JR2 used to index only binary content stored under {{jcr:data}}, and only when {{jcr:mimeType}} was specified. With Oak we index binary properties with any name but do enforce that {{jcr:mimeType}} is not null.
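
For illustration only, the rule described above could be sketched roughly as follows. This is not the code committed in r1666787: the class {{BinaryIndexGate}}, the method {{shouldExtractText}}, and the plain map of node properties are all hypothetical, and the set of supported types is a stand-in for whatever the configured Tika parser reports as supported.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not the actual Oak implementation. */
final class BinaryIndexGate {

    // Assumption: a stand-in for the MIME types the Tika parser supports.
    private final Set<String> supportedTypes;

    BinaryIndexGate(Set<String> supportedTypes) {
        this.supportedTypes = supportedTypes;
    }

    /**
     * Returns true only when jcr:mimeType is present on the node and is one
     * of the supported types. The single value is applied to every binary
     * property of the node, mirroring the "same mimeType" assumption.
     */
    boolean shouldExtractText(Map<String, String> nodeProperties) {
        String mimeType = nodeProperties.get("jcr:mimeType");
        if (mimeType == null) {
            return false; // no jcr:mimeType: binaries are not passed to Tika
        }
        return supportedTypes.contains(mimeType);
    }

    public static void main(String[] args) {
        BinaryIndexGate gate = new BinaryIndexGate(Set.of("application/pdf", "text/plain"));
        System.out.println(gate.shouldExtractText(Map.of("jcr:mimeType", "application/pdf")));
        System.out.println(gate.shouldExtractText(Map.of("jcr:mimeType", "application/x-unknown")));
    }
}
```

The name of the binary property plays no role in this sketch, which matches the Oak behaviour described above: only the presence and value of {{jcr:mimeType}} decide whether text extraction is attempted.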


> Index binary only if some Tika parser can support the binaries mimeType
> -----------------------------------------------------------------------
>
>                 Key: OAK-2468
>                 URL: https://issues.apache.org/jira/browse/OAK-2468
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: oak-lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.1.8
>
>
> Currently all binaries are passed to Tika for text extraction. However, Tika can only parse those for which it has a supporting parser. Therefore the extraction logic should parse a binary only if its mimeType is supported by Tika.
> With this change {{jcr:mimeType}} would become a mandatory property.
> JR2 had a similar check [1].
> [1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/NodeIndexer.java#L932


