Posted to oak-issues@jackrabbit.apache.org by "Chetan Mehrotra (JIRA)" <ji...@apache.org> on 2015/01/30 13:54:34 UTC

[jira] [Commented] (OAK-2463) Provide support for providing custom Tika config

    [ https://issues.apache.org/jira/browse/OAK-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298593#comment-14298593 ] 

Chetan Mehrotra commented on OAK-2463:
--------------------------------------

Applied the patch to trunk for now with http://svn.apache.org/r1655996. Once the review is done, I will merge it to the branch.

A custom Tika config XML can now be provided as part of the index definition node by creating an {{nt:file}} node named {{tikaConfig}} under the index definition:

{noformat}
/oak:index/assetType
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - compatVersion = 2
  - type = "lucene"
  - async = "async"
  + tikaConfig (nt:file)
     + jcr:content
         - jcr:data = //config xml binary content
  + indexRules
{noformat}
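
For illustration, the binary stored in {{jcr:data}} could be a Tika config along the following lines. This is only a minimal sketch and not the config shipped with the patch: it assumes the Tika version in use accepts per-parser {{mime}} elements, and the choice of {{application/zip}} is purely an example of skipping text extraction for one type.

{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical example content for tikaConfig/jcr:content/jcr:data -->
<properties>
  <parsers>
    <!-- Use the default parser set for all supported types -->
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <!-- Assumed override: extract no text from zip archives -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/zip</mime>
    </parser>
  </parsers>
</properties>
{noformat}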

> Provide support for providing custom Tika config
> ------------------------------------------------
>
>                 Key: OAK-2463
>                 URL: https://issues.apache.org/jira/browse/OAK-2463
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: oak-lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1.6, 1.0.12
>
>         Attachments: OAK-2463.patch
>
>
> Currently, Oak Lucene uses the default Tika config when extracting text content from binary properties. To provide better control, the Tika config should be made configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)