Posted to oak-issues@jackrabbit.apache.org by "Tommaso Teofili (JIRA)" <ji...@apache.org> on 2015/03/16 09:36:38 UTC

[jira] [Updated] (OAK-653) Improve binaries handling in Solr index

     [ https://issues.apache.org/jira/browse/OAK-653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili updated OAK-653:
--------------------------------
    Fix Version/s: 1.2

> Improve binaries handling in Solr index
> ---------------------------------------
>
>                 Key: OAK-653
>                 URL: https://issues.apache.org/jira/browse/OAK-653
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: oak-solr
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.2
>
>
> Solr provides SolrCell (integration with Apache Tika, http://wiki.apache.org/solr/ExtractingRequestHandler), which would be easy to leverage. It'd also be nice to have that working at the Lucene level, as a specific set of analyzers/tokenizers, so that it would be transparent (no special URI would be needed for indexing binaries) once those are configured in a Solr schema.
> It'd also be good to be able to extract the text from within the SolrIndexEditor (as LuceneIndexEditor does) without having to rely on SolrCell on the Solr side, since it's not always exposed (that depends on whether it's explicitly configured).
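
To make the SolrCell route concrete, here is a minimal sketch of the request a client could send to the ExtractingRequestHandler. Several details are assumptions and do not come from the issue. The handler is taken to be registered at /update/extract on a hypothetical core named "oak". The schema's uniqueKey field is assumed to be "id", populated through the handler's literal.<field> parameter. The sketch only builds the java.net.http request and never sends it. If SolrCell is not configured on the server, as the issue points out, this request would fail, which is the argument for extracting text on the Oak side instead.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SolrCellRequestSketch {

    // Hypothetical endpoint: a local Solr core named "oak" with the
    // ExtractingRequestHandler (SolrCell) registered at /update/extract.
    static final String EXTRACT_URL = "http://localhost:8983/solr/oak/update/extract";

    /**
     * Builds, but does not send, a POST that streams a binary to SolrCell.
     * literal.id assumes the schema's uniqueKey field is named "id";
     * commit=true asks Solr to commit once the document is indexed.
     */
    static HttpRequest buildExtractRequest(String nodePath, byte[] binary, String mimeType) {
        String query = "literal.id=" + URLEncoder.encode(nodePath, StandardCharsets.UTF_8)
                + "&commit=true";
        return HttpRequest.newBuilder(URI.create(EXTRACT_URL + "?" + query))
                .header("Content-Type", mimeType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(binary))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a binary property's content.
        byte[] data = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        HttpRequest request = buildExtractRequest("/content/docs/report.pdf", data, "application/pdf");
        // Prints the method and the encoded request URI.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Tika does the text extraction on the Solr side in this approach, so Oak only has to ship the raw bytes plus the node path.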



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)