Posted to oak-issues@jackrabbit.apache.org by "Tommaso Teofili (JIRA)" <ji...@apache.org> on 2015/11/06 11:30:27 UTC

[jira] [Comment Edited] (OAK-3580) Make it possible to use indexes for providing excerpts

    [ https://issues.apache.org/jira/browse/OAK-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993470#comment-14993470 ] 

Tommaso Teofili edited comment on OAK-3580 at 11/6/15 10:29 AM:
----------------------------------------------------------------

Attached a first patch that relies on indexes for generating the _rep:excerpt_ whenever possible.

When retrieving the row value, if _rep:excerpt(.)_ is used, the index-generated value is returned if available. If it is not available, or if a property-level excerpt is required, e.g. _rep:excerpt(text)_, the existing {{SimpleExcerptProvider}} is used as a fallback mechanism.
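
The fallback rule described above could be sketched roughly as follows. This is only an illustration and not the code from the attached patch: the class {{ExcerptFallbackSketch}}, the method {{resolveExcerpt}} and the {{Function}} standing in for {{SimpleExcerptProvider}} are hypothetical names, and a {{null}} index excerpt is assumed to mean "not available".

```java
import java.util.function.Function;

// Hypothetical sketch of the fallback rule; not the code from the patch.
final class ExcerptFallbackSketch {

    // Column name for the node-level excerpt, as written in the comment.
    static final String NODE_EXCERPT = "rep:excerpt(.)";

    /**
     * Resolves the value for an excerpt column.
     *
     * @param columnName   requested column, e.g. "rep:excerpt(.)" or "rep:excerpt(text)"
     * @param indexExcerpt excerpt generated by the index, or null if none is available (assumption)
     * @param fallback     stand-in for SimpleExcerptProvider: maps a column name to an excerpt
     */
    static String resolveExcerpt(String columnName,
                                 String indexExcerpt,
                                 Function<String, String> fallback) {
        // Use the index-generated excerpt only for the node-level column
        // and only when the index actually produced one.
        if (NODE_EXCERPT.equals(columnName) && indexExcerpt != null) {
            return indexExcerpt;
        }
        // Property-level excerpts, or a missing index excerpt, go to the fallback.
        return fallback.apply(columnName);
    }
}
```

With this shape, a query asking for _rep:excerpt(text)_ always reaches the fallback, even when the index has produced a node-level excerpt.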

Implementation-wise, the Lucene index uses Lucene's default {{Highlighter}} implementation, which relies on field values being stored. However, it may be good to switch to {{PostingsHighlighter}}, as that is supposed to be faster and doesn't require stored values, only offsets and positions to be available for indexed terms; see this [blogpost|http://blog.mikemccandless.com/2012/12/a-new-lucene-highlighter-is-born.html].
In Solr, the test configuration uses the default highlighter, but that can be changed in solrconfig.xml in order to use the fast vector or postings highlighters.



> Make it possible to use indexes for providing excerpts
> ------------------------------------------------------
>
>                 Key: OAK-3580
>                 URL: https://issues.apache.org/jira/browse/OAK-3580
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query, solr
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.3.10
>
>         Attachments: OAK-3580.1.patch
>
>
> Currently {{SimpleExcerptProvider}} always provides the excerpt, regardless of the underlying index used for the query, which has the limitation of not working with binaries.
> Because of that, it'd be good to leverage the existing indexes' capabilities and use their highlighter implementations to provide excerpt support, also because the Lucene and Solr Oak indexes already perform full text extraction from binaries.


