Posted to dev@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2007/03/28 12:10:32 UTC

[jira] Commented: (JCR-820) Add support for query result highlighting

    [ https://issues.apache.org/jira/browse/JCR-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484794 ] 

Marcel Reutegger commented on JCR-820:
--------------------------------------

Committed initial version: 523251

The query languages now support an excerpt function that returns highlighted fragments for the current node in a result row.

The excerpt is a simple XML fragment. An example fragment could look like this for the query terms 'jackrabbit' and 'query':

<excerpt>
     <fragment>
          <highlight>Jackrabbit</highlight> implements both the mandatory XPath and optional SQL
          <highlight>query</highlight> syntax.
     </fragment>
     <fragment>
          Before parsing the XPath <highlight>query</highlight> in <highlight>Jackrabbit</highlight>,
          the statement is surrounded
     </fragment>
 </excerpt>
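
Since the excerpt is plain XML, a client can process it with the standard Java XML APIs. The following is only a minimal sketch of how an application might consume such a fragment: the class name ExcerptReader and the choice to render highlights as <b> elements are assumptions made for illustration and are not part of Jackrabbit. It relies only on the excerpt, fragment and highlight element names shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Jackrabbit: reads an excerpt XML string.
final class ExcerptReader {

    /** Returns the text of every highlight element, in document order. */
    static List<String> highlightedTerms(String excerptXml) {
        NodeList nodes = parse(excerptXml).getElementsByTagName("highlight");
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            terms.add(nodes.item(i).getTextContent());
        }
        return terms;
    }

    /** Renders each fragment as one HTML line, with highlights wrapped in b tags. */
    static List<String> toHtml(String excerptXml) {
        NodeList fragments = parse(excerptXml).getElementsByTagName("fragment");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < fragments.getLength(); i++) {
            StringBuilder sb = new StringBuilder();
            NodeList children = fragments.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "highlight".equals(child.getNodeName())) {
                    sb.append("<b>").append(escape(child.getTextContent())).append("</b>");
                } else if (child.getNodeType() == Node.TEXT_NODE
                        || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                    sb.append(escape(child.getTextContent()));
                }
            }
            // Collapse the indentation and line breaks of the XML into single spaces.
            lines.add(sb.toString().trim().replaceAll("\\s+", " "));
        }
        return lines;
    }

    // Escapes the characters that are significant in HTML text content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed excerpt", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<excerpt><fragment><highlight>Jackrabbit</highlight> implements the "
                + "<highlight>query</highlight> syntax.</fragment></excerpt>";
        System.out.println(highlightedTerms(xml));
        System.out.println(toHtml(xml));
    }
}
```

When highlighting is not enabled, the fragments contain no highlight elements, so highlightedTerms would return an empty list and toHtml would return the plain fragment text.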

Example queries:

//element(nt:resource)[jcr:contains(., 'jackrabbit')]/rep:excerpt(.)

select excerpt(.) from nt:resource where contains(., 'jackrabbit')

By default, the excerpt function returns only simple fragments without highlight elements, because highlighting requires additional token offset information to be indexed. To enable term highlighting, set the following configuration parameter:

<param name="supportHighlighting" value="true"/>

By default this is set to false for performance reasons. When it is set to true, the values of string properties and the text extracts of binary properties are stored in the Lucene index. Because Lucene loads all stored fields whenever a document is requested, this affects performance. Lucene 2.1 allows this behaviour to be controlled so that only specified fields are loaded. Once Jackrabbit switches to Lucene 2.1, the query handler should read the stored fulltext extract only when it is really needed.

Similarly, when switching to Lucene 2.1, Jackrabbit should get a custom field implementation that allows a field to be stored with a reader value. Currently, enabling highlighting effectively disables deferred text extraction. With a custom field implementation, deferred text extraction will work again even when highlighting is enabled.

> Add support for query result highlighting
> -----------------------------------------
>
>                 Key: JCR-820
>                 URL: https://issues.apache.org/jira/browse/JCR-820
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: query
>            Reporter: Marcel Reutegger
>            Priority: Minor
>
> Highlighting matches in a query result list is regularly needed for an application. The query languages should support a pseudo property or function that allows one to retrieve text fragments with highlighted matches from the content of the matching node.
> To support this feature the following enhancements are required:
> - define a pseudo property or function that returns the text excerpt and can be used in the select clause
> - the index needs to *store* the original text it used when the node was indexed. this also includes extracted text from binary properties.
> - text fragments must be created based on the original text, the query and index information
