Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2017/05/22 03:29:04 UTC

[jira] [Commented] (SOLR-4722) Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.

    [ https://issues.apache.org/jira/browse/SOLR-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019114#comment-16019114 ] 

David Smiley commented on SOLR-4722:
------------------------------------

BTW, for anyone wishing to do this with the UnifiedSolrHighlighter (Solr 6.4+): override getBreakIterator to always return a WholeBreakIterator, and override getFormatter to return a custom PassageFormatter that encodes the data in each Passage however you want. If the formatter returns something other than a String (e.g. a Solr NamedList, which seems likely), you'll also need to override UnifiedSolrHighlighter.doHighlighting so that it does _not_ call highlighter.highlightFields; call highlightFieldsAsObjects instead.
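
A minimal, standalone sketch of the formatter side, assuming the structured result is a list of per-match maps standing in for a Solr NamedList. PassageLike, OffsetsFormatter and OffsetsFormatterDemo are hypothetical names: PassageLike only mimics the match start/end offsets a Lucene Passage carries, so the example compiles without Lucene on the classpath. In a real PassageFormatter subclass the same loop would go in its format method, which receives the passages and the field content.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Lucene's Passage: just the match offsets.
final class PassageLike {
    final int[] matchStarts;
    final int[] matchEnds;
    final int numMatches;

    PassageLike(int[] matchStarts, int[] matchEnds) {
        this.matchStarts = matchStarts;
        this.matchEnds = matchEnds;
        this.numMatches = matchStarts.length;
    }
}

// Sketch of a formatter that returns structured match data instead of a snippet String.
// The List<Map<...>> result is an assumed stand-in for a Solr NamedList.
final class OffsetsFormatter {
    Object format(PassageLike[] passages, String content) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (PassageLike p : passages) {
            for (int i = 0; i < p.numMatches; i++) {
                Map<String, Object> m = new LinkedHashMap<>();
                m.put("start", p.matchStarts[i]);
                m.put("end", p.matchEnds[i]);
                // Only attach the matched text when the stored value actually covers it,
                // so a stubbed (empty) stored value still yields the offsets.
                if (content != null && p.matchEnds[i] <= content.length()) {
                    m.put("text", content.substring(p.matchStarts[i], p.matchEnds[i]));
                }
                matches.add(m);
            }
        }
        return matches;
    }
}

final class OffsetsFormatterDemo {
    public static void main(String[] args) {
        String content = "the quick brown fox";
        // With a whole-value break iterator there is a single passage spanning the field.
        PassageLike whole = new PassageLike(new int[] {4, 16}, new int[] {9, 19});
        OffsetsFormatter formatter = new OffsetsFormatter();
        System.out.println(formatter.format(new PassageLike[] {whole}, content));
        // prints [{start=4, end=9, text=quick}, {start=16, end=19, text=fox}]
        System.out.println(formatter.format(new PassageLike[] {whole}, ""));
        // prints [{start=4, end=9}, {start=16, end=19}]
    }
}
```

Note that a Passage carries character offsets, not token positions; if positions are what the client needs, the formatter would have to derive them some other way.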

This will ultimately end up loading the stored value from Solr, and it will insist on being able to do so. However, the value won't actually be used if your PassageFormatter ignores it and the offsets are already available from term vectors or postings. In that case you _may_ be able to avoid loading the stored value by overriding loadFieldValues to return empty strings. However, I think FieldHighlighter.highlightOffsetsEnums makes some assumptions here: it gets the content length via {{breakIterator.getText().getEndIndex();}}, and it stops looping once a retrieved offset exceeds that value. So you probably can't simply substitute an empty string. You might nonetheless be able to hack up a workaround, such as a custom variation of WholeBreakIterator that lies about the end index. It would be nice if this were easier to work around, but it's a very advanced use case.
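
To make the end-index problem concrete, here is a JDK-only model; it does not use Lucene's FieldHighlighter or WholeBreakIterator. FakeEndIndexIterator and OffsetLoopModel are hypothetical names. The assumption is that a WholeBreakIterator variant could return such an iterator from getText() (java.text.BreakIterator.getText() does return a CharacterIterator). The loop only approximates the behavior described above; it is not Lucene's code and is untested against the real highlighter.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical wrapper: iterates the real (possibly empty, stubbed) stored value,
// but reports a caller-supplied end index as the content length.
final class FakeEndIndexIterator implements CharacterIterator {
    private final String text;
    private final int fakeEnd;
    private final CharacterIterator delegate;

    FakeEndIndexIterator(String text, int fakeEnd) {
        this.text = text;
        this.fakeEnd = fakeEnd;
        this.delegate = new StringCharacterIterator(text);
    }

    // The deliberate lie: claim the content is fakeEnd chars long.
    @Override public int getEndIndex() { return fakeEnd; }

    // Everything else delegates to the real text, so actually reading characters
    // past it still fails (setIndex throws IllegalArgumentException).
    @Override public char first() { return delegate.first(); }
    @Override public char last() { return delegate.last(); }
    @Override public char current() { return delegate.current(); }
    @Override public char next() { return delegate.next(); }
    @Override public char previous() { return delegate.previous(); }
    @Override public char setIndex(int position) { return delegate.setIndex(position); }
    @Override public int getBeginIndex() { return delegate.getBeginIndex(); }
    @Override public int getIndex() { return delegate.getIndex(); }

    @Override public Object clone() {
        FakeEndIndexIterator copy = new FakeEndIndexIterator(text, fakeEnd);
        copy.delegate.setIndex(delegate.getIndex());
        return copy;
    }
}

final class OffsetLoopModel {
    // Simplified model (NOT Lucene's actual code): consume matches in offset order
    // until one starts at or past the content length reported by the iterator.
    static int usableMatches(int[] sortedMatchStarts, CharacterIterator text) {
        int contentLength = text.getEndIndex();
        int count = 0;
        for (int start : sortedMatchStarts) {
            if (start >= contentLength) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] starts = {4, 16}; // e.g. offsets read from term vectors or postings
        CharacterIterator emptyStub = new StringCharacterIterator("");
        CharacterIterator lying = new FakeEndIndexIterator("", 19);
        System.out.println(usableMatches(starts, emptyStub)); // 0: loop stops immediately
        System.out.println(usableMatches(starts, lying));     // 2: both matches survive
    }
}
```

The fake end index would have to be at least as large as the real field length (for example, a value stored alongside the document), and any code path that reads actual characters would still break.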

> Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4722
>                 URL: https://issues.apache.org/jira/browse/SOLR-4722
>             Project: Solr
>          Issue Type: New Feature
>          Components: highlighter
>    Affects Versions: 4.3, 6.0
>            Reporter: Tricia Jenkins
>            Priority: Minor
>         Attachments: PositionsSolrHighlighter.java, SOLR-4722.patch, SOLR-4722.patch, solr-positionshighlighter.jar
>
>
> As an alternative to returning snippets, this highlighter provides the (term) position for query matches.  One use case for this is to reconcile the term position from the Solr index with 'word' coordinates provided by an OCR process.  In this way we are able to 'highlight' an image, like a page from a book or an article from a newspaper, in the locations that match the user's query.
> This is based on the FastVectorHighlighter and requires that termVectors, termOffsets and termPositions be stored.
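
The reconciliation step in the use case above might look like the following sketch. WordBox and OcrHighlight are hypothetical names, and the core assumption is that the index tokenizes the OCR text so that token position i is the i-th OCR word on the page (Lucene positions are 0-based). If tokenization and OCR word segmentation differ, an explicit position-to-word mapping would be needed instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical OCR output: one word and its pixel bounding box on the page image.
final class WordBox {
    final String text;
    final int x, y, width, height;

    WordBox(String text, int x, int y, int width, int height) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    @Override public String toString() {
        return text + "[" + x + "," + y + " " + width + "x" + height + "]";
    }
}

final class OcrHighlight {
    // Assumes token position i corresponds to the i-th OCR word (same order,
    // one token per word). Out-of-range positions are skipped.
    static List<WordBox> boxesForPositions(int[] positions, List<WordBox> ocrWords) {
        List<WordBox> hits = new ArrayList<>();
        for (int pos : positions) {
            if (pos >= 0 && pos < ocrWords.size()) {
                hits.add(ocrWords.get(pos));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<WordBox> page = Arrays.asList(
            new WordBox("the", 10, 20, 30, 12),
            new WordBox("quick", 45, 20, 50, 12),
            new WordBox("brown", 100, 20, 50, 12),
            new WordBox("fox", 155, 20, 30, 12));
        // Example positions a positions highlighter might report for the query "quick fox".
        int[] positions = {1, 3};
        System.out.println(boxesForPositions(positions, page));
        // prints [quick[45,20 50x12], fox[155,20 30x12]]
    }
}
```

The returned boxes are what a viewer would draw over the page image.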



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
