Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2016/11/17 17:13:58 UTC

[jira] [Commented] (LUCENE-7565) UnifiedHighlighter: add ability to delineate passes by max char size

    [ https://issues.apache.org/jira/browse/LUCENE-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674263#comment-15674263 ] 

David Smiley commented on LUCENE-7565:
--------------------------------------

The UH solely breaks according to a java.text.BreakIterator.  Perhaps the most straightforward way to do this is to add a new BreakIterator.  Other ways would probably require a larger refactoring, esp. considering how multi-valued fields are highlighted with SplittingBreakIterator.  The B.I. abstraction isn't great but it suffices for the highlighter, and it can suffice for this use-case provided this B.I. impl makes some assumptions as to how the UH calls its methods.  A new BI could wrap a target BI (_that_ BI would typically be a standard "word" impl but needn't be).  When bi.following(offset) is invoked (the UH calls it with the start of the passage to find the end of the passage), it can take the given offset (the start), add the configured target character length, and then use the underlying BreakIterator to snap to a boundary near that point, likely calling following() then previous().
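
To make the wrapping idea concrete, here is a rough sketch of such a BreakIterator, using only java.text.BreakIterator.  The class name LengthGoalSketch, its constructor parameters, and the policy of preferring the boundary at or before the target (falling back to the next boundary so the passage always advances) are assumptions for illustration, not an existing Lucene API.  It also assumes the caller mostly uses following(offset), as described above.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

// Hypothetical sketch: wraps a base BreakIterator and aims for passages of
// roughly targetLength chars, snapped to one of the base iterator's boundaries.
class LengthGoalSketch extends BreakIterator {
  private final BreakIterator base;   // typically a "word" BreakIterator
  private final int targetLength;     // desired max passage length in chars

  LengthGoalSketch(BreakIterator base, int targetLength) {
    this.base = base;
    this.targetLength = targetLength;
  }

  // Called with the passage start; returns the passage end.
  @Override
  public int following(int startOffset) {
    int textEnd = base.getText().getEndIndex();
    if (startOffset >= textEnd) {
      base.last();
      return DONE;                      // nothing after the end of the text
    }
    int target = startOffset + targetLength;
    if (target >= textEnd) {
      return base.last();               // remaining text fits in one passage
    }
    base.following(target);             // first boundary past the target
    int before = base.previous();       // boundary at or before the target
    if (before > startOffset) {
      return before;                    // stay within the target length
    }
    return base.next();                 // one long token: overshoot to advance
  }

  @Override public int next() { return following(base.current()); }

  @Override
  public int next(int n) {
    int result = base.current();
    for (int i = 0; i < n && result != DONE; i++) result = next();
    for (int i = 0; i > n && result != DONE; i--) result = previous();
    return result;
  }

  // The remaining methods simply delegate to the wrapped iterator.
  @Override public int previous() { return base.previous(); }
  @Override public int first() { return base.first(); }
  @Override public int last() { return base.last(); }
  @Override public int current() { return base.current(); }
  @Override public CharacterIterator getText() { return base.getText(); }
  @Override public void setText(CharacterIterator text) { base.setText(text); }

  public static void main(String[] args) {
    String text = "The quick brown fox jumps over the lazy dog";
    BreakIterator bi =
        new LengthGoalSketch(BreakIterator.getWordInstance(Locale.ROOT), 12);
    bi.setText(text);
    int start = 0;
    for (int end = bi.following(start); end != DONE; end = bi.following(start)) {
      System.out.println("[" + text.substring(start, end) + "]");
      start = end;
    }
  }
}
```

A real implementation might instead pick whichever boundary is closest to the target, or treat the length as a minimum rather than a maximum; the sketch only shows the following()-then-previous() mechanics.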

I was just thinking... an alternative way to think about delineating passages is by having the highlighted words not exceed X words in-between in a given passage.  That would be an interesting approach.  Quite separate from this issue though!

> UnifiedHighlighter: add ability to delineate passes by max char size
> --------------------------------------------------------------------
>
>                 Key: LUCENE-7565
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7565
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>
> The Highlighter and FastVectorHighlighter can be configured to delineate passages using a target character length, which is then typically adjusted to the nearest word boundary.  This would be a good option to add to the UnifiedHighlighter (UH) in its own right, as well as for better backward compatibility for those migrating from those highlighters.


