Posted to dev@lucene.apache.org by "Alan Woodward (JIRA)" <ji...@apache.org> on 2018/01/31 16:04:01 UTC

[jira] [Commented] (LUCENE-8145) UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>

    [ https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347059#comment-16347059 ] 

Alan Woodward commented on LUCENE-8145:
---------------------------------------

This patch renames `FieldOffsetStrategy#getOffsetsEnums()` to `FieldOffsetStrategy#getOffsetsEnum()`, and changes the return type from `List<OffsetsEnum>` to a single `OffsetsEnum`.
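
To illustrate what the signature change means for a caller, here is a minimal sketch. It does not use the real Lucene classes: `FakeOffsetsEnum`, `ListStrategy`, `SingleStrategy` and the two `highlight*` methods are hypothetical stand-ins that only mirror the shape of the old and new return types.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.List;

public class SingleEnumSketch {

  /** Hypothetical stand-in for an offsets enumeration that holds a resource. */
  static final class FakeOffsetsEnum implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Old shape (assumed for illustration): the strategy returns a list. */
  interface ListStrategy {
    List<FakeOffsetsEnum> getOffsetsEnums();
  }

  /** New shape (assumed for illustration): the strategy returns one enumeration. */
  interface SingleStrategy {
    FakeOffsetsEnum getOffsetsEnum();
  }

  /** With a list, the caller has to close every element itself. */
  static void highlightOld(ListStrategy strategy) {
    List<FakeOffsetsEnum> enums = strategy.getOffsetsEnums();
    try {
      // ... iterate over each enumeration here ...
    } finally {
      for (FakeOffsetsEnum e : enums) {
        e.close();
      }
    }
  }

  /** With a single enumeration, try-with-resources closes it for us. */
  static void highlightNew(SingleStrategy strategy) {
    try (FakeOffsetsEnum e = strategy.getOffsetsEnum()) {
      // ... iterate over the one enumeration here ...
    }
  }

  public static void main(String[] args) {
    FakeOffsetsEnum a = new FakeOffsetsEnum();
    FakeOffsetsEnum b = new FakeOffsetsEnum();
    highlightOld(() -> Arrays.asList(a, b));

    FakeOffsetsEnum single = new FakeOffsetsEnum();
    highlightNew(() -> single);

    System.out.println(a.closed + " " + b.closed + " " + single.closed);
  }
}
```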

FieldHighlighter is simplified a bit, particularly in how it handles OffsetsEnum as a closeable resource.  Scoring is delegated to the Passage itself, which now tracks the within-passage frequencies of its highlighted terms and phrases.  A new MultiOffsetsEnum class combines multiple OffsetsEnums using a priority queue.  Because all offsets are now iterated in order, Passage no longer needs to sort its internal hits.
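
The priority-queue merge can be sketched roughly as below. This is not the code from the patch: `Hit`, `Cursor` and `merge` are hypothetical names, and ordering by start offset and then end offset is an assumption. The sketch only shows how several sources that are each already sorted come out as one ordered stream, which is why the consumer no longer has to sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedOffsetsSketch {

  /** Hypothetical stand-in for one highlight hit: a term plus character offsets. */
  static final class Hit {
    final String term;
    final int start;
    final int end;

    Hit(String term, int start, int end) {
      this.term = term;
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return term + "[" + start + "," + end + ")";
    }
  }

  /** Cursor over one source of hits that is already sorted by offset. */
  static final class Cursor {
    final Iterator<Hit> it;
    Hit current;

    Cursor(List<Hit> hits) {
      this.it = hits.iterator();
      advance();
    }

    /** Moves to the next hit; returns false once the source is exhausted. */
    boolean advance() {
      current = it.hasNext() ? it.next() : null;
      return current != null;
    }
  }

  /** Merges several sorted sources into one list ordered by (start, end). */
  static List<Hit> merge(List<List<Hit>> sources) {
    PriorityQueue<Cursor> queue = new PriorityQueue<>(
        Comparator.comparingInt((Cursor c) -> c.current.start)
            .thenComparingInt(c -> c.current.end));
    for (List<Hit> source : sources) {
      Cursor cursor = new Cursor(source);
      if (cursor.current != null) {
        queue.add(cursor); // skip empty sources
      }
    }
    List<Hit> merged = new ArrayList<>();
    while (!queue.isEmpty()) {
      Cursor top = queue.poll();   // source with the smallest current offset
      merged.add(top.current);
      if (top.advance()) {
        queue.add(top);            // re-insert with its new position
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Hit> merged = merge(Arrays.asList(
        Arrays.asList(new Hit("lucene", 0, 6), new Hit("lucene", 40, 46)),
        Arrays.asList(new Hit("highlighter", 12, 23))));
    System.out.println(merged);
  }
}
```

Each source is only advanced when its current hit is consumed, so the merge holds one pending hit per source rather than materialising and sorting everything up front.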

The APIs for FieldOffsetStrategy, Passage and OffsetsEnum have all changed slightly, but they're all fairly expert-level, so I think this could be targeted at 7.3?

cc [~dsmiley] [~jimczi]

> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8145
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8145
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting: finding highlight offsets, breaking content up into snippets, and passage scoring.  It would be nice to split this up so that consumers can use them separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return a single unified OffsetsEnum, rather than a collection of them.  This will make it easier to expose the OffsetsEnum of a document directly from the highlighter, bypassing snippet extraction and scoring.



