Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/06/11 05:20:00 UTC
[jira] [Commented] (LUCENE-8286) UnifiedHighlighter should support the new Weight.matches API for better match accuracy

    [ https://issues.apache.org/jira/browse/LUCENE-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507691#comment-16507691 ] 

David Smiley commented on LUCENE-8286:
--------------------------------------

The first patch here is a work in progress.  Everything compiles and the results are generally reasonable, notwithstanding some known issues already pointed out in my previous comment.  I enabled it by default and then looked to see which tests broke and why:

* TestUnifiedHighlighter: all failures are in the testFieldMatcher methods, since the fieldMatcher mechanism doesn't yet work with this (mentioned in my previous comment)
* TestUnifiedHighlighterMTQ.testWhichMTQMatched: because MatchesIterator doesn't yet expose which term matched.
* TestUnifiedHighlighterRanking: failed because the scoring isn't the same
* TestUnifiedHighlighterTermVec.testFetchTermVecsOncePerDoc: randomly fails because sometimes the underlying fields don't have a real index.  The UH highlights one field at a time and _that_ field being highlighted will be made to appear as indexed if it wasn't already (e.g. re-analysis into MemoryIndex or TV LeafReader wrapper) but no other fields will be.  I think once a solution to fieldMatcher works, it may solve the situation here.
* TestUnifiedHighlighterStrictPhrases: I haven't reviewed each failure yet, but they all seem to be due to the distinction between highlighting the words in a phrase individually and highlighting the phrase span as a whole.  All the assertions assume individually highlighted words.
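
To make the StrictPhrases distinction concrete, here is a minimal, self-contained sketch.  It is not code from the patch and does not use any Lucene classes; the class name, the method names, and the int[][] offset representation are all hypothetical, chosen only for illustration.  It takes the character offsets of the terms in one phrase match and formats them either word by word or as a single span:

```java
public class PhraseHighlightSketch {

    // Hypothetical helper: wraps each matched term of the phrase separately.
    // termOffsets holds {start, end} character offsets (end exclusive),
    // sorted and non-overlapping.
    static String highlightWords(String text, int[][] termOffsets) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] t : termOffsets) {
            sb.append(text, pos, t[0])       // unhighlighted text before the term
              .append("<b>")
              .append(text, t[0], t[1])      // the term itself
              .append("</b>");
            pos = t[1];
        }
        return sb.append(text, pos, text.length()).toString();
    }

    // Hypothetical helper: wraps the whole phrase, from the start of the
    // first term to the end of the last term, in one pair of tags.
    static String highlightSpan(String text, int[][] termOffsets) {
        if (termOffsets.length == 0) {
            return text;
        }
        int start = termOffsets[0][0];
        int end = termOffsets[termOffsets.length - 1][1];
        return text.substring(0, start)
            + "<b>" + text.substring(start, end) + "</b>"
            + text.substring(end);
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        int[][] phrase = { {4, 9}, {10, 15} };  // "quick", "brown"
        System.out.println(highlightWords(text, phrase)); // the <b>quick</b> <b>brown</b> fox
        System.out.println(highlightSpan(text, phrase));  // the <b>quick brown</b> fox
    }
}
```

The existing assertions expect output shaped like the first form; a test would fail wherever the highlighter produces the second form instead.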

What's cool is that this wasn't a big change, and it can be intermixed with SpanQueries.  I need to look at the scoring options more -- loss of freq() is a shame.

> UnifiedHighlighter should support the new Weight.matches API for better match accuracy
> --------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8286
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8286.patch
>
>
> The new Weight.matches() API should allow the UnifiedHighlighter to highlight some BooleanQuery patterns more accurately -- see LUCENE-7903.
> In addition, this API should make the job of highlighting easier, reducing the LOC and related complexities, especially the UH's PhraseHelper.  Note: reducing/removing PhraseHelper is not a near-term goal since Weight.matches is experimental and incomplete, and perhaps we'll discover some gaps in flexibility/functionality.
> This issue should introduce a new UnifiedHighlighter.HighlightFlag enum option for this method of highlighting.   Perhaps call it {{WEIGHT_MATCHES}}?  Longer term it could go away and it'll be implied if you specify enum values for PHRASES & MULTI_TERM_QUERY?


