Posted to issues@lucene.apache.org by "FengFeng Cheng (Jira)" <ji...@apache.org> on 2022/06/10 02:03:00 UTC

[jira] [Created] (LUCENE-10609) Phrase queries using SpanNearQuery highlight suspected bugs.

FengFeng Cheng created LUCENE-10609:
---------------------------------------

             Summary: Phrase queries using SpanNearQuery highlight suspected bugs.
                 Key: LUCENE-10609
                 URL: https://issues.apache.org/jira/browse/LUCENE-10609
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/highlighter
    Affects Versions: 8.0
            Reporter: FengFeng Cheng


document: Blockchain technology 5G technology VR Technology AI Technology
analyzer: WhitespaceAnalyzer
query: spanNear([spanNear([title:Blockchain, title:technology], 0, true), spanNear([title:VR, title:Technology], 0, true)], 2, true)
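
For reference, the token positions explain why the outer query can match with a slop of 2. The sketch below is plain Java and does not call Lucene. It assumes that whitespace tokenization assigns positions 0, 1, 2, ... and that, for an ordered SpanNearQuery, slop is compared against the number of positions between the end of one sub-span and the start of the next. The class and method names (SpanGapSketch, adjacentPair, gap) are hypothetical.

```java
// Hypothetical helper, not part of Lucene: it models token positions only.
class SpanGapSketch {

    /**
     * Finds the first place at or after 'from' where 'first' is immediately
     * followed by 'second'. Returns {start, end} with an exclusive end,
     * or null if there is no such pair.
     */
    static int[] adjacentPair(String[] tokens, String first, String second, int from) {
        for (int i = Math.max(from, 0); i + 1 < tokens.length; i++) {
            if (tokens[i].equals(first) && tokens[i + 1].equals(second)) {
                return new int[] { i, i + 2 };
            }
        }
        return null;
    }

    /** Number of positions between the end of span a and the start of span b. */
    static int gap(int[] a, int[] b) {
        return b[0] - a[1];
    }

    public static void main(String[] args) {
        String[] tokens =
            "Blockchain technology 5G technology VR Technology AI Technology".split("\\s+");
        int[] left = adjacentPair(tokens, "Blockchain", "technology", 0); // positions 0..1
        int[] right = adjacentPair(tokens, "VR", "Technology", left[1]);  // positions 4..5
        System.out.println("gap = " + gap(left, right)); // prints "gap = 2"
    }
}
```

Under these assumptions the two inner phrases are separated by exactly two positions ("5G" and the second "technology"), which is within the outer slop of 2.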

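{code:java}
// query code (imports use the Lucene 8.x package layout)
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Inner phrase 1: "Blockchain technology" (slop 0, in order)
SpanQuery termQuery_sub01 = new SpanTermQuery(new Term("title", "Blockchain"));
SpanQuery termQuery_sub02 = new SpanTermQuery(new Term("title", "technology"));
SpanNearQuery spanNearQuery_Sub01 = new SpanNearQuery(new SpanQuery[] { termQuery_sub01, termQuery_sub02 }, 0, true);
// Inner phrase 2: "VR Technology" (slop 0, in order)
SpanQuery termQuery_sub03 = new SpanTermQuery(new Term("title", "VR"));
SpanQuery termQuery_sub04 = new SpanTermQuery(new Term("title", "Technology"));
SpanNearQuery spanNearQuery_Sub02 = new SpanNearQuery(new SpanQuery[] { termQuery_sub03, termQuery_sub04 }, 0, true);
// Outer query: the two phrases in order, at most 2 positions apart
SpanNearQuery spanNearQuery = new SpanNearQuery(new SpanQuery[] { spanNearQuery_Sub01, spanNearQuery_Sub02 }, 2, true);
{code}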
The query matches the document, but the highlighting looks wrong.
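{code:java}
// highlight code
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

// Score fragments against the outer SpanNearQuery built above
QueryScorer scorer = new QueryScorer(spanNearQuery);
SimpleHTMLFormatter simpleHtmlFormatter = new SimpleHTMLFormatter("[", "]");
Highlighter highlighter = new Highlighter(simpleHtmlFormatter, scorer);
highlighter.setTextFragmenter(new SimpleFragmenter(100));
{code}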
Highlight result:
[Blockchain] [technology] 5G [technology] [VR] [Technology] AI Technology

I expected only "Blockchain technology" and "VR Technology" to be highlighted. The "technology" in "5G technology" should not be highlighted, because it is not part of either inner phrase.

I'm not sure whether this is a bug or intended behavior.
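
A possible explanation, not verified against the highlighter source: the output above is what you would get if the highlighter recorded only the position range of the outer span match and then marked every query term that falls inside that range, without tracking which inner phrase each term belongs to. The sketch below is a plain-Java model of that assumption and does not call Lucene. The class and method names (RangeHighlightSketch, highlight) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not Lucene code: highlights by position range only.
class RangeHighlightSketch {

    /**
     * Wraps in [ ] every whitespace-separated token that is a query term
     * and whose position lies in [start, end). Which sub-query a term came
     * from is deliberately ignored; that is the assumption being illustrated.
     */
    static String highlight(String text, Set<String> queryTerms, int start, int end) {
        String[] tokens = text.split("\\s+");
        StringBuilder out = new StringBuilder();
        for (int pos = 0; pos < tokens.length; pos++) {
            if (pos > 0) {
                out.append(' ');
            }
            boolean inRange = pos >= start && pos < end;
            if (inRange && queryTerms.contains(tokens[pos])) {
                out.append('[').append(tokens[pos]).append(']');
            } else {
                out.append(tokens[pos]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Blockchain technology 5G technology VR Technology AI Technology";
        Set<String> terms =
            new HashSet<>(Arrays.asList("Blockchain", "technology", "VR", "Technology"));
        // The outer match spans positions 0..5, i.e. the half-open range [0, 6).
        System.out.println(highlight(text, terms, 0, 6));
    }
}
```

With the outer match covering positions 0 through 5, this model produces the same string as the reported highlight result: the "technology" at position 3 is marked because it is a query term inside the range, while the final "Technology" at position 7 is outside the range and left alone.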

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
