Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2008/05/15 15:33:55 UTC
[jira] Commented: (LUCENE-1285) WeightedSpanTermExtractor
incorrectly treats the same terms occurring in different query types
[ https://issues.apache.org/jira/browse/LUCENE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597130#action_12597130 ]
Mark Miller commented on LUCENE-1285:
-------------------------------------
Nice catch and the fix looks great.
Thanks Andrzej.
> WeightedSpanTermExtractor incorrectly treats the same terms occurring in different query types
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-1285
> URL: https://issues.apache.org/jira/browse/LUCENE-1285
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/highlighter
> Affects Versions: 2.4
> Reporter: Andrzej Bialecki
> Fix For: 2.4
>
> Attachments: highlighter.patch
>
>
> Given a BooleanQuery with multiple clauses, if a term occurs both in a Span / Phrase query, and in a TermQuery, the results of term extraction are unpredictable and depend on the order of clauses. Consequently, the results of highlighting are incorrect.
> Example text: t1 t2 t3 t4 t2
> Example query: t2 t3 "t1 t2"
> Current highlighting: [t1 t2] [t3] t4 t2
> Correct highlighting: [t1 t2] [t3] t4 [t2]
> The problem comes from the fact that we keep a Map<termText, WeightedSpanTerm>, and if the same term occurs in a Phrase or Span query the resulting WeightedSpanTerm will have a positionSensitive=true, whereas terms added from TermQuery have positionSensitive=false. The end result for this particular term will depend on the order in which the clauses are processed.
> My fix is to use a subclass of Map, which on put() always sets the result to the most lax setting, i.e. if we already have a term with positionSensitive=true, and we try to put() a term with positionSensitive=false, we set the result positionSensitive=false, as it will match both cases.
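
The "most lax setting" behaviour described above can be sketched roughly as follows. This is an illustration only, not the contents of highlighter.patch: SpanTerm is a hypothetical stand-in for WeightedSpanTerm, and LaxTermMap and LaxTermMapDemo are made-up names. The sketch assumes the map is keyed by term text and that put() is the only way entries are added.

```java
import java.util.HashMap;

// Hypothetical stand-in for the highlighter's WeightedSpanTerm; only the
// fields needed for this illustration are modeled.
class SpanTerm {
    final String text;
    final float weight;
    boolean positionSensitive;

    SpanTerm(String text, float weight, boolean positionSensitive) {
        this.text = text;
        this.weight = weight;
        this.positionSensitive = positionSensitive;
    }
}

// Map keyed by term text that always keeps the most lax positionSensitive
// setting seen so far for a term (false wins over true).
class LaxTermMap extends HashMap<String, SpanTerm> {
    @Override
    public SpanTerm put(String key, SpanTerm value) {
        SpanTerm previous = super.put(key, value);
        // If the term was already stored as position-insensitive, the entry
        // that replaces it must stay position-insensitive as well.
        if (previous != null && !previous.positionSensitive) {
            value.positionSensitive = false;
        }
        return previous;
    }
}

class LaxTermMapDemo {
    public static void main(String[] args) {
        // Clause order 1: phrase query "t1 t2" first, then TermQuery t2.
        LaxTermMap phraseFirst = new LaxTermMap();
        phraseFirst.put("t2", new SpanTerm("t2", 1.0f, true));
        phraseFirst.put("t2", new SpanTerm("t2", 1.0f, false));

        // Clause order 2: TermQuery t2 first, then phrase query "t1 t2".
        LaxTermMap termFirst = new LaxTermMap();
        termFirst.put("t2", new SpanTerm("t2", 1.0f, false));
        termFirst.put("t2", new SpanTerm("t2", 1.0f, true));

        // Both orders end up with the same, position-insensitive entry.
        System.out.println(phraseFirst.get("t2").positionSensitive); // false
        System.out.println(termFirst.get("t2").positionSensitive);   // false
    }
}
```

Note that this sketch mutates the value passed to put(), which is only acceptable if callers do not reuse that object elsewhere.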