Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/02/18 19:50:11 UTC
[jira] [Commented] (LUCENE-6255) PhraseQuery inconsistencies
[ https://issues.apache.org/jira/browse/LUCENE-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326371#comment-14326371 ]
Adrien Grand commented on LUCENE-6255:
--------------------------------------
I am not even sure what the behaviour should be for sloppy phrases if we decide on the second option. And I'm concerned it might make the implementation more complicated and/or slower.
> PhraseQuery inconsistencies
> ---------------------------
>
> Key: LUCENE-6255
> URL: https://issues.apache.org/jira/browse/LUCENE-6255
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Assignee: Adrien Grand
>
> PhraseQuery behaves quite inconsistently when the position of the first term is greater than 0. Here is an example:
> {noformat}
> Directory dir = newDirectory();
> RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
> FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
> customType.setOmitNorms(true);
> Field f = new Field("body", "", customType);
> Document doc = new Document();
> doc.add(f);
> f.setStringValue("one quick fox");
> iw.addDocument(doc);
> IndexReader ir = iw.getReader();
> iw.close();
> IndexSearcher is = newSearcher(ir);
>
> // 1st query: exact phrase, positions start at 0
> PhraseQuery pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 0);
> pq.add(new Term("body", "fox"), 1);
> System.out.println(is.search(pq, 1).totalHits); // 1
>
> // 2nd query: same exact phrase, positions shifted by 10
> pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 10);
> pq.add(new Term("body", "fox"), 11);
> System.out.println(is.search(pq, 1).totalHits); // 0
>
> // 3rd query: single term at position 10 (rewritten to a term query)
> pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 10);
> System.out.println(is.search(pq, 1).totalHits); // 1
>
> // 4th query: same as the 2nd query, but with slop 1 (uses SloppyPhraseScorer)
> pq = new PhraseQuery();
> pq.add(new Term("body", "quick"), 10);
> pq.add(new Term("body", "fox"), 11);
> pq.setSlop(1);
> System.out.println(is.search(pq, 1).totalHits); // 1
>
> ir.close();
> dir.close();
> {noformat}
> The reason is that when you add a term with position P on a PhraseQuery, ExactPhraseScorer ignores all positions for this term which are less than P.
> But this is inconsistent:
> - if you have a single term, this restriction no longer applies, since we rewrite to a term query regardless of the position of the term (3rd query)
> - if you increase the slop, we will use SloppyPhraseScorer which does not have this behaviour. (4th query)
> So I think we have two options:
> - either remove this behaviour and make the positions that are provided to PhraseQuery only relative (i.e. fix ExactPhraseScorer)
> - or make it work this way across the board (which means not rewriting to a term query when the position is not 0 and fixing SloppyPhraseScorer).
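
To make the difference between the two options concrete, here is a minimal toy model in plain Java. It does not use Lucene and is not how ExactPhraseScorer is implemented; the names PhrasePositionSketch, matchesExact and toRelative are hypothetical. It assumes that "ignores all positions less than P" can be modelled as requiring a non-negative phrase start, and that "relative positions" means subtracting the smallest position from all of them.

```java
import java.util.Arrays;

public class PhrasePositionSketch {

    /**
     * Toy exact-phrase check (an assumption, not Lucene's algorithm):
     * returns true if there is a start s >= 0 such that
     * tokens[s + offsets[i]] equals terms[i] for every i.
     */
    static boolean matchesExact(String[] tokens, String[] terms, int[] offsets) {
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!tokens[pos].equals(terms[0])) {
                continue;
            }
            int start = pos - offsets[0];
            if (start < 0) {
                continue; // models "ignore positions less than P"
            }
            boolean all = true;
            for (int i = 1; i < terms.length; i++) {
                int p = start + offsets[i];
                if (p < 0 || p >= tokens.length || !tokens[p].equals(terms[i])) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    /** First option, sketched: shift positions so the smallest one is 0. */
    static int[] toRelative(int[] offsets) {
        int min = Arrays.stream(offsets).min().orElse(0);
        int[] out = new int[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            out[i] = offsets[i] - min;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] doc = {"one", "quick", "fox"};
        String[] terms = {"quick", "fox"};

        // Positions 0 and 1: matches, like the 1st query.
        System.out.println(matchesExact(doc, terms, new int[] {0, 1})); // true

        // Positions 10 and 11 taken as absolute: no match, like the 2nd query.
        System.out.println(matchesExact(doc, terms, new int[] {10, 11})); // false

        // Same positions made relative first: matches again.
        System.out.println(matchesExact(doc, terms, toRelative(new int[] {10, 11}))); // true
    }
}
```

In this toy model a single term at position 10 would not match either, whereas the real single-term PhraseQuery is rewritten to a term query and does match; that gap is the inconsistency described for the 3rd query.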