Posted to dev@lucene.apache.org by "Andrei (JIRA)" <ji...@apache.org> on 2019/07/03 12:57:00 UTC
[jira] [Commented] (LUCENE-8902) Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards
[ https://issues.apache.org/jira/browse/LUCENE-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877816#comment-16877816 ]
Andrei commented on LUCENE-8902:
--------------------------------
My apologies. Thank you, and I will post something to the mailing list.
> Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-8902
> URL: https://issues.apache.org/jira/browse/LUCENE-8902
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/join
> Affects Versions: 8.1.1
> Reporter: Andrei
> Priority: Major
>
> When I run an index-time join query restricted to certain parent docs, with a wildcard query on the child docs, I sometimes get the wrong answer. Example:
>
> ||Parent Doc||Children||
> |id=id00000|none|
> |id=id00001|program=P1|
> |id=id00002|program=P1, program=P2|
> |id=id00003|none|
> |id=id00004|program=P1|
> |id=id00005|program=P1, program=P2|
> So I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, and the same pattern repeats for docs 3 to 5.
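Block joins depend on the physical order of documents. Each parent's children are expected to be added immediately before the parent in one block, for example with `IndexWriter.addDocuments`. The pastebin test is not reproduced here, so this sketch assumes it indexes the table in that order. It computes the doc ID of every parent in plain Java, using a hypothetical `BlockLayoutSketch` class with no Lucene dependency:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the index layout implied by the table.
// Assumption: each parent was indexed as one block (children first, parent last),
// in the order id00000..id00005.
final class BlockLayoutSketch {
    // Number of children per parent, from the table (id00000..id00005).
    static final int[] CHILD_COUNTS = {0, 1, 2, 0, 1, 2};

    // Returns the doc ID of every parent document, in index order.
    static List<Integer> parentDocIds() {
        List<Integer> parents = new ArrayList<>();
        int nextDoc = 0;
        for (int children : CHILD_COUNTS) {
            nextDoc += children;   // the children occupy the preceding doc IDs
            parents.add(nextDoc);  // the parent is the last doc of its block
            nextDoc++;
        }
        return parents;
    }

    public static void main(String[] args) {
        System.out.println(parentDocIds()); // [0, 2, 5, 6, 8, 11]
    }
}
```

Under that assumption, the six child docs occupy doc IDs 1, 3, 4, 7, 9 and 10.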
> 1. The following query gives the correct results:
> BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
> Query q = new ToParentBlockJoinQuery(new TermInSetQuery("program", toSet("P1", "P2")), parentSet, ScoreMode.None);
> This returns the correct result, 4 docs: ["id00001", "id00002", "id00004", "id00005"].
>
> 2. This also gives the correct result (same as above):
> BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
> Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);
>
> 3. Also correct (same as above):
> BitSetProducer parentSet = new QueryBitSetProducer(new WildcardQuery(new Term("id", "*")));
> Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);
> So far, so good.
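A simplified model of the join shows why cases 1 to 3 agree. It uses the same assumed layout and is not Lucene's implementation. The hypothetical `join` method maps each matching child doc to the first doc at or after it whose bit is set in the parent bitset, using `java.util.BitSet.nextSetBit`. When the parent filter covers all six parents, every child lands on its own parent:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of a to-parent block join (not Lucene code):
// a matching child is joined to the next doc whose bit is set in the parent bitset.
final class FullParentJoinSketch {
    // Assumed index order: the parent's id at each parent slot, null for children.
    static final String[] ID = {
        "id00000", null, "id00001", null, null, "id00002",
        "id00003", null, "id00004", null, null, "id00005"};

    // Every child has a "program" value, so program:* (or program IN {P1, P2})
    // is modeled as matching exactly the child slots.
    static List<String> join(Set<String> parentFilterIds) {
        BitSet parents = new BitSet();
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null && parentFilterIds.contains(ID[doc])) {
                parents.set(doc);
            }
        }
        TreeSet<String> hits = new TreeSet<>(); // sorted, deduplicated parent ids
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null) {
                continue; // parents have no "program" field, so the child query skips them
            }
            int parent = parents.nextSetBit(doc); // -1 if no parent bit follows
            if (parent >= 0) {
                hits.add(ID[parent]);
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        // Cases 1 to 3: the parent filter matches all six parents.
        Set<String> all = Set.of("id00000", "id00001", "id00002", "id00003", "id00004", "id00005");
        System.out.println(join(all)); // [id00001, id00002, id00004, id00005]
    }
}
```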
>
> 4. This one gives an incorrect result:
> BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00003")));
> Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);
> It returns 2 docs, ["id00001", "id00003"], but it should only return "id00001", not "id00003". Very strange behavior.
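The same simplified model reproduces the reported output of case 4. It uses hypothetical names, assumes the layout from the table, and is not Lucene code.

With only id00000, id00001 and id00003 in the parent bitset, the children of id00002 (docs 3 and 4) have no set bit until doc 6. They are therefore joined to id00003. This is consistent with the parent filter being expected to identify all parent documents, rather than only the ones to return:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model (not Lucene code) of case 4: children are joined to the
// next parent *in the filter*, even when it is not their real parent.
final class SubsetParentJoinSketch {
    // Assumed index order: parent id at each parent slot, null for children.
    static final String[] ID = {
        "id00000", null, "id00001", null, null, "id00002",
        "id00003", null, "id00004", null, null, "id00005"};

    // Maps each child doc (all children match program:*) to the parent it is joined to.
    static Map<Integer, String> joinedParents(Set<String> parentFilterIds) {
        BitSet parents = new BitSet();
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null && parentFilterIds.contains(ID[doc])) {
                parents.set(doc);
            }
        }
        Map<Integer, String> joined = new TreeMap<>();
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] == null) { // child slot
                int parent = parents.nextSetBit(doc);
                if (parent >= 0) { // children after the last set bit are dropped
                    joined.put(doc, ID[parent]);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Integer, String> joined = joinedParents(Set.of("id00000", "id00001", "id00003"));
        System.out.println(joined);                         // {1=id00001, 3=id00003, 4=id00003}
        System.out.println(new TreeSet<>(joined.values())); // [id00001, id00003]
    }
}
```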
>
> 5. Just asking for "id00003" also incorrectly returns it:
> BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
> Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);
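In the same model (hypothetical, not Lucene code, layout assumed from the table), a parent filter containing only id00003 attracts every child indexed before it. This happens even though id00003 has no children of its own:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model (not Lucene code) of case 5: with a single parent in the
// bitset, every child indexed before that parent is joined to it.
final class SingleParentJoinSketch {
    // Assumed index order: parent id at each parent slot, null for children.
    static final String[] ID = {
        "id00000", null, "id00001", null, null, "id00002",
        "id00003", null, "id00004", null, null, "id00005"};

    // Returns the child doc IDs the join attributes to parentId when the
    // parent filter contains only that parent.
    static List<Integer> childrenJoinedTo(String parentId) {
        BitSet parents = new BitSet();
        for (int doc = 0; doc < ID.length; doc++) {
            if (parentId.equals(ID[doc])) {
                parents.set(doc);
            }
        }
        List<Integer> children = new ArrayList<>();
        for (int doc = 0; doc < ID.length; doc++) {
            // a child is joined if any set parent bit follows it
            if (ID[doc] == null && parents.nextSetBit(doc) >= 0) {
                children.add(doc);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        System.out.println(childrenJoinedTo("id00003")); // [1, 3, 4]
    }
}
```

In the assumed layout, doc 1 belongs to id00001, and docs 3 and 4 belong to id00002.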
>
> 6. But as soon as I add "id00002" to the parent query, it works again:
> BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00003", "id00002")));
> Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);
> This gives the correct result: ["id00002"].
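Case 6 may only look correct. This sketch uses the same assumed layout and hypothetical names, and is not Lucene code. It compares each child's real parent (the next parent in index order) with the parent the join picks.

With id00002 and id00003 in the filter, doc 1 belongs to id00001 but is joined to id00002. The result is right only because id00002 also has children of its own:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not Lucene code): finds children that the join attributes
// to a parent other than the one they were indexed with.
final class MisattributionSketch {
    // Assumed index order: parent id at each parent slot, null for children.
    static final String[] ID = {
        "id00000", null, "id00001", null, null, "id00002",
        "id00003", null, "id00004", null, null, "id00005"};

    static List<Integer> misattributedChildren(Set<String> parentFilterIds) {
        BitSet allParents = new BitSet();   // every parent: gives the real owner
        BitSet filtered = new BitSet();     // only the parents in the filter
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null) {
                allParents.set(doc);
                if (parentFilterIds.contains(ID[doc])) {
                    filtered.set(doc);
                }
            }
        }
        List<Integer> wrong = new ArrayList<>();
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null) {
                continue; // skip parent slots
            }
            int joined = filtered.nextSetBit(doc);
            if (joined >= 0 && joined != allParents.nextSetBit(doc)) {
                wrong.add(doc);
            }
        }
        return wrong;
    }

    public static void main(String[] args) {
        System.out.println(misattributedChildren(Set.of("id00002", "id00003"))); // [1]
    }
}
```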
> ----
> Here is a unit test that demonstrates this: [https://pastebin.com/aJ1LDLCS]
> I don't know if I am doing something wrong, or if there is an issue.
> Thank you for looking into it.
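If the parent filter is indeed required to cover every parent, one way to get the intended answers is to join against all parents and restrict the returned parents afterwards. The sketch below models that in plain Java, with hypothetical names and no Lucene code.

In Lucene terms, a plausible but untested equivalent has two parts:
- Build the `QueryBitSetProducer` from a query that matches every parent, such as case 3's `new WildcardQuery(new Term("id", "*"))`. That query works here only because children have no `id` field.
- Add the id `TermInSetQuery` as a `BooleanClause.Occur.FILTER` clause alongside the `ToParentBlockJoinQuery` in a `BooleanQuery`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not Lucene code): join children against *all* parents,
// then keep only the requested parent ids as a separate step.
final class FilteredJoinSketch {
    // Assumed index order: parent id at each parent slot, null for children.
    static final String[] ID = {
        "id00000", null, "id00001", null, null, "id00002",
        "id00003", null, "id00004", null, null, "id00005"};

    static List<String> joinThenFilter(Set<String> wantedIds) {
        BitSet allParents = new BitSet();
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null) {
                allParents.set(doc); // every parent, like case 3's id:* filter
            }
        }
        TreeSet<String> hits = new TreeSet<>();
        for (int doc = 0; doc < ID.length; doc++) {
            if (ID[doc] != null) {
                continue; // only children match program:*
            }
            int parent = allParents.nextSetBit(doc); // the child's real parent
            if (parent >= 0 && wantedIds.contains(ID[parent])) {
                hits.add(ID[parent]);
            }
        }
        return new ArrayList<>(hits);
    }

    public static void main(String[] args) {
        System.out.println(joinThenFilter(Set.of("id00000", "id00001", "id00003"))); // [id00001]
        System.out.println(joinThenFilter(Set.of("id00003")));                       // []
        System.out.println(joinThenFilter(Set.of("id00002", "id00003")));            // [id00002]
    }
}
```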
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)