Posted to dev@lucene.apache.org by "Andrei (JIRA)" <ji...@apache.org> on 2019/07/02 20:57:00 UTC

[jira] [Created] (LUCENE-8902) Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards

Andrei created LUCENE-8902:
------------------------------

             Summary: Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards
                 Key: LUCENE-8902
                 URL: https://issues.apache.org/jira/browse/LUCENE-8902
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/join
    Affects Versions: 8.1.1
            Reporter: Andrei


When I run an index-time join query on certain parent docs with a wildcard query for child docs, I sometimes get the wrong answer. Example:

 
||Parent Doc||Children||
|id=id00000|none|
|id=id00001|program=P1|
|id=id00002|program=P1, program=P2|
|id=id00003|none|
|id=id00004|program=P1|
|id=id00005|program=P1, program=P2|

So essentially I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, and docs 3 to 5 repeat the same pattern.

1. The following query gives the correct results:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
        Query q = new ToParentBlockJoinQuery(new TermInSetQuery("program", toSet("P1", "P2")), parentSet, ScoreMode.None);

Returns the correct result (4 docs: ["id00001", "id00002", "id00004", "id00005"]).

 

2. This also gives the correct result (same as above):

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

 

3. Also correct (same as above):

        BitSetProducer parentSet = new QueryBitSetProducer(new WildcardQuery(new Term("id", "*")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);

So far so good.

 

4. This one gives an incorrect result:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00003")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

Returns 2 docs ["id00001", "id00003"]. It should return only "id00001" and not "id00003", because "id00003" has no children. Very strange behavior.

 

5. Just asking for "id00003" also incorrectly returns it:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

 

6. But as soon as I add "id00002" to the parent query, it works again:

        BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00003", "id00002")));
        Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);

Gives the correct result ["id00002"]
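
The results in cases 1 to 6 are consistent with one possible reading of how the join uses parentSet. The sketch below is a simplified model in plain Java, not Lucene's implementation. It assumes (a) that each parent's children are indexed immediately before it in a single segment, and (b) that each matching child is attributed to the next document whose bit is set in the parent bit set. BlockJoinModel, CHILD_COUNTS, and join are hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified, Lucene-free model of a block join (an illustration, not Lucene's code).
class BlockJoinModel {
    // Children per parent, following the table in this report (id00000..id00005).
    static final int[] CHILD_COUNTS = {0, 1, 2, 0, 1, 2};

    /** Returns the parent ids the modeled join reports for the given parent filter. */
    static List<String> join(Set<String> parentFilterIds) {
        BitSet parentBits = new BitSet();   // docs flagged as parents by the filter
        BitSet childMatches = new BitSet(); // docs matched by the child query program:*
        Map<Integer, String> idByDoc = new HashMap<>();

        // Assumed layout: children first, then their parent, in one segment.
        int doc = 0;
        for (int p = 0; p < CHILD_COUNTS.length; p++) {
            for (int c = 0; c < CHILD_COUNTS[p]; c++) {
                childMatches.set(doc++); // every child has a "program" field
            }
            String id = String.format(Locale.ROOT, "id%05d", p);
            idByDoc.put(doc, id);
            if (parentFilterIds.contains(id)) {
                parentBits.set(doc);
            }
            doc++;
        }

        // Assumed join rule: a matching child belongs to the next flagged parent.
        List<String> hits = new ArrayList<>();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                child = childMatches.nextSetBit(child + 1)) {
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                continue; // no flagged parent after this child
            }
            String id = idByDoc.get(parent);
            if (!hits.contains(id)) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

Under these assumptions the model reproduces the reported results: join(Set.of("id00000", "id00001", "id00003")) yields [id00001, id00003] as in case 4, join(Set.of("id00003")) yields [id00003] as in case 5, and join(Set.of("id00003", "id00002")) yields [id00002] as in case 6. If the model is right, the children of a parent left out of parentSet are counted for the next parent that is in it, so parentSet would need to match every parent doc. That is a hypothesis, not something this report confirms.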
----
Here is a unit test that demonstrates this: [https://pastebin.com/aJ1LDLCS]

I don't know if I am doing something wrong, or if there is an issue.

Thank you for looking into it.


