Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2019/07/03 10:05:00 UTC

[jira] [Resolved] (LUCENE-8902) Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards

     [ https://issues.apache.org/jira/browse/LUCENE-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev resolved LUCENE-8902.
--------------------------------------
    Resolution: Not A Problem

bq. Returns 2 docs ["id00001", "id00003"]. It should only return "id00001" and not "id00003" here. Very strange behavior.
# Not at all. The child query matches id00002's children, but since id00002 is absent from the parent mask, the match lands on the next set bit, which is id00003.
# I don't think child-free parents are supported, although I don't remember why.
# Not having the last segment doc (id=id00005) in the parent mask should cause an exception, IIRC.
# Please obey the JIRA usage rules and come to the mailing list first.
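
To make the first point concrete, here is a minimal sketch of the "next set bit" effect. It uses plain java.util.BitSet rather than Lucene. It assumes all docs sit in one segment in the order given in the report, with each block indexed children first and parent last. The class name, doc numbering and helper methods are hypothetical and only illustrate the idea. Children with no later parent bit are simply skipped here, which is not necessarily what Lucene does.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.Set;

class ParentMaskSketch {
    // Assumed doc order in one segment: each block is children first, parent last.
    // Index = doc number; null marks a child doc, a string marks a parent doc.
    static final String[] PARENT_ID_AT = {
        "id00000",              // doc 0: parent, no children
        null, "id00001",        // doc 1: child, doc 2: parent
        null, null, "id00002",  // docs 3-4: children, doc 5: parent
        "id00003",              // doc 6: parent, no children
        null, "id00004",        // doc 7: child, doc 8: parent
        null, null, "id00005"   // docs 9-10: children, doc 11: parent
    };

    // Docs matched by the child query program:* (every child doc).
    static final int[] CHILD_MATCHES = {1, 3, 4, 7, 9, 10};

    // Build a parent mask holding only the parents whose id is listed.
    static BitSet mask(String... ids) {
        Set<String> wanted = new LinkedHashSet<>(Arrays.asList(ids));
        BitSet bits = new BitSet(PARENT_ID_AT.length);
        for (int doc = 0; doc < PARENT_ID_AT.length; doc++) {
            if (PARENT_ID_AT[doc] != null && wanted.contains(PARENT_ID_AT[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Attribute each matching child to the next set bit in the parent mask.
    static Set<String> join(BitSet parentMask) {
        Set<String> hits = new LinkedHashSet<>();
        for (int child : CHILD_MATCHES) {
            int parent = parentMask.nextSetBit(child);
            if (parent >= 0) { // no later parent bit: skipped in this sketch
                hits.add(PARENT_ID_AT[parent]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Case 4 from the report: id00002 is missing, so its children land on id00003.
        System.out.println(join(mask("id00000", "id00001", "id00003"))); // [id00001, id00003]
        // Case 5: only id00003 in the mask, so every earlier child lands on it.
        System.out.println(join(mask("id00003")));                       // [id00003]
        // Case 6: with id00002 present, the children stop there instead.
        System.out.println(join(mask("id00002", "id00003")));            // [id00002]

        // Assumption, not stated in this thread: keep every parent in the mask
        // and narrow the parents afterwards with a separate restriction.
        Set<String> hits = join(mask("id00000", "id00001", "id00002",
                                     "id00003", "id00004", "id00005"));
        hits.retainAll(Arrays.asList("id00000", "id00001", "id00003"));
        System.out.println(hits);                                        // [id00001]
    }
}
```

The last case reflects the usual expectation that the parent mask marks every parent doc and that any narrowing of parents happens outside the mask. Treat it as an assumption rather than something confirmed in this thread.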

> Index-time join ToParentBlockJoinQuery query produces incorrect result with child wildcards
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8902
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8902
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/join
>    Affects Versions: 8.1.1
>            Reporter: Andrei
>            Priority: Major
>
> When I do an index-time join query on certain parent docs with a wildcard query for child docs, sometimes I get the wrong answer. Example:
>  
> ||Parent Doc||Children||
> |id=id00000|      none|
> |id=id00001| # program=P1|
> |id=id00002| # program=P1
>  # program=P2|
> |id=id00003|      none|
> |id=id00004| # program=P1|
> |id=id00005| # program=P1
>  # program=P2|
> So essentially I have 6 parent docs, doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, etc.
> 1. The following query gives the correct results:
>         BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
>         Query q = new ToParentBlockJoinQuery(new TermInSetQuery("program", toSet("P1", "P2")), parentSet, ScoreMode.None);
> Returns the correct result (4 docs: ["id00001", "id00002", "id00004", "id00005"]).
>  
> 2. This also gives correct result (same as above):
>         BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00002", "id00003", "id00004", "id00005")));
>         Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);
>  
> 3. Also correct (same as above)
>         BitSetProducer parentSet = new QueryBitSetProducer(new WildcardQuery(new Term("id", "*")));
>         Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);
> so far so good.
>  
> 4. This one gives incorrect result:
>         BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet("id00000", "id00001", "id00003")));
>         Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);
> Returns 2 docs ["id00001", "id00003"]. It should only return "id00001" and not "id00003" here. Very strange behavior. 
>  
> 5. Just asking for "id00003" also incorrectly returns it:
>         BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
>         Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);
>  
> 6. But as soon as I add "id00002" to the parent query, it works again:
>         BitSetProducer parentSet = new QueryBitSetProducer(new TermInSetQuery("id", toSet( "id00003", "id00002")));
>         Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, org.apache.lucene.search.join.ScoreMode.None);
> Gives the correct result ["id00002"]
> ----
> I am attaching the unit test that demonstrates this: [https://pastebin.com/aJ1LDLCS]
> I don't know if I am doing something wrong, or if there is an issue.
> Thank you for looking into it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
