Posted to java-user@lucene.apache.org by ANDREI SOLODIN <as...@comcast.net> on 2019/07/03 15:11:22 UTC

Index-time join ToParentBlockJoinQuery query produces incorrect result

Hello, I am trying to understand the requirements for properly using the index-time join. In my use case, I am trying to model a 1-N relationship where a parent document can have 0-N child documents. For now I am keeping my data very simple: each child has a single field. So my data right now looks like this:


Parent Doc      Children
--------------------------------------
id=id00000      none
id=id00001      program=P1
id=id00002      program=P1
                program=P2
id=id00003      none
id=id00004      program=P1
id=id00005      program=P1
                program=P2


So essentially I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, and docs 3 to 5 repeat that pattern.


Certain queries are giving me an incorrect result. For example:


BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet,  ScoreMode.None);


This returns "id00003", which is unexpected.


I opened a bug (https://issues.apache.org/jira/browse/LUCENE-8902) in my haste earlier (sorry), and it was mentioned there that "child free is not supported". I take that to mean that each parent should have at least one child. So let's say I add a "default" child to each parent:


Parent Doc      Children
--------------------------------------
id=id00000      field1=val1
id=id00001      field1=val1
                program=P1
id=id00002      field1=val1
                program=P1
                program=P2
id=id00003      field1=val1
id=id00004      field1=val1
                program=P1
id=id00005      field1=val1
                program=P1
                program=P2


So now every parent has at least one child. That made no difference; I still get the same result. What am I doing wrong here?


Thanks

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Posted by Mikhail Khludnev <mk...@apache.org>.
Andrei, it's not clear what the problem is, but if you need to join children
to parents and then select only a subset of parents, you need to combine the
join with a parent filter. Some cases are explained at
https://lucene.apache.org/solr/guide/8_0/other-parsers.html#OtherParsers-BlockJoinParentQueryParser

On Sat, Jul 6, 2019 at 1:41 AM ANDREI SOLODIN <as...@comcast.net> wrote:

> So you are implying that the parent filter allows subsets. The code at
> https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/CheckJoinIndex.java#L46
> implies that a subset is not allowed. If I select a subset and invoke the
> checker, I get this IllegalStateException.

-- 
Sincerely yours
Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Posted by ANDREI SOLODIN <as...@comcast.net>.
So you are implying that the parent filter allows subsets. The code at https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/CheckJoinIndex.java#L46 implies that a subset is not allowed. If I select a subset and invoke the checker, I get this IllegalStateException.
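
One way to picture why a checker might reject a subset filter is the toy model below. It is plain Java over java.util.BitSet, not the CheckJoinIndex source, and the rule it enforces (at least one parent must match, and the last document must be marked as a parent) is an assumption made for illustration. The class name, method names, and the 12-document layout (children first, parent last in each block, using the data from the first post) are hypothetical.

```java
import java.util.BitSet;

/** Toy model of a parent-filter sanity check; not the Lucene implementation. */
class ParentFilterCheckModel {

    /** Builds a bit set with the given document ids set. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int d : docIds) {
            b.set(d);
        }
        return b;
    }

    /**
     * Assumed rule: some parent must match, and the last document
     * (maxDoc - 1) must be marked as a parent.
     */
    static void check(int maxDoc, BitSet parentBits) {
        if (maxDoc == 0) {
            return; // nothing to check in an empty index
        }
        if (parentBits.isEmpty()) {
            throw new IllegalStateException("no parent document matched");
        }
        if (!parentBits.get(maxDoc - 1)) {
            throw new IllegalStateException(
                "last document " + (maxDoc - 1) + " is not marked as a parent");
        }
    }

    public static void main(String[] args) {
        // Parents of the six blocks sit at docs 0, 2, 5, 6, 8, 11.
        check(12, bits(0, 2, 5, 6, 8, 11)); // passes: doc 11 is a parent
        try {
            check(12, bits(6));             // only id00003 is marked
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under that assumed rule, a filter that matches only id00003 fails because the final document in the index (the id00005 parent) is not in the set.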


> On July 3, 2019 at 2:33 PM Michael Sokolov <ms...@gmail.com> wrote:
> 
> 
> Well for one thing, you might have other documents in the index that
> are neither parents nor children (in this particular relation). Also,
> consider a nested hierarchy - how can we automatically figure out
> which "generation" or "level" of parent to select?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Posted by Michael Sokolov <ms...@gmail.com>.
Well for one thing, you might have other documents in the index that
are neither parents nor children (in this particular relation). Also,
consider a nested hierarchy - how can we automatically figure out
which "generation" or "level" of parent to select?
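
A small sketch of the nested-hierarchy point, as a toy model in plain Java rather than Lucene code. The document names, their order, and the class and method names are all hypothetical. The same child document is joined to a different "generation" depending on which parent bit set is supplied, and nothing in the document sequence itself says which level is intended.

```java
import java.util.BitSet;

/** Toy model: the parent bit set decides which level a child is joined to. */
class ParentLevelModel {

    // Hypothetical index order: descendants first, each ancestor after them.
    static final String[] DOCS = {
        "comment-a", "comment-b", "post-1", // docs 0, 1, 2
        "comment-c", "post-2",              // docs 3, 4
        "blog-1",                           // doc 5
        "standalone-note"                   // doc 6, neither parent nor child
    };

    /** Builds a bit set with the given document ids set. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int d : docIds) {
            b.set(d);
        }
        return b;
    }

    /** Returns the first marked parent at or after the child doc, or null. */
    static String parentOf(int childDoc, BitSet parentBits) {
        int p = parentBits.nextSetBit(childDoc);
        return p < 0 ? null : DOCS[p];
    }

    public static void main(String[] args) {
        BitSet posts = bits(2, 4); // "post" level marked as parents
        BitSet blogs = bits(5);    // "blog" level marked as parents

        System.out.println(parentOf(0, posts)); // post-1
        System.out.println(parentOf(0, blogs)); // blog-1
        System.out.println(parentOf(3, posts)); // post-2
        System.out.println(parentOf(6, posts)); // null
    }
}
```

In this model the standalone document at position 6 also ends a run of documents, so "last document in a block" alone would not distinguish it from a real parent.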

On Wed, Jul 3, 2019 at 2:50 PM ANDREI SOLODIN <as...@comcast.net> wrote:
>
> After looking through the unit tests, I got it working. The problem was that I thought the parent filter in ToParentBlockJoinQuery could be used to select a subset of parents. It appears that the parent filter must select ALL parents, not a subset. This is not explained in the javadoc. If you want to select a subset of parents (independently of the child query), ToParentBlockJoinQuery cannot be used on its own; it has to be a clause in another query.
>
> It would be a nice enhancement to select all parents automatically. The parent is already required to be the last document in its block, so why do we need to provide a query for it?


Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Posted by ANDREI SOLODIN <as...@comcast.net>.
After looking through the unit tests, I got it working. The problem was that I thought the parent filter in ToParentBlockJoinQuery could be used to select a subset of parents. It appears that the parent filter must select ALL parents, not a subset. This is not explained in the javadoc. If you want to select a subset of parents (independently of the child query), ToParentBlockJoinQuery cannot be used on its own; it has to be a clause in another query.

It would be a nice enhancement to select all parents automatically. The parent is already required to be the last document in its block, so why do we need to provide a query for it?
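
For anyone hitting the same surprise, here is a toy model of the behaviour described above. It is plain Java over java.util.BitSet, not Lucene code, and it assumes that each matching child is mapped to the next marked parent at or after it in index order. The class name, method names, and document layout (children first, parent last, using the data from the first post) are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

/** Toy model of an index-time (block) join; not the Lucene implementation. */
class BlockJoinModel {

    // Index order: children come first, the parent closes its block.
    static final String[] DOCS = {
        "id00000",                             // doc 0
        "program=P1", "id00001",               // docs 1, 2
        "program=P1", "program=P2", "id00002", // docs 3, 4, 5
        "id00003",                             // doc 6
        "program=P1", "id00004",               // docs 7, 8
        "program=P1", "program=P2", "id00005"  // docs 9, 10, 11
    };

    /** Builds a bit set with the given document ids set. */
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int d : docIds) {
            b.set(d);
        }
        return b;
    }

    /** Maps every matching child to the next set bit in parentBits. */
    static List<String> join(BitSet childMatches, BitSet parentBits) {
        BitSet hits = new BitSet();
        for (int c = childMatches.nextSetBit(0); c >= 0; c = childMatches.nextSetBit(c + 1)) {
            int p = parentBits.nextSetBit(c);
            if (p >= 0) {
                hits.set(p);
            }
        }
        List<String> ids = new ArrayList<>();
        for (int p = hits.nextSetBit(0); p >= 0; p = hits.nextSetBit(p + 1)) {
            ids.add(DOCS[p]);
        }
        return ids;
    }

    public static void main(String[] args) {
        BitSet children = bits(1, 3, 4, 7, 9, 10); // docs with a "program" field
        BitSet allParents = bits(0, 2, 5, 6, 8, 11);
        BitSet onlyId00003 = bits(6);

        // Subset used as the parent filter: children of earlier blocks
        // fall through to doc 6.
        System.out.println(join(children, onlyId00003)); // [id00003]

        // All parents as the filter, subset applied as a separate step.
        List<String> joined = join(children, allParents);
        System.out.println(joined); // [id00001, id00002, id00004, id00005]
        joined.retainAll(Collections.singletonList("id00003"));
        System.out.println(joined); // []
    }
}
```

The final step is the model's stand-in for using the join as a clause in another query, with the restriction to id00003 expressed as a separate clause rather than as the parent filter.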

> On July 3, 2019 at 10:52 AM ANDREI SOLODIN <as...@comcast.net> wrote:
> 
> 
> Thanks Mikhail.
>
> I read through the javadoc and thought I was satisfying all the preconditions. Obviously not :-) Is it this part that I am getting wrong: "At search time you provide a Filter identifying the parents, however this Filter must provide an BitSet per sub-reader" (BitSet: https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true)? If so, given the data above, how do I properly create a parent query?

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Posted by ANDREI SOLODIN <as...@comcast.net>.
Thanks Mikhail.


I read through the javadoc and thought I was satisfying all the preconditions. Obviously not :-) Is it this part that I am getting wrong: "At search time you provide a Filter identifying the parents, however this Filter must provide an BitSet per sub-reader" (BitSet: https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true)? If so, given the data above, how do I properly create a parent query?


> On July 3, 2019 at 10:30 AM Mikhail Khludnev <mkhl@apache.org> wrote:
>
> On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN <asolodin@comcast.net> wrote:
>
> > This returns "id00003", which is unexpected.
>
> Please check ToPBJQ javadoc. It's absolutely expected.
>
> --
> Sincerely yours
> Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Posted by Mikhail Khludnev <mk...@apache.org>.
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN <as...@comcast.net> wrote:

> This returns "id00003", which is unexpected.

Please check ToPBJQ javadoc. It's absolutely expected.

-- 
Sincerely yours
Mikhail Khludnev