Posted to solr-user@lucene.apache.org by Christian Spitzlay <ch...@biologis.com> on 2018/06/06 08:18:24 UTC

Streaming Expression intersect() behaviour

Hi,

I don’t seem to get the behaviour of the intersect() stream decorator.
I only ever get one doc from the left stream when I would have expected 
more than one.

I constructed a test case that does not depend on my concrete index:

intersect(
cartesianProduct(tuple(fieldA=array(c,c,a,b,d,d)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldB=array(c,c,a,d,d)), fieldB, productSort="fieldB asc"),
on="fieldA=fieldB“
)


The result:

{
  "result-set": {
    "docs": [
      {
        "fieldA": "a"
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 0
      }
    ]
  }
}


I would have expected all the docs from the left stream with fieldA values a, c, d
and only the docs with fieldA == b missing.  Do I have a fundamental misunderstanding?
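
The expectation described above can be made concrete with a minimal Python sketch. This is not Solr code: the helper name expected_intersect and the list-of-dicts representation of tuples are assumptions made purely for illustration. It keeps every tuple of the left stream whose fieldA value also occurs as a fieldB value in the right stream.

```python
def expected_intersect(left, right, left_field, right_field):
    """Keep each left tuple whose key also appears in the right stream."""
    right_keys = {t[right_field] for t in right}
    return [t for t in left if t[left_field] in right_keys]


# Streams modelled after cartesianProduct(...) with an ascending productSort.
left = [{"fieldA": v} for v in sorted(["c", "c", "a", "b", "d", "d"])]
right = [{"fieldB": v} for v in sorted(["c", "c", "a", "d", "d"])]

result = expected_intersect(left, right, "fieldA", "fieldB")
print([t["fieldA"] for t in result])
```

Under these assumptions the sketch yields the values a, c, c, d, d, which is the behaviour the question expects, rather than the single document shown in the result above.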


Best regards
Christian Spitzlay



Re: Streaming Expression intersect() behaviour

Posted by Joel Bernstein <jo...@gmail.com>.
Yes, I was going to suggest that as well.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 8, 2018 at 9:20 AM, Christian Spitzlay <
christian.spitzlay@biologis.com> wrote:

> As a temporary workaround until that issue is fixed
> one could wrap the right stream with a select that renames the field:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> select(cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"), fieldB as fieldA),
> on=fieldA
> )

Re: Streaming Expression intersect() behaviour

Posted by Christian Spitzlay <ch...@biologis.com>.
As a temporary workaround until that issue is fixed, one could wrap the
right stream in a select() that renames the field:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
select(cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"), fieldB as fieldA),
on=fieldA
)
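
The idea behind this workaround can be sketched in Python. This is only a model, not Solr code: the helpers select_rename and intersect_on are hypothetical names, and the sketch assumes that the select emits only the renamed field. Once both streams carry the key under the same name, the intersection can be taken on that single field.

```python
def select_rename(stream, old, new):
    """Model select(stream, old as new): emit only the renamed field."""
    return [{new: t[old]} for t in stream]


def intersect_on(left, right, field):
    """Keep each left tuple whose value for `field` also occurs in the right stream."""
    right_keys = {t[field] for t in right}
    return [t for t in left if t[field] in right_keys]


left = [{"fieldA": v} for v in ["a", "b", "c", "c"]]
right = [{"fieldB": v} for v in ["a", "c"]]

# Rename fieldB to fieldA on the right side, then intersect on fieldA.
result = intersect_on(left, select_rename(right, "fieldB", "fieldA"), "fieldA")
print([t["fieldA"] for t in result])
```

In this model the output contains a, c, c and drops b, which is the result the original test case was expected to produce.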



> On 08.06.2018 at 14:42, Joel Bernstein <jo...@gmail.com> wrote:
> 
> You're correct, after testing again the only way that this works correctly
> appears to be:
> 
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"),
> on="fieldA"
> )
> 
> I suspect that there are only test cases that cover this scenario as well.
> I'll create a jira issue for this.


Re: Streaming Expression intersect() behaviour

Posted by Joel Bernstein <jo...@gmail.com>.
You're correct. After testing again, the only form that appears to work
correctly is:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"),
on="fieldA"
)

I suspect that the existing test cases only cover this scenario as well.
I'll create a Jira issue for this.




Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 8, 2018 at 3:41 AM, Christian Spitzlay <
christian.spitzlay@biologis.com> wrote:

> Hi,
>
>
> > On 08.06.2018 at 03:42, Joel Bernstein <jo...@gmail.com> wrote:
> >
> > And when you transpose the "on" fields like this:
> >
> > intersect(
> > cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> > cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> > on="fieldB=fieldA"
> > )
> >
> > It also works.
>
>
> No, IIUC this does not work correctly.
>
> I had tried this before posting my original question.
> That version emits the documents from the left stream
> but does not filter out the document with fieldA == b.
>
> This might be due to the fact that fieldB is not present in the left stream
> and fieldA is not present in the right stream; it compares two
> empty values (null?) and comes to the conclusion that they are equal.
> Could that be the reason?
>
>
>
> > So, yes there is a bug where the fields are being transposed with intersect
> > function's "on" fields. The same issue was happening with joins and may
> > have been resolved. I'll do little more research into this.
>
> Thanks for your work on this!
>
>
> Best regards
> Christian Spitzlay

Re: Streaming Expression intersect() behaviour

Posted by Christian Spitzlay <ch...@biologis.com>.
Hi,


> On 08.06.2018 at 03:42, Joel Bernstein <jo...@gmail.com> wrote:
> 
> And when you transpose the "on" fields like this:
> 
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> on="fieldB=fieldA"
> )
> 
> It also works.


No, IIUC this does not work correctly. 

I had tried this before posting my original question.
That version emits the documents from the left stream
but does not filter out the document with fieldA == b.

This might be due to the fact that fieldB is not present in the left stream
and fieldA is not present in the right stream; it compares two
empty values (null?) and comes to the conclusion that they are equal.
Could that be the reason?
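
The hypothesis above can be illustrated with a small Python model. It is only a sketch of the suspected mechanism, not Solr's implementation, and the thread does not confirm that Solr behaves this way. The helper naive_intersect is a hypothetical name; it looks up each key with dict.get, so a missing field yields None on both sides.

```python
def naive_intersect(left, right, left_field, right_field):
    """Intersect that treats a missing field as None instead of rejecting it."""
    # dict.get returns None for an absent field, so two absent fields compare equal.
    right_keys = {t.get(right_field) for t in right}
    return [t for t in left if t.get(left_field) in right_keys]


left = [{"fieldA": v} for v in ["a", "b", "c", "c"]]
right = [{"fieldB": v} for v in ["a", "c"]]

# Swapped fields: fieldB is looked up in the left stream, fieldA in the right one.
swapped = naive_intersect(left, right, "fieldB", "fieldA")
print([t["fieldA"] for t in swapped])
```

With the fields swapped, every lookup returns None, every left tuple "matches", and the tuple with fieldA == b is not filtered out. That is consistent with the observation described above, but it remains a guess about the cause.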



> So, yes there is a bug where the fields are being transposed with intersect
> function's "on" fields. The same issue was happening with joins and may
> have been resolved. I'll do little more research into this.

Thanks for your work on this!


Best regards
Christian Spitzlay







Re: Streaming Expression intersect() behaviour

Posted by Joel Bernstein <jo...@gmail.com>.
This expression works as expected:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"),
on="fieldA"
)

And when you transpose the "on" fields like this:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
on="fieldB=fieldA"
)

It also works.

So, yes, there is a bug where the fields are being transposed with the
intersect function's "on" fields. The same issue was happening with joins
and may have been resolved. I'll do a little more research into this.







Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 7, 2018 at 9:29 AM, Christian Spitzlay <
christian.spitzlay@biologis.com> wrote:

>
>
> > On 07.06.2018 at 11:34, Christian Spitzlay <christian.spitzlay@biologis.com> wrote:
> >
> > intersect(
> > cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> > cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> > on="fieldA=fieldB"
> > )
> >
> > I simplified it a bit, too. I still get one document with fieldA == a.
> > I would have expected three documents in the output, one with fieldA == a and two with fieldB == c.
>
> That should have read „… and two with fieldA == c“ of course.
>
>
>
>

Re: Streaming Expression intersect() behaviour

Posted by Christian Spitzlay <ch...@biologis.com>.

> On 07.06.2018 at 11:34, Christian Spitzlay <ch...@biologis.com> wrote:
> 
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> on="fieldA=fieldB"
> )
> 
> I simplified it a bit, too. I still get one document with fieldA == a.
> I would have expected three documents in the output, one with fieldA == a and two with fieldB == c.

That should have read „… and two with fieldA == c“ of course.




Re: Streaming Expression intersect() behaviour

Posted by Joel Bernstein <jo...@gmail.com>.
Nice example!

I'll take a look at this today. I believe there was/is a bug with some
of the joins where the "on" parameter transposes the fields. It's
possible that is the case here as well.



Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 7, 2018 at 5:34 AM, Christian Spitzlay <
christian.spitzlay@biologis.com> wrote:

> Hi,
>
> I noticed that my mail program broke the test case by replacing a double
> quote with a different UTF-8 character.
>
> Here is the test case again and I hope it will work this time:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> on="fieldA=fieldB"
> )
>
> I simplified it a bit, too. I still get one document with fieldA == a.
> I would have expected three documents in the output, one with fieldA == a
> and two with fieldB == c.
> Did I misunderstand the docs of the intersect decorator or have I come
> across a bug?
>
>
> Best regards,
> Christian Spitzlay
>

Re: Streaming Expression intersect() behaviour

Posted by Christian Spitzlay <ch...@biologis.com>.
Hi,

I noticed that my mail program broke the test case by replacing a double
quote with a different UTF-8 character.

Here is the test case again and I hope it will work this time:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
on="fieldA=fieldB"
)

I simplified it a bit, too. I still get one document with fieldA == a.
I would have expected three documents in the output, one with fieldA == a and two with fieldB == c.
Did I misunderstand the docs of the intersect decorator or have I come across a bug?


Best regards,
Christian Spitzlay



> On 06.06.2018 at 10:18, Christian Spitzlay <ch...@biologis.com> wrote:
> 
> Hi,
> 
> I don’t seem to get the behaviour of the intersect() stream decorator.
> I only ever get one doc from the left stream when I would have expected 
> more than one.
> 
> I constructed a test case that does not depend on my concrete index:
> 
> intersect(
> cartesianProduct(tuple(fieldA=array(c,c,a,b,d,d)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(c,c,a,d,d)), fieldB, productSort="fieldB asc"),
> on="fieldA=fieldB“
> )
> 
> 
> The result:
> 
> {
>  "result-set": {
>    "docs": [
>      {
>        "fieldA": "a"
>      },
>      {
>        "EOF": true,
>        "RESPONSE_TIME": 0
>      }
>    ]
>  }
> }
> 
> 
> I would have expected all the docs from the left stream with fieldA values a, c, d
> and only the docs with fieldA == b missing.  Do I have a fundamental misunderstanding?
> 
> 
> Best regards
> Christian Spitzlay
> 
>