You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@solr.apache.org by Geren White <ge...@1stdibs.com> on 2023/02/27 21:24:03 UTC

Reversed leftOuterJoin on clause returns incorrect results

Hello,

When testing out joins in solr streams we noticed that when the on clause
is reversed the results are incorrect and the join will return as if
everything matched.

For example if you have steamA and streamB with the following tuples:

streamA:
{
  item_id_1: "123",
  item_id_2: "456"
}

streamB:
{
  item_id: "789",
  user_id: "0"
}

Executing a stream like below:
leftOuterJoin(
  search(collection-a, q=*:*, fq="item_id_1:123", fl="item_id_1,item_id_2",
qt="/export", sort="item_id_2 desc"),
  search(collection-b,
fq="user_id:0",q="*:*",qt="/export",fl="item_id,user_id",sort="item_id
desc"),
on="item_id=item_id_2")

This will return something like this where all tuples are joined even
though item_id doesn't match item_id_2:
{
  item_id_1: "123",
  item_id_2: "456",
  item_id: "789",
  user_id: "0"
}

Note that the first column in the on clause is from the second table.

Is this expected behavior? We're running solr 8.11.1 and noticed it
while setting up a new query. It's an easy fix to switch the on clause but
seems like it should throw an error or handle it properly. Happy to open up
a bug ticket if this isn't expected.

Thanks,
-- 
*Geren White | Senior Director, Engineering*
*(e)* geren@1stdibs.com

Re: Reversed leftOuterJoin on clause returns incorrect results

Posted by Geren White <ge...@1stdibs.com>.
No worries I appreciate the response, I just got a jira account setup so
I'll open up a bug around it and might take a stab at fixing.

Thanks,

On Tue, Feb 28, 2023 at 8:16 PM t sornin <ts...@gmail.com> wrote:

> Hi Geren,
>
>
>
> Sorry about the initial response, I just looked over your expression and
> didn't read the full context.  I've experienced similar behavior when I've
> deviated (unintentionally) from the documentation.  I think it's worth
> raising a JIRA, SEs can definitely  benefit from better error
> responses/handling.
>
>
>
> Mathew
>
> On Tue, Feb 28, 2023 at 9:37 AM Geren White <ge...@1stdibs.com> wrote:
>
> > Yea sorry the question was if that's a bug and if it should throw an
> error?
> > The results right now are pretty confusing and I could see it leading to
> > some bugs.
> >
> > On Mon, Feb 27, 2023 at 5:23 PM t sornin <ts...@gmail.com> wrote:
> >
> > > Your join key is reversed.  It should be "on=item_id_2=item_id" which
> > only
> > > returns the left stream (first stream param for leftOuterJoin) since
> > there
> > > is no match.
> > >
> > > Hope this helps.
> > >
> > > Mathew
> > >
> > > On Mon, Feb 27, 2023, 4:25 PM Geren White <ge...@1stdibs.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > When testing out joins in solr streams we noticed that when the on
> > clause
> > > > is reversed the results are incorrect and the join will return as if
> > > > everything matched.
> > > >
> > > > For example if you have steamA and streamB with the following tuples:
> > > >
> > > > streamA:
> > > > {
> > > >   item_id_1: "123",
> > > >   item_id_2: "456"
> > > > }
> > > >
> > > > streamB:
> > > > {
> > > >   item_id: "789",
> > > >   user_id: "0"
> > > > }
> > > >
> > > > Executing a stream like below:
> > > > leftOuterJoin(
> > > >   search(collection-a, q=*:*, fq="item_id_1:123",
> > > fl="item_id_1,item_id_2",
> > > > qt="/export", sort="item_id_2 desc"),
> > > >   search(collection-b,
> > > >
> fq="user_id:0",q="*:*",qt="/export",fl="item_id,user_id",sort="item_id
> > > > desc"),
> > > > on="item_id=item_id_2")
> > > >
> > > > This will return something like this where all tuples are joined even
> > > > though item_id doesn't match item_id_2:
> > > > {
> > > >   item_id_1: "123",
> > > >   item_id_2: "456",
> > > >   item_id: "789",
> > > >   user_id: "0"
> > > > }
> > > >
> > > > Note that the first column in the on clause is from the second table.
> > > >
> > > > Is this expected behavior? We're running solr 8.11.1 and noticed it
> > > > while setting up a new query. It's an easy fix to switch the on
> clause
> > > but
> > > > seems like it should throw an error or handle it properly. Happy to
> > open
> > > up
> > > > a bug ticket if this isn't expected.
> > > >
> > > > Thanks,
> > > > --
> > > > *Geren White | Senior Director, Engineering*
> > > > *(e)* geren@1stdibs.com
> > > >
> > >
> >
> >
> > --
> > *Geren White | Senior Director, Engineering*
> > *(e)* geren@1stdibs.com
> >
>


-- 
*Geren White | Senior Director, Engineering*
*(e)* geren@1stdibs.com

Re: Reversed leftOuterJoin on clause returns incorrect results

Posted by t sornin <ts...@gmail.com>.
Hi Geren,



Sorry about the initial response, I just looked over your expression and
didn't read the full context.  I've experienced similar behavior when I've
deviated (unintentionally) from the documentation.  I think it's worth
raising a JIRA, SEs can definitely  benefit from better error
responses/handling.



Mathew

On Tue, Feb 28, 2023 at 9:37 AM Geren White <ge...@1stdibs.com> wrote:

> Yea sorry the question was if that's a bug and if it should throw an error?
> The results right now are pretty confusing and I could see it leading to
> some bugs.
>
> On Mon, Feb 27, 2023 at 5:23 PM t sornin <ts...@gmail.com> wrote:
>
> > Your join key is reversed.  It should be "on=item_id_2=item_id" which
> only
> > returns the left stream (first stream param for leftOuterJoin) since
> there
> > is no match.
> >
> > Hope this helps.
> >
> > Mathew
> >
> > On Mon, Feb 27, 2023, 4:25 PM Geren White <ge...@1stdibs.com> wrote:
> >
> > > Hello,
> > >
> > > When testing out joins in solr streams we noticed that when the on
> clause
> > > is reversed the results are incorrect and the join will return as if
> > > everything matched.
> > >
> > > For example if you have steamA and streamB with the following tuples:
> > >
> > > streamA:
> > > {
> > >   item_id_1: "123",
> > >   item_id_2: "456"
> > > }
> > >
> > > streamB:
> > > {
> > >   item_id: "789",
> > >   user_id: "0"
> > > }
> > >
> > > Executing a stream like below:
> > > leftOuterJoin(
> > >   search(collection-a, q=*:*, fq="item_id_1:123",
> > fl="item_id_1,item_id_2",
> > > qt="/export", sort="item_id_2 desc"),
> > >   search(collection-b,
> > > fq="user_id:0",q="*:*",qt="/export",fl="item_id,user_id",sort="item_id
> > > desc"),
> > > on="item_id=item_id_2")
> > >
> > > This will return something like this where all tuples are joined even
> > > though item_id doesn't match item_id_2:
> > > {
> > >   item_id_1: "123",
> > >   item_id_2: "456",
> > >   item_id: "789",
> > >   user_id: "0"
> > > }
> > >
> > > Note that the first column in the on clause is from the second table.
> > >
> > > Is this expected behavior? We're running solr 8.11.1 and noticed it
> > > while setting up a new query. It's an easy fix to switch the on clause
> > but
> > > seems like it should throw an error or handle it properly. Happy to
> open
> > up
> > > a bug ticket if this isn't expected.
> > >
> > > Thanks,
> > > --
> > > *Geren White | Senior Director, Engineering*
> > > *(e)* geren@1stdibs.com
> > >
> >
>
>
> --
> *Geren White | Senior Director, Engineering*
> *(e)* geren@1stdibs.com
>

Re: Reversed leftOuterJoin on clause returns incorrect results

Posted by Geren White <ge...@1stdibs.com>.
Yea sorry the question was if that's a bug and if it should throw an error?
The results right now are pretty confusing and I could see it leading to
some bugs.

On Mon, Feb 27, 2023 at 5:23 PM t sornin <ts...@gmail.com> wrote:

> Your join key is reversed.  It should be "on=item_id_2=item_id" which only
> returns the left stream (first stream param for leftOuterJoin) since there
> is no match.
>
> Hope this helps.
>
> Mathew
>
> On Mon, Feb 27, 2023, 4:25 PM Geren White <ge...@1stdibs.com> wrote:
>
> > Hello,
> >
> > When testing out joins in solr streams we noticed that when the on clause
> > is reversed the results are incorrect and the join will return as if
> > everything matched.
> >
> > For example if you have steamA and streamB with the following tuples:
> >
> > streamA:
> > {
> >   item_id_1: "123",
> >   item_id_2: "456"
> > }
> >
> > streamB:
> > {
> >   item_id: "789",
> >   user_id: "0"
> > }
> >
> > Executing a stream like below:
> > leftOuterJoin(
> >   search(collection-a, q=*:*, fq="item_id_1:123",
> fl="item_id_1,item_id_2",
> > qt="/export", sort="item_id_2 desc"),
> >   search(collection-b,
> > fq="user_id:0",q="*:*",qt="/export",fl="item_id,user_id",sort="item_id
> > desc"),
> > on="item_id=item_id_2")
> >
> > This will return something like this where all tuples are joined even
> > though item_id doesn't match item_id_2:
> > {
> >   item_id_1: "123",
> >   item_id_2: "456",
> >   item_id: "789",
> >   user_id: "0"
> > }
> >
> > Note that the first column in the on clause is from the second table.
> >
> > Is this expected behavior? We're running solr 8.11.1 and noticed it
> > while setting up a new query. It's an easy fix to switch the on clause
> but
> > seems like it should throw an error or handle it properly. Happy to open
> up
> > a bug ticket if this isn't expected.
> >
> > Thanks,
> > --
> > *Geren White | Senior Director, Engineering*
> > *(e)* geren@1stdibs.com
> >
>


-- 
*Geren White | Senior Director, Engineering*
*(e)* geren@1stdibs.com

Re: Reversed leftOuterJoin on clause returns incorrect results

Posted by t sornin <ts...@gmail.com>.
Your join key is reversed.  It should be "on=item_id_2=item_id" which only
returns the left stream (first stream param for leftOuterJoin) since there
is no match.

Hope this helps.

Mathew

On Mon, Feb 27, 2023, 4:25 PM Geren White <ge...@1stdibs.com> wrote:

> Hello,
>
> When testing out joins in solr streams we noticed that when the on clause
> is reversed the results are incorrect and the join will return as if
> everything matched.
>
> For example if you have steamA and streamB with the following tuples:
>
> streamA:
> {
>   item_id_1: "123",
>   item_id_2: "456"
> }
>
> streamB:
> {
>   item_id: "789",
>   user_id: "0"
> }
>
> Executing a stream like below:
> leftOuterJoin(
>   search(collection-a, q=*:*, fq="item_id_1:123", fl="item_id_1,item_id_2",
> qt="/export", sort="item_id_2 desc"),
>   search(collection-b,
> fq="user_id:0",q="*:*",qt="/export",fl="item_id,user_id",sort="item_id
> desc"),
> on="item_id=item_id_2")
>
> This will return something like this where all tuples are joined even
> though item_id doesn't match item_id_2:
> {
>   item_id_1: "123",
>   item_id_2: "456",
>   item_id: "789",
>   user_id: "0"
> }
>
> Note that the first column in the on clause is from the second table.
>
> Is this expected behavior? We're running solr 8.11.1 and noticed it
> while setting up a new query. It's an easy fix to switch the on clause but
> seems like it should throw an error or handle it properly. Happy to open up
> a bug ticket if this isn't expected.
>
> Thanks,
> --
> *Geren White | Senior Director, Engineering*
> *(e)* geren@1stdibs.com
>