You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Chris Baynes <ch...@contiamo.com> on 2016/05/26 16:45:49 UTC

Using RelBuilder to construct an inner join

I'm joining datasets from different sources (using the newly implemented
qualified scan), however the following INNER join query returns many more
rows than I would expect (it returns all combinations of rows as an OUTER
join would):

builder.scan("source1", "article_facts")
    .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1, 0,
"property_id"), builder.literal(5)))
    .project(builder.field(1, 0, "article_id"))
  .scan("source2", "articles")
    .project(builder.field(1, 0, "id"))
  .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
      builder.field(2, 0, "article_id"),
      builder.field(2, 1, "id")))
  .build()

The plan output appears correct:

LogicalJoin(condition=[=($0, $0)], joinType=[inner])
    LogicalProject(article_id=[$0])
      LogicalFilter(condition=[=($1, 5)])
        LogicalTableScan(table=[[source1, article_facts]])
    LogicalProject(id=[$0])
      LogicalTableScan(table=[[source2, articles]])

I have tried reproducing this as a test case in RelBuilderTest, but if I
call executeQuery on a statement containing a join I get:

Internal error: Error while applying rule EnumerableJoinRule, args
[rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
$0),joinType=inner)]

I presume this is due to some limitation of the test environment, so right
now I'm unsure how to get this to work.

One more thing I noticed is that the filter predicate (== 5) is not pushed
down to the database (Postgres in this case). Instead calcite used `select
* from article_facts` and applied the filter afterwards. Is that expected
behaviour for the RelBuilder?

Thanks!

Chris

Re: Using RelBuilder to construct an inner join

Posted by Julian Hyde <jh...@apache.org>.
I’ve checked in a fix for 1266. In addition to applications directly using RelBuilder, the bug affects at least one planner rule: SubQueryRemoveRule uses that API internally. So, it was a good find!

Julian


> On May 31, 2016, at 10:46 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> You’re right. I’ve logged https://issues.apache.org/jira/browse/CALCITE-1266 <https://issues.apache.org/jira/browse/CALCITE-1266> and am working on it.
> 
>> On May 31, 2016, at 1:33 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com>> wrote:
>> 
>> I think CALCITE-1264 could definitely explain the lack of the predicate
>> push down I'm seeing. There's still the issue of the join condition though.
>> I can open a ticket for that.
>> 
>>> The join condition should be =($0, $1).
>> 
>> If that's the case it seems to me that the expectations of join conditions
>> in RelBuilderTest are wrong.
>> 
>> For example:
>> https://github.com/apache/calcite/blob/branch-1.7/core/src/test/java/org/apache/calcite/test/RelBuilderTest.java#L795 <https://github.com/apache/calcite/blob/branch-1.7/core/src/test/java/org/apache/calcite/test/RelBuilderTest.java#L795>
>> 
>> In this case we're joining EMP 8th col (of 8 cols), to DEPT 1st column (of
>> 3 cols). So the condition should be =($7, $8).
>> 
>> On Tue, May 31, 2016 at 6:34 AM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
>> 
>>> PS Jordan logged a JIRA case.
>>> https://issues.apache.org/jira/browse/CALCITE-1262 <https://issues.apache.org/jira/browse/CALCITE-1262> <
>>> https://issues.apache.org/jira/browse/CALCITE-1262 <https://issues.apache.org/jira/browse/CALCITE-1262>>
>>> 
>>>> On May 30, 2016, at 9:27 PM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
>>>> 
>>>> The join condition should be =($0, $1).
>>>> 
>>>> JdbcFilterRule is instantiated when JdbcRules.rules() is called. It is
>>> called in two places: from JdbcConvention.register and from
>>> PlannerTest.MockJdbcTableScan.register.
>>>> 
>>>> Did you see Jacques Nadeau’s recent email? [1] Very likely you are
>>> seeing the same problem.
>>>> 
>>>> Julian
>>>> 
>>>> [1]
>>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU%3DvBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw%40mail.gmail.com%3E <https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU%3DvBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw%40mail.gmail.com%3E>
>>> <
>>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E <https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E>
>>>> 
>>>> 
>>>>> On May 30, 2016, at 2:41 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com> <mailto:
>>> chris@contiamo.com <ma...@contiamo.com>>> wrote:
>>>>> 
>>>>> I'm using a project on both sides before the join, so there is only one
>>>>> column on each side.
>>>>> So in that case should the join condition be ($0, $1)? Or is ($0, $0)
>>>>> correct since it's joining the first left column to the first right
>>> column?
>>>>> 
>>>>> In either case the result set is still not correct, so I'll do some more
>>>>> digging there.
>>>>> 
>>>>> As for the JdbcFilterRule, how is that set? On the BasicDataSource? I
>>>>> couldn't find that being used in a test.
>>>>> 
>>>>> On Sat, May 28, 2016 at 3:00 AM, Julian Hyde <jhyde@apache.org <ma...@apache.org> <mailto:
>>> jhyde@apache.org <ma...@apache.org>>> wrote:
>>>>> 
>>>>>> The plan output has a problem:
>>>>>> 
>>>>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>>>>> 
>>>>>> You are joining column 0 to column 0. You are not combining column 0
>>>>>> from the left side with column 0 from the right side. Column 0 from
>>>>>> the right side would be, say, 5 if the left side has 5 columns.
>>>>>> 
>>>>>> Your RelBuilder code looks correct, in particular the line
>>>>>> 
>>>>>> builder.field(2, 1, "id")
>>>>>> 
>>>>>> ought to reference the 0th column of the right input to the join. I'm
>>>>>> not sure why RelBuilder.join is creating references to the wrong
>>>>>> fields. It might be a bug in RelBuilder.
>>>>>> 
>>>>>> I'd expect it to push the filter down to the JDBC data source: there
>>>>>> would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> 
>>>>>> On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com>
>>> <mailto:chris@contiamo.com <ma...@contiamo.com>>> wrote:
>>>>>>> I'm joining datasets from different sources (using the newly
>>> implemented
>>>>>>> qualified scan), however the following INNER join query returns many
>>> more
>>>>>>> rows than I would expect (it returns all combinations of rows as an
>>> OUTER
>>>>>>> join would):
>>>>>>> 
>>>>>>> builder.scan("source1", "article_facts")
>>>>>>>   .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1,
>>> 0,
>>>>>>> "property_id"), builder.literal(5)))
>>>>>>>   .project(builder.field(1, 0, "article_id"))
>>>>>>> .scan("source2", "articles")
>>>>>>>   .project(builder.field(1, 0, "id"))
>>>>>>> .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
>>>>>>>     builder.field(2, 0, "article_id"),
>>>>>>>     builder.field(2, 1, "id")))
>>>>>>> .build()
>>>>>>> 
>>>>>>> The plan output appears correct:
>>>>>>> 
>>>>>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>>>>>>   LogicalProject(article_id=[$0])
>>>>>>>     LogicalFilter(condition=[=($1, 5)])
>>>>>>>       LogicalTableScan(table=[[source1, article_facts]])
>>>>>>>   LogicalProject(id=[$0])
>>>>>>>     LogicalTableScan(table=[[source2, articles]])
>>>>>>> 
>>>>>>> I have tried reproducing this as a test case in RelBuilderTest, but
>>> if I
>>>>>>> call executeQuery on a statement containing a join I get:
>>>>>>> 
>>>>>>> Internal error: Error while applying rule EnumerableJoinRule, args
>>>>>>> 
>>>>>> 
>>> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
>>>>>>> $0),joinType=inner)]
>>>>>>> 
>>>>>>> I presume this is due to some limitation of the test environment, so
>>>>>> right
>>>>>>> now I'm unsure how to get this to work.
>>>>>>> 
>>>>>>> One more thing I noticed is that the filter predicate (== 5) is not
>>>>>> pushed
>>>>>>> down to the database (Postgres in this case). Instead calcite used
>>>>>> `select
>>>>>>> * from article_facts` and applied the filter afterwards. Is that
>>> expected
>>>>>>> behaviour for the RelBuilder?
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>>> Chris
> 


Re: Using RelBuilder to construct an inner join

Posted by Julian Hyde <jh...@apache.org>.
You’re right. I’ve logged https://issues.apache.org/jira/browse/CALCITE-1266 <https://issues.apache.org/jira/browse/CALCITE-1266> and am working on it.

> On May 31, 2016, at 1:33 AM, Chris Baynes <ch...@contiamo.com> wrote:
> 
> I think CALCITE-1264 could definitely explain the lack of the predicate
> push down I'm seeing. There's still the issue of the join condition though.
> I can open a ticket for that.
> 
>> The join condition should be =($0, $1).
> 
> If that's the case it seems to me that the expectations of join conditions
> in RelBuilderTest are wrong.
> 
> For example:
> https://github.com/apache/calcite/blob/branch-1.7/core/src/test/java/org/apache/calcite/test/RelBuilderTest.java#L795 <https://github.com/apache/calcite/blob/branch-1.7/core/src/test/java/org/apache/calcite/test/RelBuilderTest.java#L795>
> 
> In this case we're joining EMP 8th col (of 8 cols), to DEPT 1st column (of
> 3 cols). So the condition should be =($7, $8).
> 
> On Tue, May 31, 2016 at 6:34 AM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
> 
>> PS Jordan logged a JIRA case.
>> https://issues.apache.org/jira/browse/CALCITE-1262 <https://issues.apache.org/jira/browse/CALCITE-1262> <
>> https://issues.apache.org/jira/browse/CALCITE-1262 <https://issues.apache.org/jira/browse/CALCITE-1262>>
>> 
>>> On May 30, 2016, at 9:27 PM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>> The join condition should be =($0, $1).
>>> 
>>> JdbcFilterRule is instantiated when JdbcRules.rules() is called. It is
>> called in two places: from JdbcConvention.register and from
>> PlannerTest.MockJdbcTableScan.register.
>>> 
>>> Did you see Jacques Nadeau’s recent email? [1] Very likely you are
>> seeing the same problem.
>>> 
>>> Julian
>>> 
>>> [1]
>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU%3DvBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw%40mail.gmail.com%3E
>> <
>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E <https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E>
>>> 
>>> 
>>>> On May 30, 2016, at 2:41 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com> <mailto:
>> chris@contiamo.com <ma...@contiamo.com>>> wrote:
>>>> 
>>>> I'm using a project on both sides before the join, so there is only one
>>>> column on each side.
>>>> So in that case should the join condition be ($0, $1)? Or is ($0, $0)
>>>> correct since it's joining the first left column to the first right
>> column?
>>>> 
>>>> In either case the result set is still not correct, so I'll do some more
>>>> digging there.
>>>> 
>>>> As for the JdbcFilterRule, how is that set? On the BasicDataSource? I
>>>> couldn't find that being used in a test.
>>>> 
>>>> On Sat, May 28, 2016 at 3:00 AM, Julian Hyde <jhyde@apache.org <ma...@apache.org> <mailto:
>> jhyde@apache.org <ma...@apache.org>>> wrote:
>>>> 
>>>>> The plan output has a problem:
>>>>> 
>>>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>>>> 
>>>>> You are joining column 0 to column 0. You are not combining column 0
>>>>> from the left side with column 0 from the right side. Column 0 from
>>>>> the right side would be, say, 5 if the left side has 5 columns.
>>>>> 
>>>>> Your RelBuilder code looks correct, in particular the line
>>>>> 
>>>>> builder.field(2, 1, "id")
>>>>> 
>>>>> ought to reference the 0th column of the right input to the join. I'm
>>>>> not sure why RelBuilder.join is creating references to the wrong
>>>>> fields. It might be a bug in RelBuilder.
>>>>> 
>>>>> I'd expect it to push the filter down to the JDBC data source: there
>>>>> would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?
>>>>> 
>>>>> Julian
>>>>> 
>>>>> 
>>>>> On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com>
>> <mailto:chris@contiamo.com <ma...@contiamo.com>>> wrote:
>>>>>> I'm joining datasets from different sources (using the newly
>> implemented
>>>>>> qualified scan), however the following INNER join query returns many
>> more
>>>>>> rows than I would expect (it returns all combinations of rows as an
>> OUTER
>>>>>> join would):
>>>>>> 
>>>>>> builder.scan("source1", "article_facts")
>>>>>>   .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1,
>> 0,
>>>>>> "property_id"), builder.literal(5)))
>>>>>>   .project(builder.field(1, 0, "article_id"))
>>>>>> .scan("source2", "articles")
>>>>>>   .project(builder.field(1, 0, "id"))
>>>>>> .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
>>>>>>     builder.field(2, 0, "article_id"),
>>>>>>     builder.field(2, 1, "id")))
>>>>>> .build()
>>>>>> 
>>>>>> The plan output appears correct:
>>>>>> 
>>>>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>>>>>   LogicalProject(article_id=[$0])
>>>>>>     LogicalFilter(condition=[=($1, 5)])
>>>>>>       LogicalTableScan(table=[[source1, article_facts]])
>>>>>>   LogicalProject(id=[$0])
>>>>>>     LogicalTableScan(table=[[source2, articles]])
>>>>>> 
>>>>>> I have tried reproducing this as a test case in RelBuilderTest, but
>> if I
>>>>>> call executeQuery on a statement containing a join I get:
>>>>>> 
>>>>>> Internal error: Error while applying rule EnumerableJoinRule, args
>>>>>> 
>>>>> 
>> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
>>>>>> $0),joinType=inner)]
>>>>>> 
>>>>>> I presume this is due to some limitation of the test environment, so
>>>>> right
>>>>>> now I'm unsure how to get this to work.
>>>>>> 
>>>>>> One more thing I noticed is that the filter predicate (== 5) is not
>>>>> pushed
>>>>>> down to the database (Postgres in this case). Instead calcite used
>>>>> `select
>>>>>> * from article_facts` and applied the filter afterwards. Is that
>> expected
>>>>>> behaviour for the RelBuilder?
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Chris


Re: Using RelBuilder to construct an inner join

Posted by Chris Baynes <ch...@contiamo.com>.
I think CALCITE-1264 could definitely explain the lack of the predicate
push down I'm seeing. There's still the issue of the join condition though.
I can open a ticket for that.

> The join condition should be =($0, $1).

If that's the case it seems to me that the expectations of join conditions
in RelBuilderTest are wrong.

For example:
https://github.com/apache/calcite/blob/branch-1.7/core/src/test/java/org/apache/calcite/test/RelBuilderTest.java#L795

In this case we're joining EMP 8th col (of 8 cols), to DEPT 1st column (of
3 cols). So the condition should be =($7, $8).

On Tue, May 31, 2016 at 6:34 AM, Julian Hyde <jh...@apache.org> wrote:

> PS Jordan logged a JIRA case.
> https://issues.apache.org/jira/browse/CALCITE-1262 <
> https://issues.apache.org/jira/browse/CALCITE-1262>
>
> > On May 30, 2016, at 9:27 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> > The join condition should be =($0, $1).
> >
> > JdbcFilterRule is instantiated when JdbcRules.rules() is called. It is
> called in two places: from JdbcConvention.register and from
> PlannerTest.MockJdbcTableScan.register.
> >
> > Did you see Jacques Nadeau’s recent email? [1] Very likely you are
> seeing the same problem.
> >
> > Julian
> >
> > [1]
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU%3DvBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw%40mail.gmail.com%3E
> <
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E
> >
> >
> >> On May 30, 2016, at 2:41 AM, Chris Baynes <chris@contiamo.com <mailto:
> chris@contiamo.com>> wrote:
> >>
> >> I'm using a project on both sides before the join, so there is only one
> >> column on each side.
> >> So in that case should the join condition be ($0, $1)? Or is ($0, $0)
> >> correct since it's joining the first left column to the first right
> column?
> >>
> >> In either case the result set is still not correct, so I'll do some more
> >> digging there.
> >>
> >> As for the JdbcFilterRule, how is that set? On the BasicDataSource? I
> >> couldn't find that being used in a test.
> >>
> >> On Sat, May 28, 2016 at 3:00 AM, Julian Hyde <jhyde@apache.org <mailto:
> jhyde@apache.org>> wrote:
> >>
> >>> The plan output has a problem:
> >>>
> >>>  LogicalJoin(condition=[=($0, $0)], joinType=[inner])
> >>>
> >>> You are joining column 0 to column 0. You are not combining column 0
> >>> from the left side with column 0 from the right side. Column 0 from
> >>> the right side would be, say, 5 if the left side has 5 columns.
> >>>
> >>> Your RelBuilder code looks correct, in particular the line
> >>>
> >>>  builder.field(2, 1, "id")
> >>>
> >>> ought to reference the 0th column of the right input to the join. I'm
> >>> not sure why RelBuilder.join is creating references to the wrong
> >>> fields. It might be a bug in RelBuilder.
> >>>
> >>> I'd expect it to push the filter down to the JDBC data source: there
> >>> would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?
> >>>
> >>> Julian
> >>>
> >>>
> >>> On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <chris@contiamo.com
> <ma...@contiamo.com>> wrote:
> >>>> I'm joining datasets from different sources (using the newly
> implemented
> >>>> qualified scan), however the following INNER join query returns many
> more
> >>>> rows than I would expect (it returns all combinations of rows as an
> OUTER
> >>>> join would):
> >>>>
> >>>> builder.scan("source1", "article_facts")
> >>>>    .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1,
> 0,
> >>>> "property_id"), builder.literal(5)))
> >>>>    .project(builder.field(1, 0, "article_id"))
> >>>>  .scan("source2", "articles")
> >>>>    .project(builder.field(1, 0, "id"))
> >>>>  .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
> >>>>      builder.field(2, 0, "article_id"),
> >>>>      builder.field(2, 1, "id")))
> >>>>  .build()
> >>>>
> >>>> The plan output appears correct:
> >>>>
> >>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
> >>>>    LogicalProject(article_id=[$0])
> >>>>      LogicalFilter(condition=[=($1, 5)])
> >>>>        LogicalTableScan(table=[[source1, article_facts]])
> >>>>    LogicalProject(id=[$0])
> >>>>      LogicalTableScan(table=[[source2, articles]])
> >>>>
> >>>> I have tried reproducing this as a test case in RelBuilderTest, but
> if I
> >>>> call executeQuery on a statement containing a join I get:
> >>>>
> >>>> Internal error: Error while applying rule EnumerableJoinRule, args
> >>>>
> >>>
> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
> >>>> $0),joinType=inner)]
> >>>>
> >>>> I presume this is due to some limitation of the test environment, so
> >>> right
> >>>> now I'm unsure how to get this to work.
> >>>>
> >>>> One more thing I noticed is that the filter predicate (== 5) is not
> >>> pushed
> >>>> down to the database (Postgres in this case). Instead calcite used
> >>> `select
> >>>> * from article_facts` and applied the filter afterwards. Is that
> expected
> >>>> behaviour for the RelBuilder?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Chris
> >>>
> >
>
>

Re: Using RelBuilder to construct an inner join

Posted by Julian Hyde <jh...@apache.org>.
PS Jordan logged a JIRA case. https://issues.apache.org/jira/browse/CALCITE-1262 <https://issues.apache.org/jira/browse/CALCITE-1262> 

> On May 30, 2016, at 9:27 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> The join condition should be =($0, $1).
> 
> JdbcFilterRule is instantiated when JdbcRules.rules() is called. It is called in two places: from JdbcConvention.register and from PlannerTest.MockJdbcTableScan.register.
> 
> Did you see Jacques Nadeau’s recent email? [1] Very likely you are seeing the same problem.
> 
> Julian
> 
> [1] https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU%3DvBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw%40mail.gmail.com%3E <https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E> 
> 
>> On May 30, 2016, at 2:41 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com>> wrote:
>> 
>> I'm using a project on both sides before the join, so there is only one
>> column on each side.
>> So in that case should the join condition be ($0, $1)? Or is ($0, $0)
>> correct since it's joining the first left column to the first right column?
>> 
>> In either case the result set is still not correct, so I'll do some more
>> digging there.
>> 
>> As for the JdbcFilterRule, how is that set? On the BasicDataSource? I
>> couldn't find that being used in a test.
>> 
>> On Sat, May 28, 2016 at 3:00 AM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
>> 
>>> The plan output has a problem:
>>> 
>>>  LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>> 
>>> You are joining column 0 to column 0. You are not combining column 0
>>> from the left side with column 0 from the right side. Column 0 from
>>> the right side would be, say, 5 if the left side has 5 columns.
>>> 
>>> Your RelBuilder code looks correct, in particular the line
>>> 
>>>  builder.field(2, 1, "id")
>>> 
>>> ought to reference the 0th column of the right input to the join. I'm
>>> not sure why RelBuilder.join is creating references to the wrong
>>> fields. It might be a bug in RelBuilder.
>>> 
>>> I'd expect it to push the filter down to the JDBC data source: there
>>> would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?
>>> 
>>> Julian
>>> 
>>> 
>>> On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <chris@contiamo.com <ma...@contiamo.com>> wrote:
>>>> I'm joining datasets from different sources (using the newly implemented
>>>> qualified scan), however the following INNER join query returns many more
>>>> rows than I would expect (it returns all combinations of rows as an OUTER
>>>> join would):
>>>> 
>>>> builder.scan("source1", "article_facts")
>>>>    .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1, 0,
>>>> "property_id"), builder.literal(5)))
>>>>    .project(builder.field(1, 0, "article_id"))
>>>>  .scan("source2", "articles")
>>>>    .project(builder.field(1, 0, "id"))
>>>>  .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
>>>>      builder.field(2, 0, "article_id"),
>>>>      builder.field(2, 1, "id")))
>>>>  .build()
>>>> 
>>>> The plan output appears correct:
>>>> 
>>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>>>    LogicalProject(article_id=[$0])
>>>>      LogicalFilter(condition=[=($1, 5)])
>>>>        LogicalTableScan(table=[[source1, article_facts]])
>>>>    LogicalProject(id=[$0])
>>>>      LogicalTableScan(table=[[source2, articles]])
>>>> 
>>>> I have tried reproducing this as a test case in RelBuilderTest, but if I
>>>> call executeQuery on a statement containing a join I get:
>>>> 
>>>> Internal error: Error while applying rule EnumerableJoinRule, args
>>>> 
>>> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
>>>> $0),joinType=inner)]
>>>> 
>>>> I presume this is due to some limitation of the test environment, so
>>> right
>>>> now I'm unsure how to get this to work.
>>>> 
>>>> One more thing I noticed is that the filter predicate (== 5) is not
>>> pushed
>>>> down to the database (Postgres in this case). Instead calcite used
>>> `select
>>>> * from article_facts` and applied the filter afterwards. Is that expected
>>>> behaviour for the RelBuilder?
>>>> 
>>>> Thanks!
>>>> 
>>>> Chris
>>> 
> 


Re: Using RelBuilder to construct an inner join

Posted by Julian Hyde <jh...@apache.org>.
The join condition should be =($0, $1).

JdbcFilterRule is instantiated when JdbcRules.rules() is called. It is called in two places: from JdbcConvention.register and from PlannerTest.MockJdbcTableScan.register.

Did you see Jacques Nadeau’s recent email? [1] Very likely you are seeing the same problem.

Julian

[1] https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU%3DvBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw%40mail.gmail.com%3E <https://mail-archives.apache.org/mod_mbox/calcite-dev/201605.mbox/%3CCAKa9qDmuNU=vBa1wT51n3WbaPqq9v70WSYuNonQFbDDKGVK5jw@mail.gmail.com%3E> 

> On May 30, 2016, at 2:41 AM, Chris Baynes <ch...@contiamo.com> wrote:
> 
> I'm using a project on both sides before the join, so there is only one
> column on each side.
> So in that case should the join condition be ($0, $1)? Or is ($0, $0)
> correct since it's joining the first left column to the first right column?
> 
> In either case the result set is still not correct, so I'll do some more
> digging there.
> 
> As for the JdbcFilterRule, how is that set? On the BasicDataSource? I
> couldn't find that being used in a test.
> 
> On Sat, May 28, 2016 at 3:00 AM, Julian Hyde <jh...@apache.org> wrote:
> 
>> The plan output has a problem:
>> 
>>  LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>> 
>> You are joining column 0 to column 0. You are not combining column 0
>> from the left side with column 0 from the right side. Column 0 from
>> the right side would be, say, 5 if the left side has 5 columns.
>> 
>> Your RelBuilder code looks correct, in particular the line
>> 
>>  builder.field(2, 1, "id")
>> 
>> ought to reference the 0th column of the right input to the join. I'm
>> not sure why RelBuilder.join is creating references to the wrong
>> fields. It might be a bug in RelBuilder.
>> 
>> I'd expect it to push the filter down to the JDBC data source: there
>> would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?
>> 
>> Julian
>> 
>> 
>> On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <ch...@contiamo.com> wrote:
>>> I'm joining datasets from different sources (using the newly implemented
>>> qualified scan), however the following INNER join query returns many more
>>> rows than I would expect (it returns all combinations of rows as an OUTER
>>> join would):
>>> 
>>> builder.scan("source1", "article_facts")
>>>    .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1, 0,
>>> "property_id"), builder.literal(5)))
>>>    .project(builder.field(1, 0, "article_id"))
>>>  .scan("source2", "articles")
>>>    .project(builder.field(1, 0, "id"))
>>>  .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
>>>      builder.field(2, 0, "article_id"),
>>>      builder.field(2, 1, "id")))
>>>  .build()
>>> 
>>> The plan output appears correct:
>>> 
>>> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>>>    LogicalProject(article_id=[$0])
>>>      LogicalFilter(condition=[=($1, 5)])
>>>        LogicalTableScan(table=[[source1, article_facts]])
>>>    LogicalProject(id=[$0])
>>>      LogicalTableScan(table=[[source2, articles]])
>>> 
>>> I have tried reproducing this as a test case in RelBuilderTest, but if I
>>> call executeQuery on a statement containing a join I get:
>>> 
>>> Internal error: Error while applying rule EnumerableJoinRule, args
>>> 
>> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
>>> $0),joinType=inner)]
>>> 
>>> I presume this is due to some limitation of the test environment, so
>> right
>>> now I'm unsure how to get this to work.
>>> 
>>> One more thing I noticed is that the filter predicate (== 5) is not
>> pushed
>>> down to the database (Postgres in this case). Instead calcite used
>> `select
>>> * from article_facts` and applied the filter afterwards. Is that expected
>>> behaviour for the RelBuilder?
>>> 
>>> Thanks!
>>> 
>>> Chris
>> 


Re: Using RelBuilder to construct an inner join

Posted by Chris Baynes <ch...@contiamo.com>.
I'm using a project on both sides before the join, so there is only one
column on each side.
So in that case should the join condition be ($0, $1)? Or is ($0, $0)
correct since it's joining the first left column to the first right column?

In either case the result set is still not correct, so I'll do some more
digging there.

As for the JdbcFilterRule, how is that set? On the BasicDataSource? I
couldn't find that being used in a test.

On Sat, May 28, 2016 at 3:00 AM, Julian Hyde <jh...@apache.org> wrote:

> The plan output has a problem:
>
>   LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>
> You are joining column 0 to column 0. You are not combining column 0
> from the left side with column 0 from the right side. Column 0 from
> the right side would be, say, 5 if the left side has 5 columns.
>
> Your RelBuilder code looks correct, in particular the line
>
>   builder.field(2, 1, "id")
>
> ought to reference the 0th column of the right input to the join. I'm
> not sure why RelBuilder.join is creating references to the wrong
> fields. It might be a bug in RelBuilder.
>
> I'd expect it to push the filter down to the JDBC data source: there
> would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?
>
> Julian
>
>
> On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <ch...@contiamo.com> wrote:
> > I'm joining datasets from different sources (using the newly implemented
> > qualified scan), however the following INNER join query returns many more
> > rows than I would expect (it returns all combinations of rows as an OUTER
> > join would):
> >
> > builder.scan("source1", "article_facts")
> >     .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1, 0,
> > "property_id"), builder.literal(5)))
> >     .project(builder.field(1, 0, "article_id"))
> >   .scan("source2", "articles")
> >     .project(builder.field(1, 0, "id"))
> >   .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
> >       builder.field(2, 0, "article_id"),
> >       builder.field(2, 1, "id")))
> >   .build()
> >
> > The plan output appears correct:
> >
> > LogicalJoin(condition=[=($0, $0)], joinType=[inner])
> >     LogicalProject(article_id=[$0])
> >       LogicalFilter(condition=[=($1, 5)])
> >         LogicalTableScan(table=[[source1, article_facts]])
> >     LogicalProject(id=[$0])
> >       LogicalTableScan(table=[[source2, articles]])
> >
> > I have tried reproducing this as a test case in RelBuilderTest, but if I
> > call executeQuery on a statement containing a join I get:
> >
> > Internal error: Error while applying rule EnumerableJoinRule, args
> >
> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
> > $0),joinType=inner)]
> >
> > I presume this is due to some limitation of the test environment, so
> right
> > now I'm unsure how to get this to work.
> >
> > One more thing I noticed is that the filter predicate (== 5) is not
> pushed
> > down to the database (Postgres in this case). Instead calcite used
> `select
> > * from article_facts` and applied the filter afterwards. Is that expected
> > behaviour for the RelBuilder?
> >
> > Thanks!
> >
> > Chris
>

Re: Using RelBuilder to construct an inner join

Posted by Julian Hyde <jh...@apache.org>.
The plan output has a problem:

  LogicalJoin(condition=[=($0, $0)], joinType=[inner])

You are joining column 0 to column 0. You are not combining column 0
from the left side with column 0 from the right side. Column 0 from
the right side would be, say, 5 if the left side has 5 columns.

Your RelBuilder code looks correct, in particular the line

  builder.field(2, 1, "id")

ought to reference the 0th column of the right input to the join. I'm
not sure why RelBuilder.join is creating references to the wrong
fields. It might be a bug in RelBuilder.

I'd expect it to push the filter down to the JDBC data source: there
would be a JdbcFilter in the plan. Is JdbcFilterRule enabled?

Julian


On Thu, May 26, 2016 at 9:45 AM, Chris Baynes <ch...@contiamo.com> wrote:
> I'm joining datasets from different sources (using the newly implemented
> qualified scan), however the following INNER join query returns many more
> rows than I would expect (it returns all combinations of rows as an OUTER
> join would):
>
> builder.scan("source1", "article_facts")
>     .filter(builder.call(SqlStdOperatorTable.EQUALS, builder.field(1, 0,
> "property_id"), builder.literal(5)))
>     .project(builder.field(1, 0, "article_id"))
>   .scan("source2", "articles")
>     .project(builder.field(1, 0, "id"))
>   .join(JoinRelType.INNER, builder.call(SqlStdOperatorTable.EQUALS,
>       builder.field(2, 0, "article_id"),
>       builder.field(2, 1, "id")))
>   .build()
>
> The plan output appears correct:
>
> LogicalJoin(condition=[=($0, $0)], joinType=[inner])
>     LogicalProject(article_id=[$0])
>       LogicalFilter(condition=[=($1, 5)])
>         LogicalTableScan(table=[[source1, article_facts]])
>     LogicalProject(id=[$0])
>       LogicalTableScan(table=[[source2, articles]])
>
> I have tried reproducing this as a test case in RelBuilderTest, but if I
> call executeQuery on a statement containing a join I get:
>
> Internal error: Error while applying rule EnumerableJoinRule, args
> [rel#40:LogicalJoin.NONE.[](left=rel#38:Subset#1.NONE.[0],right=rel#39:Subset#2.NONE.[0],condition==($7,
> $0),joinType=inner)]
>
> I presume this is due to some limitation of the test environment, so right
> now I'm unsure how to get this to work.
>
> One more thing I noticed is that the filter predicate (== 5) is not pushed
> down to the database (Postgres in this case). Instead calcite used `select
> * from article_facts` and applied the filter afterwards. Is that expected
> behaviour for the RelBuilder?
>
> Thanks!
>
> Chris