You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Stamatis Zampetakis (JIRA)" <ji...@apache.org> on 2019/05/16 08:04:00 UTC

[jira] [Updated] (CALCITE-3070) Inner join between MySQL tables produce wrong results if join order changed

     [ https://issues.apache.org/jira/browse/CALCITE-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated CALCITE-3070:
-----------------------------------------
    Description: 
Summary of issue:
 * We are trying to inner join 2 MySQL tables.  
 ** persons table: a very small table 
 ** foodmart table: a large table
 * The query produce different results if we change the join order in the "from" clause: "from A join B" vs "from B join A". One query produces correct results while the other query produces empty results. 
 * The physical plan generated for the produce-empty query was not optimal. it tried to scan be big table. 

 

Working query and physical plan:



{code:sql}
select "t2"."product_name", "t2"."customer_id" from 
"foodmart-mysql"."foodmart" as "t2" join "persons"."persons" as "t1" 
  on"t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'
{code}

{noformat}
EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = {1287.971895621705 rows, 3719.5 cpu, 0.0 io}, id = 671
  EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = {912.971895621705 rows, 1094.5 cpu, 0.0 io}, id = 667
    JdbcToEnumerableConverter: rowcount = 25.0, cumulative cost = {147.5 rows, 283.5 cpu, 0.0 io}, id = 660
      JdbcProject(product_name=[$5], customer_id=[$22], $f85=[>=($78, 1997-02-12 00:00:00)], $f86=[<($78, 1997-02-12 00:01:00)]): rowcount = 25.0, cumulative cost = {145.0 rows, 281.0 cpu, 0.0 io}, id = 658
        JdbcFilter(condition=[AND(>=($78, 1997-02-12 00:00:00), <($78, 1997-02-12 00:01:00))]): rowcount = 25.0, cumulative cost = {125.0 rows, 201.0 cpu, 0.0 io}, id = 656
          JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 340
    EnumerableCalc(expr#0..5=[\{inputs}], person_id_int=[$t5]): rowcount = 100.0, cumulative cost = {210.0 rows, 811.0 cpu, 0.0 io}, id = 673
      JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 663
        JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 341
{noformat} 

Produce-empty query and physical plan:


{code:sql}
select "t2"."product_name", "t2"."customer_id" from 
"persons"."persons" as "t1" join "foodmart-mysql"."foodmart" as "t2" 
  on "t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'
{code}

{noformat}
EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = {1255.471895621705 rows, 12427.0 cpu, 0.0 io}, id = 334
  EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = {880.471895621705 rows, 9802.0 cpu, 0.0 io}, id = 328
    EnumerableCalc(expr#0..84=[\{inputs}], expr#85=[1997-02-12 00:00:00], expr#86=[>=($t78, $t85)], expr#87=[1997-02-12 00:01:00], expr#88=[<($t78, $t87)], expr#89=[AND($t86, $t88)], product_name=[$t5], customer_id=[$t22], $f85=[$t86], $f86=[$t88], $condition=[$t89]): rowcount = 25.0, cumulative cost = {135.0 rows, 9611.0 cpu, 0.0 io}, id = 338
      JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 317
        JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
    JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {190.0 rows, 191.0 cpu, 0.0 io}, id = 326
      JdbcProject(person_id_int=[$5]): rowcount = 100.0, cumulative cost = {180.0 rows, 181.0 cpu, 0.0 io}, id = 324
        JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0
{noformat}


 

  was:
Summary of issue:
 * We are trying to inner join 2 MySQL tables.  
 ** persons table: a very small table 
 ** foodmart table: a large table
 * The query produce different results if we change the join order in the "from" clause: "from A join B" vs "from B join A". One query produces correct results while the other query produces empty results. 
 * The physical plan generated for the produce-empty query was not optimal. it tried to scan be big table. 

 

Working query and physical plan:

select "t2"."product_name", "t2"."customer_id" from 
"foodmart-mysql"."foodmart" as "t2" join "persons"."persons" as "t1" on"t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'

EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = \{1287.971895621705 rows, 3719.5 cpu, 0.0 io}, id = 671
 EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = \{912.971895621705 rows, 1094.5 cpu, 0.0 io}, id = 667
 JdbcToEnumerableConverter: rowcount = 25.0, cumulative cost = \{147.5 rows, 283.5 cpu, 0.0 io}, id = 660
 JdbcProject(product_name=[$5], customer_id=[$22], $f85=[>=($78, 1997-02-12 00:00:00)], $f86=[<($78, 1997-02-12 00:01:00)]): rowcount = 25.0, cumulative cost = \{145.0 rows, 281.0 cpu, 0.0 io}, id = 658
 JdbcFilter(condition=[AND(>=($78, 1997-02-12 00:00:00), <($78, 1997-02-12 00:01:00))]): rowcount = 25.0, cumulative cost = \{125.0 rows, 201.0 cpu, 0.0 io}, id = 656
 JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 340
 EnumerableCalc(expr#0..5=[\{inputs}], person_id_int=[$t5]): rowcount = 100.0, cumulative cost = \{210.0 rows, 811.0 cpu, 0.0 io}, id = 673
 JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = \{110.0 rows, 111.0 cpu, 0.0 io}, id = 663
 JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 341

 

Produce-empty query and physical plan:

select "t2"."product_name", "t2"."customer_id" from 
"persons"."persons" as "t1" join "foodmart-mysql"."foodmart" as "t2" on "t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'

EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = \{1255.471895621705 rows, 12427.0 cpu, 0.0 io}, id = 334
 EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = \{880.471895621705 rows, 9802.0 cpu, 0.0 io}, id = 328
 EnumerableCalc(expr#0..84=[\{inputs}], expr#85=[1997-02-12 00:00:00], expr#86=[>=($t78, $t85)], expr#87=[1997-02-12 00:01:00], expr#88=[<($t78, $t87)], expr#89=[AND($t86, $t88)], product_name=[$t5], customer_id=[$t22], $f85=[$t86], $f86=[$t88], $condition=[$t89]): rowcount = 25.0, cumulative cost = \{135.0 rows, 9611.0 cpu, 0.0 io}, id = 338
 JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = \{110.0 rows, 111.0 cpu, 0.0 io}, id = 317
 JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 1
 JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = \{190.0 rows, 191.0 cpu, 0.0 io}, id = 326
 JdbcProject(person_id_int=[$5]): rowcount = 100.0, cumulative cost = \{180.0 rows, 181.0 cpu, 0.0 io}, id = 324
 JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = \{100.0 rows, 101.0 cpu, 0.0 io}, id = 0

 


> Inner join between MySQL tables produce wrong results if join order changed
> ---------------------------------------------------------------------------
>
>                 Key: CALCITE-3070
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3070
>             Project: Calcite
>          Issue Type: Bug
>          Components: core, jdbc-adapter
>    Affects Versions: 1.19.0
>            Reporter: Zhibin Zhou
>            Priority: Critical
>         Attachments: log.txt
>
>
> Summary of issue:
>  * We are trying to inner join 2 MySQL tables.  
>  ** persons table: a very small table 
>  ** foodmart table: a large table
>  * The query produce different results if we change the join order in the "from" clause: "from A join B" vs "from B join A". One query produces correct results while the other query produces empty results. 
>  * The physical plan generated for the produce-empty query was not optimal. it tried to scan be big table. 
>  
> Working query and physical plan:
> {code:sql}
> select "t2"."product_name", "t2"."customer_id" from 
> "foodmart-mysql"."foodmart" as "t2" join "persons"."persons" as "t1" 
>   on"t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'
> {code}
> {noformat}
> EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = {1287.971895621705 rows, 3719.5 cpu, 0.0 io}, id = 671
>   EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = {912.971895621705 rows, 1094.5 cpu, 0.0 io}, id = 667
>     JdbcToEnumerableConverter: rowcount = 25.0, cumulative cost = {147.5 rows, 283.5 cpu, 0.0 io}, id = 660
>       JdbcProject(product_name=[$5], customer_id=[$22], $f85=[>=($78, 1997-02-12 00:00:00)], $f86=[<($78, 1997-02-12 00:01:00)]): rowcount = 25.0, cumulative cost = {145.0 rows, 281.0 cpu, 0.0 io}, id = 658
>         JdbcFilter(condition=[AND(>=($78, 1997-02-12 00:00:00), <($78, 1997-02-12 00:01:00))]): rowcount = 25.0, cumulative cost = {125.0 rows, 201.0 cpu, 0.0 io}, id = 656
>           JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 340
>     EnumerableCalc(expr#0..5=[\{inputs}], person_id_int=[$t5]): rowcount = 100.0, cumulative cost = {210.0 rows, 811.0 cpu, 0.0 io}, id = 673
>       JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 663
>         JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 341
> {noformat} 
> Produce-empty query and physical plan:
> {code:sql}
> select "t2"."product_name", "t2"."customer_id" from 
> "persons"."persons" as "t1" join "foodmart-mysql"."foodmart" as "t2" 
>   on "t1"."person_id_int"="t2"."customer_id" and"t2"."timestamp" >= '1997-02-12 00:00:00' and "t2"."timestamp" < '1997-02-12 00:01:00'
> {code}
> {noformat}
> EnumerableCalc(expr#0..4=[\{inputs}], proj#0..1=[\{exprs}]): rowcount = 375.0, cumulative cost = {1255.471895621705 rows, 12427.0 cpu, 0.0 io}, id = 334
>   EnumerableJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 375.0, cumulative cost = {880.471895621705 rows, 9802.0 cpu, 0.0 io}, id = 328
>     EnumerableCalc(expr#0..84=[\{inputs}], expr#85=[1997-02-12 00:00:00], expr#86=[>=($t78, $t85)], expr#87=[1997-02-12 00:01:00], expr#88=[<($t78, $t87)], expr#89=[AND($t86, $t88)], product_name=[$t5], customer_id=[$t22], $f85=[$t86], $f86=[$t88], $condition=[$t89]): rowcount = 25.0, cumulative cost = {135.0 rows, 9611.0 cpu, 0.0 io}, id = 338
>       JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 317
>         JdbcTableScan(table=[[foodmart-mysql, foodmart]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 1
>     JdbcToEnumerableConverter: rowcount = 100.0, cumulative cost = {190.0 rows, 191.0 cpu, 0.0 io}, id = 326
>       JdbcProject(person_id_int=[$5]): rowcount = 100.0, cumulative cost = {180.0 rows, 181.0 cpu, 0.0 io}, id = 324
>         JdbcTableScan(table=[[persons, persons]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 0
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)