You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Zhong Yanghong (JIRA)" <ji...@apache.org> on 2017/08/02 09:57:00 UTC

[jira] [Commented] (KYLIN-2773) Should not push down join condition related columns are compatible while not consistent

    [ https://issues.apache.org/jira/browse/KYLIN-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110647#comment-16110647 ] 

Zhong Yanghong commented on KYLIN-2773:
---------------------------------------

Hi [~liyang.gmt8@gmail.com] and [~mahongbin], could you help have a check? Thanks very much.

> Should not push down join condition related columns are compatible while not consistent
> ---------------------------------------------------------------------------------------
>
>                 Key: KYLIN-2773
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2773
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>             Fix For: v2.1.0
>
>         Attachments: APACHE-KYLIN-2773.patch
>
>
> For sql,
> {code}
> select PART_DT, META_CATEG_NAME, sum(price) 
> from KYLIN_SALES 
> INNER JOIN KYLIN_CATEGORY_GROUPINGS ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
> AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
> INNER JOIN KYLIN_CAL_DT ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT
> group by PART_DT, META_CATEG_NAME
> order by PART_DT, META_CATEG_NAME
> {code}
> the datatype of KYLIN_SALES.LEAF_CATEG_ID is bigint, while the one of KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID is integer.
> Then the plan transformed by kylin is as follows:
> {code}
> OLAPToEnumerableConverter
>   OLAPLimitRel(fetch=[50000])
>     OLAPSortRel(sort0=[$0], dir0=[ASC])
>       OLAPAggregateRel(group=[{0}], EXPR$1=[SUM($1)])
>         OLAPProjectRel(expr#0..22=[{inputs}], META_CATEG_NAME=[$t16], PRICE=[$t5])
>           OLAPJoinRel(condition=[=($0, $22)], joinType=[inner])
> {code}
> {color:#f79232}
> {code}
>             OLAPProjectRel(expr#0..19=[{inputs}], PART_DT=[$t0], LSTG_FORMAT_NAME=[$t1], SLR_SEGMENT_CD=[$t2], LEAF_CATEG_ID=[$t3], LSTG_SITE_ID=[$t4], PRICE=[$t5], SELLER_ID=[$t6], COUNT__=[$t7], MIN_PRICE_=[$t8], COUNT_DISTINCT_SELLER_ID_=[$t9], USER_DEFINED_FIELD1=[$t10], USER_DEFINED_FIELD3=[$t11], UPD_DATE=[$t12], UPD_USER=[$t13], LEAF_CATEG_ID0=[$t14], SITE_ID=[$t15], META_CATEG_NAME=[$t16], CATEG_LVL2_NAME=[$t17], CATEG_LVL3_NAME=[$t18])
>               OLAPJoinRel(condition=[AND(=($3, $19), =($4, $15))], joinType=[inner])
>                 OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>                 OLAPProjectRel(expr#0..8=[{inputs}], expr#9=[CAST($t4):BIGINT], USER_DEFINED_FIELD1=[$t0], USER_DEFINED_FIELD3=[$t1], UPD_DATE=[$t2], UPD_USER=[$t3], LEAF_CATEG_ID=[$t4], SITE_ID=[$t5], META_CATEG_NAME=[$t6], CATEG_LVL2_NAME=[$t7], CATEG_LVL3_NAME=[$t8], LEAF_CATEG_ID9=[$t9])
> {code}
> {color}
> {code}
>                   OLAPTableScan(table=[[DEFAULT, KYLIN_CATEGORY_GROUPINGS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
>             OLAPTableScan(table=[[DEFAULT, KYLIN_CAL_DT]], fields=[[0, 1, 2, 3]])
> {code} 
> However, what we expect is as follows:
> {code}
> OLAPToEnumerableConverter
>   OLAPLimitRel(fetch=[50000])
>     OLAPSortRel(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
>       OLAPAggregateRel(group=[{0, 1}], EXPR$2=[SUM($2)])
>         OLAPProjectRel(expr#0..21=[{inputs}], PART_DT=[$t0], META_CATEG_NAME=[$t17], PRICE=[$t4])
> {code}
> {color:#f79232}
> {code}
>           OLAPJoinRel(condition=[=($0, $21)], joinType=[inner])
>             OLAPJoinRel(condition=[AND(=($1, $15), =($2, $16))], joinType=[inner])
>               OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
> {code}
> {color}
> {code}
>               OLAPTableScan(table=[[DEFAULT, KYLIN_CATEGORY_GROUPINGS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
>             OLAPTableScan(table=[[DEFAULT, KYLIN_CAL_DT]], fields=[[0, 1]])
> {code}
> The reason for this difference is as follows:
> * Although we remove the {{JoinPushExpressionsRule}} in {{OLAPTableScan}}, the method {{RelOptUtil.pushDownJoinConditions()}} is still invoked when creating a join in {{SqlToRelConverter}}.
> * In the method of {{RelOptUtil.pushDownJoinConditions()}}, since the datatypes of the join related columns are not the same, *cast* function will be automatically assigned to KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID. Then a {{OLAPProjectRel}} will be introduced.
> In kylin, we don't need {{RelOptUtil.pushDownJoinConditions()}}. Therefore, the solution for this is just remove that logic.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)