You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "yiming.xu (JIRA)" <ji...@apache.org> on 2018/03/15 07:47:00 UTC

[jira] [Commented] (KYLIN-3149) Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected

    [ https://issues.apache.org/jira/browse/KYLIN-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400045#comment-16400045 ] 

yiming.xu commented on KYLIN-3149:
----------------------------------

The underlying reason for this problem is that case when does not have any impact on the cost of ProjectRel, and now adds the count of the case, as an additional overhead when calculating the cost.

> Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected
> ------------------------------------------------------------------------
>
>                 Key: KYLIN-3149
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3149
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.2.0
>            Reporter: hongbin ma
>            Assignee: yiming.xu
>            Priority: Major
>         Attachments: dump.txt
>
>
> for queries like:
> {code:sql}
> select TRANS_ID from kylin_sales group by cast (case 
> WHEN  '1030101' = '1030101' then substring(COALESCE(OPS_USER_ID, '999999999999'), 1, 1)
> when  '1030101' = '1030102' then substring(COALESCE(OPS_REGION, '999999999999'), 1, 1)  
> when  '1030101' = '1030103' then substring(COALESCE(LSTG_FORMAT_NAME, '999999999999'), 1, 1)
> when  '1030101' = '1030104' then substring(COALESCE(LSTG_FORMAT_NAME, '999999999999'), 1, 1)
> end as varchar(256)), TRANS_ID;
> {code}
> the expected logical plan after volcano is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>     OLAPLimitRel(ctx=[], fetch=[50000])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[SUBSTRING(CASE(IS NOT NULL($9), $9, '999999999999'), 1, 1)], TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> however the actual is:
> {code}
> EXECUTION PLAN BEFORE REWRITE
> OLAPToEnumerableConverter
>   OLAPLimitRel(ctx=[], fetch=[50000])
>     OLAPProjectRel(TRANS_ID=[$1], ctx=[])
>       OLAPAggregateRel(group=[{0, 1}], ctx=[])
>         OLAPProjectRel($f0=[CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, '999999999999'), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, '999999999999'), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, '999999999999'), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, '999999999999'), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary"], TRANS_ID=[$0], ctx=[])
>           OLAPTableScan(table=[[DEFAULT, KYLIN_SALES]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]])
> {code}
> looks like Calcite's ReduceExpressionsRule.PROJECT_INSTANCE not working as expected. If we dump the internal state of this VolcanoPlanner (org.apache.calcite.plan.volcano.VolcanoPlanner#dump), line 19-21 from the complete dump is attached:
> {code}
> 	rel#337:Subset#1.OLAP.[], best=rel#339, importance=0.6561
> 		rel#339:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=CAST(CASE(=('1030101', '1030101'), SUBSTRING(CASE(IS NOT NULL($9), $9, '999999999999'), 1, 1), =('1030101', '1030102'), SUBSTRING(CASE(IS NOT NULL($10), $10, '999999999999'), 1, 1), =('1030101', '1030103'), SUBSTRING(CASE(IS NOT NULL($2), $2, '999999999999'), 1, 1), =('1030101', '1030104'), SUBSTRING(CASE(IS NOT NULL($2), $2, '999999999999'), 1, 1), null)):VARCHAR(256) CHARACTER SET "UTF-16LE" COLLATE "UTF-16LE$en_US$primary",TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> 		rel#348:OLAPProjectRel.OLAP.[](input=rel#303:Subset#0.OLAP.[],$f0=SUBSTRING(CASE(IS NOT NULL($9), $9, '999999999999'), 1, 1),TRANS_ID=$0,ctx=), rowcount=100.0, cumulative cost={15.0 rows, 25.05 cpu, 0.0 io}
> {code}
> we see two rels with same cost:  #339 and #348, where #339 is created from LogicalProject = (OLAPProjectRule)=> OLAPProject, and #348 is created from LogicalProject =( ReduceExpressionsRule) => Reduced LogicalProject =(OLAPProjectRule)=> Reduced OLAPProject . Since ReduceExpressionsRule require Logical Project rather than OLAP Project, #339 is never reduced.
> The worse thing is that cost of #339 and #348 are same. By current volcano planner algorithm  the first met rel will be chosen, so unexpected rel is chosen
> A simple approach to fix this is to refine the rel choosing algorithm: when two rels are equal in cost, choose a "simpler" one. Since we don't have a perfect measurement of "simple", we simply choose the rel with smaller toString() length



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)