
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2015/10/26 20:56:27 UTC

[jira] [Updated] (CALCITE-938) More accurate rowCount for Aggregate applied to already unique keys

     [ https://issues.apache.org/jira/browse/CALCITE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Hyde updated CALCITE-938:
--------------------------------
    Summary: More accurate rowCount for Aggregate applied to already unique keys  (was: Make Aggregate return more accurate rowCount if groupSet is unique keys.)

> More accurate rowCount for Aggregate applied to already unique keys
> -------------------------------------------------------------------
>
>                 Key: CALCITE-938
>                 URL: https://issues.apache.org/jira/browse/CALCITE-938
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>            Priority: Minor
>             Fix For: 1.5.0
>
>         Attachments: CALCITE-938.patch
>
>
> If the columns in a "select distinct" are already unique, there are two equivalent rels, one before and one after AggregateRemoveRule fires:
> {code}
> before:             after:
> agg                 input
>  |
> input
> rowCount 10.0       rowCount 100.0
> {code}
> Under the default rel metadata implementation, the rowCount of the "before" rel is only 1/10 of that of the "after" rel, even though the "after" rel is clearly cheaper because it skips the aggregation. As a result, the Volcano planner will most likely either fail to pick the cheapest plan or end up in an inconsistent state due to CALCITE-830.
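A minimal sketch of the idea in the issue title, written as plain Java rather than against Calcite's metadata API: if some known unique key of the input is contained in the Aggregate's group keys, every input row is already its own group, so the Aggregate's row count should equal its input's. The class and method names are hypothetical, and the divide-by-10 fallback is an assumption chosen only to reproduce the 10.0 vs 100.0 figures above; it is not Calcite's actual default estimate.

```java
import java.util.List;
import java.util.Set;

/** Toy row-count estimate for an Aggregate; NOT Calcite's RelMdRowCount API. */
public final class AggregateRowCountSketch {

  /** Hypothetical fallback: assume grouping shrinks the input tenfold. */
  static final double ASSUMED_REDUCTION_FACTOR = 10.0;

  /**
   * Estimates how many rows an Aggregate emits.
   *
   * @param inputRowCount estimated rows produced by the Aggregate's input
   * @param groupKeys     ordinals of the GROUP BY columns
   * @param uniqueKeys    column sets known to be unique in the input
   */
  static double estimateAggregateRowCount(
      double inputRowCount, Set<Integer> groupKeys, List<Set<Integer>> uniqueKeys) {
    if (groupKeys.isEmpty()) {
      return 1.0; // an aggregate with no GROUP BY always yields exactly one row
    }
    for (Set<Integer> key : uniqueKeys) {
      // A unique key inside the group keys means every input row is its own
      // group, so the Aggregate cannot remove any rows.
      if (groupKeys.containsAll(key)) {
        return inputRowCount;
      }
    }
    // No uniqueness information: fall back to the assumed reduction.
    return inputRowCount / ASSUMED_REDUCTION_FACTOR;
  }

  public static void main(String[] args) {
    // Column 0 plays the role of DEPT.DEPTNO, a unique key of the input.
    double knownUnique = estimateAggregateRowCount(100.0, Set.of(0), List.of(Set.of(0)));
    // Same Aggregate, but the estimator is not told that column 0 is unique.
    double uniquenessIgnored = estimateAggregateRowCount(100.0, Set.of(0), List.of());
    System.out.println(knownUnique + " " + uniquenessIgnored);
  }
}
```

Running `main` prints `100.0 10.0`: the "before" rel only gets the same row count as the "after" rel once the estimator knows the grouped column is unique.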
> An example (based on the EnumerableRel cost model):
> The plan for
> {code}
> select empno, d.deptno
> from "scott".emp
> join (select distinct deptno from "scott".dept) d
> using (deptno);
> {code}
> would be
> {code}
> EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
>     EnumerableAggregate(group=[$0])
>       EnumerableTableScan(table=[[scott, DEPT]])
>     EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
>       EnumerableTableScan(table=[[scott, EMP]])
> {code}
> whereas it should be
> {code}
> EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
>   EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
>     EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
>       EnumerableTableScan(table=[[scott, DEPT]])
>     EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
>       EnumerableTableScan(table=[[scott, EMP]])
> {code}
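A toy cost comparison of the two plans above, to show how an underestimated row count leaks into the cost of everything downstream. All weights are made up and are not the EnumerableRel cost formulas: grouping is assumed to cost 1.5 units per input row versus 1.0 for a projection (Calc), each table is assumed to hold 100 rows to match the 100.0 figure above, and the join's cost is assumed to grow with the row counts of both inputs. The EMP scan and Calc are identical in both plans and are left out.

```java
/** Toy cost comparison for the two CALCITE-938 plans; NOT Calcite's cost model. */
public final class PlanCostSketch {

  // Hypothetical per-row weights, chosen only to illustrate the effect.
  static final double AGG_COST_PER_INPUT_ROW = 1.5;  // grouping assumed dearer than projecting
  static final double CALC_COST_PER_INPUT_ROW = 1.0;

  /**
   * Cost of the DEPT branch (scan, then Aggregate or Calc) plus the join.
   *
   * @param deptRows            rows read from DEPT
   * @param useAggregate        true for the Aggregate plan, false for the Calc plan
   * @param estimatedBranchRows row count the planner believes the DEPT branch emits
   * @param empRows             rows arriving from the EMP side of the join
   */
  static double planCost(double deptRows, boolean useAggregate,
      double estimatedBranchRows, double empRows) {
    double scan = deptRows;
    double reduce = deptRows * (useAggregate ? AGG_COST_PER_INPUT_ROW : CALC_COST_PER_INPUT_ROW);
    // Assumed: the join's cost grows with the row counts of both inputs.
    double join = estimatedBranchRows + empRows;
    return scan + reduce + join;
  }

  public static void main(String[] args) {
    double deptRows = 100.0;
    double empRows = 100.0;
    double calcPlan = planCost(deptRows, false, deptRows, empRows);
    // Aggregate plan as costed with the 1/10 row-count estimate.
    double aggPlanUnderestimated = planCost(deptRows, true, deptRows / 10.0, empRows);
    // Aggregate plan once its row count equals its input's (keys already unique).
    double aggPlanCorrected = planCost(deptRows, true, deptRows, empRows);
    System.out.println("calc=" + calcPlan
        + " aggUnderestimated=" + aggPlanUnderestimated
        + " aggCorrected=" + aggPlanCorrected);
  }
}
```

Under these assumed weights, `main` prints `calc=400.0 aggUnderestimated=360.0 aggCorrected=450.0`: with the 1/10 estimate the Aggregate plan looks cheaper, and with the corrected row count the Calc plan wins, which is the ordering the issue asks for.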



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)