Posted to issues@calcite.apache.org by "Rui Wang (Jira)" <ji...@apache.org> on 2020/01/03 00:57:00 UTC

[jira] [Commented] (CALCITE-3665) Better estimate join row count when one of the sides is known to be unique

    [ https://issues.apache.org/jira/browse/CALCITE-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007151#comment-17007151 ] 

Rui Wang commented on CALCITE-3665:
-----------------------------------

Hi [~vladimirsitnikov], could you elaborate a bit on the uniqueness of join sides? 

For example, in "select from emp e left join dept d on (e.X = d.id)", are you saying that both e.X and d.id are unique, and that this uniqueness is known before query planning?

> Better estimate join row count when one of the sides is known to be unique
> --------------------------------------------------------------------------
>
>                 Key: CALCITE-3665
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3665
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Priority: Major
>
> For instance:
> 1) select from emp e left join dept d on (e.X = d.id)
> This query can't multiply rows, so its row count estimate is rowCount(join.left)
> 2) select from dept d right join emp e on (d.id = e.X)
> This query can't multiply rows, so its row count estimate is rowCount(join.right)
> 3) select from emp e join dept d on (e.id = d.id)
> The rows can't be multiplied here either
> Currently, Calcite estimates the number of rows as left*right, which is an overestimate in many cases :(
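
The three cases quoted above can be sketched as a small estimator. This is only an illustration of the reasoning, not Calcite code: the function name, its parameters, and the string join types are hypothetical, and it assumes uniqueness of each side's join key is already known. When no uniqueness applies it falls back to the left*right product that the issue describes.

```python
def estimate_join_rows(left_rows, right_rows, join_type,
                       left_key_unique=False, right_key_unique=False):
    """Hypothetical row count estimate for an equi-join.

    left_key_unique / right_key_unique say whether the join key on that
    side is known to be unique (an assumption supplied by the caller).
    """
    # Case 1: LEFT JOIN keeps every left row, and a unique right key
    # means each left row matches at most one right row.
    if join_type == "left" and right_key_unique:
        return left_rows

    # Case 2: mirror image of case 1 for RIGHT JOIN.
    if join_type == "right" and left_key_unique:
        return right_rows

    # Case 3: INNER JOIN cannot exceed the size of the side facing a
    # unique key. This is an upper bound, not an exact count.
    if join_type == "inner":
        bounds = []
        if right_key_unique:
            bounds.append(left_rows)
        if left_key_unique:
            bounds.append(right_rows)
        if bounds:
            return min(bounds)

    # Fallback: the left*right product described in the issue.
    return left_rows * right_rows
```

For example, with 1000 emp rows and 10 dept rows, `estimate_join_rows(1000, 10, "left", right_key_unique=True)` gives 1000 instead of the 10000 produced by the fallback.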


