Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/06/23 01:10:00 UTC

[jira] [Commented] (IMPALA-12200) Cap stats NDV from SetOperationStmt.createMetadata

    [ https://issues.apache.org/jira/browse/IMPALA-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736333#comment-17736333 ] 

ASF subversion and git services commented on IMPALA-12200:
----------------------------------------------------------

Commit b1467b1567a85c3f7d1309d6b8718a5d57df2e3a in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1467b156 ]

IMPALA-12200: Cap stats NDV from SetOperationStmt.createMetadata

The Union operator creates merged ColumnStats in
SetOperationStmt.createMetadata by adding up the ColumnStats of all its
input children. One of the accumulated stats is NDV (number of distinct
values). The resulting NDV can be lowered if all source expressions
refer to the same column. A lower NDV benefits an Aggregation node on
top of the Union node because it lowers that node's cardinality and
memory estimates.
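
The capping idea described above can be sketched in Java as follows. This
is only an illustration under stated assumptions, not the code from the
commit: the class UnionNdvSketch, the ChildStats type, and the mergedNdv
method are hypothetical names, and using the shared column's own NDV as
the cap is an assumed reading of "lower the resulting NDV".

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; none of these names come from the Impala code base.
final class UnionNdvSketch {

    // Stand-in for one union input: the column its expression reads
    // (null if the expression is not a plain column reference) and the
    // NDV estimate of that expression.
    static final class ChildStats {
        final String sourceColumn;
        final long ndv;

        ChildStats(String sourceColumn, long ndv) {
            this.sourceColumn = sourceColumn;
            this.ndv = ndv;
        }
    }

    // Sums the child NDVs, as a plain merge would. If every child reads
    // the same column, the union cannot produce more distinct values
    // than that column holds, so the sum is capped at sourceColumnNdv
    // (assumption: the caller knows the column's NDV).
    static long mergedNdv(List<ChildStats> children, long sourceColumnNdv) {
        if (children.isEmpty()) {
            return 0;
        }
        long sum = 0;
        String first = children.get(0).sourceColumn;
        boolean sameColumn = first != null;
        for (ChildStats child : children) {
            sum += child.ndv;
            if (!Objects.equals(first, child.sourceColumn)) {
                sameColumn = false;
            }
        }
        return sameColumn ? Math.min(sum, sourceColumnNdv) : sum;
    }
}
```

For example, two union branches that both read a column with NDV 100 and
each estimate 80 distinct values sum to 160, but the sketch returns 100.
If the branches read different columns, it returns the uncapped 160.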

Testing:
- Pass core tests.

Change-Id: Ic0bb2eff5005fdfb11adf31499214c63dd552c05
Reviewed-on: http://gerrit.cloudera.org:8080/20040
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Cap stats NDV from SetOperationStmt.createMetadata
> --------------------------------------------------
>
>                 Key: IMPALA-12200
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12200
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.3.0
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.3.0
>
>
> The Union operator creates merged ColumnStats in SetOperationStmt.createMetadata by adding up the ColumnStats of all its input children. One of the accumulated stats is NDV (number of distinct values). The resulting NDV can be lowered if all source expressions refer to the same column. A lower NDV benefits an Aggregation node on top of the Union node because it lowers that node's cardinality and memory estimates.


