Posted to issues-all@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/08/17 14:29:00 UTC

[jira] [Updated] (IMPALA-7604) In AggregationNode.computeStats, handle cardinality overflow better

     [ https://issues.apache.org/jira/browse/IMPALA-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-7604:
-----------------------------------
    Target Version: Impala 3.4.0  (was: Impala 3.3.0)

> In AggregationNode.computeStats, handle cardinality overflow better
> -------------------------------------------------------------------
>
>                 Key: IMPALA-7604
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7604
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Consider the cardinality overflow logic in [{{AggregationNode.computeStats()}}|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/AggregationNode.java]. Current code:
> {noformat}
>     // if we ended up with an overflow, the estimate is certain to be wrong
>     if (cardinality_ < 0) cardinality_ = -1;
> {noformat}
> This code has a number of issues.
> * The check is done only after looping over all conjuncts. As a result, the product could overflow twice (or more) and wrap back around to a non-negative value, which the sign check cannot detect. The check should be done after each multiplication.
> * Since we know that the number overflowed, a better estimate of the total count is {{Long.MAX_VALUE}}.
> * The code later checks for the -1 value and, if found, uses the cardinality of the first child. This is a worse estimate than using the max value, since the first child might have a low cardinality (it could be the later children that caused the overflow).
> * If we really do expect overflow, then we are dealing with very large numbers, and row-level accuracy is not needed. It would be better to use a {{double}}, which can represent such large values (at reduced precision).
> Since overflow probably seldom occurs, this is not an urgent issue. Still, if overflow does occur, the query is huge, and having at least some estimate of how huge is better than none. It also seems that this code evolved over time; this newbie is looking at it fresh and sees that the accumulated fixes could be tidied up.
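The first bullet is easy to reproduce outside Impala. The standalone Java sketch below is not Impala code; the class `OverflowDemo`, the method `naiveProduct` and the input values are hypothetical. It relies on the fact that Java `long` multiplication silently wraps modulo 2^64, so a product can overflow and still come out non-negative, in which case a `cardinality_ < 0` check never fires.

```java
public class OverflowDemo {
    // Multiplies the factors the way a naive estimator might: no overflow checks at all.
    static long naiveProduct(long... factors) {
        long result = 1;
        for (long f : factors) {
            result *= f; // wraps silently modulo 2^64 on overflow
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical factors of 2^32 + 1 each.
        // The true product is 2^64 + 2^33 + 1, which wraps to 2^33 + 1.
        long wrapped = naiveProduct(4294967297L, 4294967297L);
        System.out.println(wrapped);     // prints 8589934593
        System.out.println(wrapped < 0); // prints false: a sign check misses this overflow

        // 2^32 * 2^32 = 2^64 wraps all the way to 0, which is also not negative.
        System.out.println(naiveProduct(4294967296L, 4294967296L)); // prints 0
    }
}
```

In both cases the overflowed estimate looks like a small, plausible cardinality, which is worse than an obviously invalid one.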

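The suggestions in the issue description (check after each multiplication, saturate at {{Long.MAX_VALUE}} rather than falling back to -1, or accumulate in a {{double}}) could be sketched as follows. This is a minimal standalone illustration, not the actual Impala fix: `CardinalityMath`, `saturatingMultiply`, `productOf` and `productViaDouble` are hypothetical names, and the sketch assumes, following the snippet in the description, that a negative value (-1) marks an unknown estimate. `Math.multiplyExact` is the standard JDK method (Java 8+) that throws `ArithmeticException` on `long` overflow.

```java
public class CardinalityMath {
    // Hypothetical helper: multiplies two cardinality estimates, saturating at
    // Long.MAX_VALUE on overflow. A negative input is treated as "unknown" and
    // propagated as -1 (an assumption based on the -1 convention in the issue).
    static long saturatingMultiply(long a, long b) {
        if (a < 0 || b < 0) return -1;
        try {
            return Math.multiplyExact(a, b);
        } catch (ArithmeticException e) {
            // Both inputs are non-negative, so overflow means the true product
            // exceeds Long.MAX_VALUE: clamp instead of wrapping.
            return Long.MAX_VALUE;
        }
    }

    // Folds the factors, checking for overflow after every multiplication,
    // so the result can never wrap around more than once unnoticed.
    static long productOf(long... factors) {
        long result = 1;
        for (long f : factors) {
            result = saturatingMultiply(result, f);
            if (result < 0) return -1; // an input was unknown
        }
        return result;
    }

    // Alternative from the last bullet: accumulate in double, which cannot wrap
    // (it loses precision instead), and clamp only when converting back to long.
    static long productViaDouble(long... factors) {
        double result = 1.0;
        for (long f : factors) {
            if (f < 0) return -1; // an input was unknown
            result *= f;
        }
        return result >= (double) Long.MAX_VALUE ? Long.MAX_VALUE : (long) result;
    }
}
```

With either variant, an overflowing product such as `productOf(4294967297L, 4294967297L)` yields `Long.MAX_VALUE` instead of a small wrapped value, so downstream code sees the estimate as "huge" rather than falling back to the first child's cardinality. The checked-`long` version keeps exact results whenever they fit; the `double` version trades row-level precision for simplicity, which matches the observation that such large estimates need not be exact.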


--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
