Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2018/09/20 21:07:00 UTC
[jira] [Created] (IMPALA-7602) Definition of NDV differs between planner and stats mechanism
Paul Rogers created IMPALA-7602:
-----------------------------------
Summary: Definition of NDV differs between planner and stats mechanism
Key: IMPALA-7602
URL: https://issues.apache.org/jira/browse/IMPALA-7602
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Paul Rogers
See IMPALA-7310, which says that the Impala NDV function is implemented as "number of non-null distinct values." IMPALA-7310 also says that the stats-gathering mechanism uses the same definition.
In the comments on that issue, we point to [{{ExprNdvTest}}|https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java], which shows that the planner itself, when working with constant expressions, treats NULL as a distinct value.
In the case described in IMPALA-7310, this means that a column containing only NULLs has NDV=0 if stats are used, but NDV=1 if constant expressions are used.
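To make the discrepancy concrete, here is a minimal Java sketch that models the two definitions as simple set-counting rules and applies them to an all-NULL column. The class and method names (NdvDefinitions, ndvExcludingNull, ndvIncludingNull) are hypothetical; this only illustrates the definitions and is not Impala's planner or stats code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NdvDefinitions {

    // Hypothetical model of the definition IMPALA-7310 describes for the
    // NDV function and stats: count distinct values, ignoring NULL.
    static long ndvExcludingNull(List<?> column) {
        Set<Object> distinct = new HashSet<>();
        for (Object v : column) {
            if (v != null) {
                distinct.add(v);
            }
        }
        return distinct.size();
    }

    // Hypothetical model of the rule ExprNdvTest shows for constant
    // expressions: NULL counts as one more distinct value.
    static long ndvIncludingNull(List<?> column) {
        // HashSet keeps at most one null element, so NULL adds at most 1.
        return new HashSet<Object>(column).size();
    }

    public static void main(String[] args) {
        List<String> allNulls = Arrays.asList(null, null, null);
        System.out.println("non-null NDV: " + ndvExcludingNull(allNulls));     // 0
        System.out.println("NULL-as-value NDV: " + ndvIncludingNull(allNulls)); // 1
    }
}
```

The same column yields 0 under one rule and 1 under the other, which is exactly the mismatch between the stats path and the constant-expression path.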
This is a minor point, but it would be good to use a single definition everywhere. That way, if we adopt the "number of non-null distinct values" rule, the "adjusted NDV" is always one more than the "raw" NDV. As things stand, we can't be sure when to apply the null adjustment, because we don't know whether NULL is already included.
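A rough sketch of why one definition simplifies the null adjustment: with mixed conventions, an NDV value has to carry extra information saying whether NULL was already counted, while under the single non-null rule the adjustment is an unconditional +1. NdvEstimate, its includesNull flag, and both helper methods are illustrative assumptions, not Impala classes or APIs.

```java
public class NullAdjustedNdv {

    // Hypothetical carrier: an NDV estimate plus a flag recording whether
    // NULL was already counted as a distinct value.
    static final class NdvEstimate {
        final long value;
        final boolean includesNull;

        NdvEstimate(long value, boolean includesNull) {
            this.value = value;
            this.includesNull = includesNull;
        }
    }

    // Mixed conventions: adding 1 for NULL is only safe if we know whether
    // NULL is already included, so the flag must travel with the value.
    static long adjustMixed(NdvEstimate ndv) {
        return ndv.includesNull ? ndv.value : ndv.value + 1;
    }

    // Single "number of non-null distinct values" convention: the adjusted
    // NDV is always the raw NDV plus one, with no extra bookkeeping.
    static long adjustUniform(long nonNullNdv) {
        return nonNullNdv + 1;
    }

    public static void main(String[] args) {
        // All-NULL column: stats report 0, the constant-expression path reports 1.
        NdvEstimate fromStats = new NdvEstimate(0, false);
        NdvEstimate fromConstants = new NdvEstimate(1, true);
        System.out.println(adjustMixed(fromStats));     // 1
        System.out.println(adjustMixed(fromConstants)); // 1
        System.out.println(adjustUniform(0));           // 1
    }
}
```

Both paths reach the same adjusted value, but only the uniform version can do so without knowing where the raw NDV came from.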
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)