Posted to reviews@impala.apache.org by "Paul Rogers (Code Review)" <ge...@cloudera.org> on 2018/09/27 18:11:29 UTC

[Impala-ASF-CR] IMPALA-3710: All-null columns give wrong estimates in planner

Paul Rogers has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/11528 )

Change subject: IMPALA-3710: All-null columns give wrong estimates in planner
......................................................................

IMPALA-3710: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests, which have
multiple columns, marked as nullable, with low NDV, but which
actually include no nulls.)
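
The criteria above can be sketched as a single guard function. This is
only an illustration of the rule as described in this message, not the
code in the patch: the class name NdvAdjustSketch, the method adjustNdv,
its parameters, and the UNKNOWN sentinel for a missing null count are
all hypothetical.

```java
/** Hypothetical sketch of the NDV adjustment rule; not Impala's actual code. */
public final class NdvAdjustSketch {
  /** Assumed sentinel meaning "the stats do not provide a null count". */
  public static final long UNKNOWN = -1;

  private NdvAdjustSketch() {}

  /**
   * Returns the NDV the planner would use under the rule described above.
   * The NDV is bumped by one only when every listed criterion holds.
   */
  public static long adjustNdv(boolean isTableColumn, boolean isNullable,
      boolean hasStats, long numNulls, long ndv) {
    // Internal columns, non-nullable columns, and columns without stats
    // are left untouched.
    if (!isTableColumn || !isNullable || !hasStats) return ndv;
    // Skip columns whose stats say there are definitely no nulls.
    boolean mayContainNulls = numNulls == UNKNOWN || numNulls > 0;
    if (!mayContainNulls) return ndv;
    // Only very low NDVs are adjusted; larger values are left as reported.
    if (ndv > 1) return ndv;
    // Count null as one additional distinct value.
    return ndv + 1;
  }

  public static void main(String[] args) {
    // All-null column: stats report NDV 0, planner sees 1.
    System.out.println(adjustNdv(true, true, true, UNKNOWN, 0));
    // One distinct value plus nulls: planner sees 2.
    System.out.println(adjustNdv(true, true, true, 10, 1));
    // Higher NDV is outside the narrow range and stays at 5.
    System.out.println(adjustNdv(true, true, true, UNKNOWN, 5));
  }
}
```

Running main prints 1, 2, and 5. Under this sketch, a column whose stats
report a null count of exactly 0 keeps its reported NDV.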

The change has minimal impact on PlannerTest, but some memory
estimates still needed adjusting for a test in which one column
met the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 448 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11528

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers <pa...@yahoo.com>