You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org> on 2019/06/03 00:10:33 UTC

[Impala-ASF-CR] IMPALA-7608: Estimate row count from file size when no stats available

Fang-Yu Rao has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/12974 )

Change subject: IMPALA-7608: Estimate row count from file size when no stats available
......................................................................

IMPALA-7608: Estimate row count from file size when no stats available

Added the feature that computes an estimated number of rows in the current
hdfs table if the statistics for the cardinality of the current hdfs table is not
available.

Also added an additional query option to revert the change in case of regression.

Testing:
(1) In CardinalityTest.java, replaced the original statement
"verifyCardinality("SELECT a FROM functional.tinytable", -1);"
with "verifyCardinality("SELECT a FROM functional.tinytable", 10);".
(2) In CardinalityTest.java, added a new test to ensure that
the returned cardinality is still -1 when this feature is disabled.
(3) In CarginalityTest.java, added more tests to check the cardinality
of most PlanNode implementations. For each tested PlanNode, the behaviors
before and after we disable the feature are both tested.
(4) In set.test, modified three related test cases to make sure that
the added query option is included after executing "set all" in various
scenarios.
(5) There are 8 JUnit tests in PlannerTest.java that would produce different
distributed query plans when this feature is enabled. Added an additional
JUnit test for each of those 8 affected JUnit tests when this feature is
enabled. Specifically, each tested query in a newly added test files involves
at least one hdfs table without available statistics.

Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
R testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/joins-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing-hdfs-num-rows-est-enabled.test
R testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite-hdfs-num-rows-est-disabled.test
A testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
27 files changed, 2,877 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/12974/9
-- 
To view, visit http://gerrit.cloudera.org:8080/12974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic414121c8df0d5222e4aeea096b5365beb04568a
Gerrit-Change-Number: 12974
Gerrit-PatchSet: 9
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>