You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Paul Rogers (Code Review)" <ge...@cloudera.org> on 2019/01/22 06:58:01 UTC

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Paul Rogers has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12248


Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................

IMPALA-8095: Detailed expression cardinality tests

Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.

Until this patch, no detailed unit tests exist for this topic. Tests
here include:

* Table cardinality, NDV values and null count in metadata retrieved from
  HMS.
* Table cardinality, NDV values and null counts in metadata presented to
  the query.
* Expression NDV and selectivity values (which derive from table
  cardinality and column NDV.)

The test illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.

Since none of the existing "functional" tables provide the level of
detail needed for these tests, added a new test table specifically for
this task.

This set of tests was a good time to extend the test "fixture" framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previoius SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.

As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.

Testing: This patch includes only tests: no "production" code was
changed.

Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
D fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/common/AbstractFrontendTest.java
A fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
A fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
18 files changed, 1,679 insertions(+), 534 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/12248/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 8
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Feb 2019 22:57:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 7:

(5 comments)

Thanks for the review Bharath. Addressed your comments. Also rebased on to latest master. Please take another look at your convenience.

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@41
PS5, Line 41: 
            :  *
> There seems to be overlap in all of these. Should we consolidate them? Othe
Added comments to explain how the test classes relate to one another.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@110
PS5, Line 110: <Ta
> Does it make sense to tag the jiras for all the bugs here? I know you raise
Done


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@111
PS5, Line 111:     new TableName("functional", "alltypes"),
             :         new TableName("functional", "nullrows"),
             :         new TableName("functional", "manynulls"));
             :     mdLoader.loadTables(tables);
             : 
> Not super clear what is happening here, clarify may be?
Done


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@164
PS5, Line 164: After IMPALA-7659, Impala computes a null count,
             :    * when gathering stats, but the NDV does not include nulls (except for Boolean
             :    * columns) if stats are computed by Impala, but does include nulls if stats are
             :    * computed by Hive.
> Haha, this is indeed bizarre.
Added bug number and fixed typos.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
File fe/src/test/java/org/apache/impala/common/FrontendFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java@164
PS5, Line 164:  protected void clearTestDbs() {
             :     for (Db testDb: testDbs_) {
             :       catalog_.removeDb(testDb.getName());
             :     }
             :   }
> I did something similar in the testcase patch, except that this is backed b
Great. Once both patches are in, we can come back and unify the approach. Will be very helpful to be able to easily create test tables for planner tests without having to create real tables in the functional DB.



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Feb 2019 01:27:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1864/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 21:21:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 7:

(1 comment)

Lgtm, just one pending clarification.

http://gerrit.cloudera.org:8080/#/c/12248/7/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/7/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@122
PS7, Line 122:    // Bug: NDV of partition columns is -1 though it is listed as
             :     // 2 in the shell with: SHOW COLUMN STATS alltypes
             :     //verifyTableCol(allTypes, "year", 2, 0);
             :     // Bug: When tests are run in Eclipse we get the result above.
             :     // But, when the same test is run using maven from the command line,
             :     // we get the result shown below.
             :     // Unit test in Eclipse see the above, unit tests run from the
             :     // Disabling both to avoid a flaky test,
             :     // Same issue for the next three tests.
             :     //verifyTableCol(allTypes, "year", -1, -1);
             :     //verifyTableCol(allTypes, "month", 12, 0);
             :     //verifyTableCol(allTypes, "month", -1, -1);
(This and multiple places below)

Should we dig into this further? Wondering if there is any value in keeping "eclipse" specific references. Maybe just include mvn results since that is what is verified in the unittests? Also since that is consistently happening only with partition columns, I'm curious if there is any hidden bug somewhere.



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Feb 2019 06:29:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1867/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 23:06:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12248

to look at the new patch set (#3).

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................

IMPALA-8095: Detailed expression cardinality tests

Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.

Until this patch, no detailed unit tests exist for this topic. Tests
here include:

* Table cardinality, NDV values and null count in metadata retrieved from
  HMS.
* Table cardinality, NDV values and null counts in metadata presented to
  the query.
* Expression NDV and selectivity values (which derive from table
  cardinality and column NDV.)

The test illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.

Since none of the existing "functional" tables provide the level of
detail needed for these tests, added a new test table specifically for
this task.

This set of tests was a good time to extend the test "fixture" framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previoius SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.

As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.

Testing: This patch includes only tests: no "production" code was
changed.

Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
D fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/common/AbstractFrontendTest.java
A fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
A fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
18 files changed, 1,680 insertions(+), 535 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/12248/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 8
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 09 Feb 2019 02:56:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Feb 2019 22:57:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 4: Code-Review+1

(4 comments)

It's hard to argue with adding more tests for the current behaviour.

Had a few minor comments. Thought about +2ing but would be good if someone who knows the frontend better looked at the fixtures to make sure they make sense (they do to me).

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@168
PS4, Line 168: 
nit: blank line at function start doesn't help readability


http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@162
PS4, Line 162:    * Test the NDV-with-nulls adjustment. At present, cannot find
Doesn't Impala set it with compute stats after IMPALA-7659 was merged?


http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
File fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java@45
PS4, Line 45:   final FrontendFixture feFixture_ = FrontendFixture.instance();
private? I didn't see a case where this was accessed from a different class.


http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@62
PS4, Line 62: //  protected static ImpaladTestCatalog catalog_ = new ImpaladTestCatalog();
Delete commented-out code?



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jan 2019 00:16:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2013/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 6
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Feb 2019 01:52:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Fredy Wijaya, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12248

to look at the new patch set (#6).

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................

IMPALA-8095: Detailed expression cardinality tests

Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.

This patch adds detailed unit tests for cardinality:

* Table cardinality, NDV values and null count in metadata retrieved from
  HMS.
* Table cardinality, NDV values and null counts in metadata presented to
  the query.
* Expression NDV and selectivity values (which derive from table
  cardinality and column NDV.)

The test illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.

Since none of the existing "functional" tables provide the level of
detail needed for these tests, added a new test table specifically for
this task.

This set of tests was a good time to extend the test "fixture" framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previous SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.

As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.

Testing: This patch includes only tests: no "production" code was
changed.

Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
D fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/common/AbstractFrontendTest.java
A fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
A fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
18 files changed, 1,706 insertions(+), 536 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/12248/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 6
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jan 2019 00:39:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@356
PS3, Line 356:   private void verifyInequalitySel(String table, String col, String value) throws ImpalaException {
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@462
PS3, Line 462:     verifySelectExpr("alltypes", "int_col in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)", 3, 1);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@490
PS3, Line 490:     verifySelectExpr("alltypes", "int_col not in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)", 3, 0);
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
File fe/src/test/java/org/apache/impala/common/FrontendFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/common/FrontendFixture.java@341
PS3, Line 341:     return ctx.analyzeAndAuthorize(parsedStmt, stmtTableCache, frontend_.getAuthzChecker());
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@106
PS3, Line 106:     verifyCardinality("SELECT null_int FROM functional.nullrows WHERE group_str = 'x'", 4);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/3/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@110
PS3, Line 110:     //verifyCardinality("SELECT null_int FROM functional.nullrows WHERE null_str = 'x'", 26);
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 20:48:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 5:

(4 comments)

Thanks for the review Tim. Addressed review comments.

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@168
PS4, Line 168:     // Stats, no null values
> nit: blank line at function start doesn't help readability
Done


http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@162
PS4, Line 162: 
> Doesn't Impala set it with compute stats after IMPALA-7659 was merged?
Thanks. This was a batch of tests from way back I'm merging. The comment was obsolete.


http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
File fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java@45
PS4, Line 45:   private final FrontendFixture feFixture_ = FrontendFixture.instance();
> private? I didn't see a case where this was accessed from a different class
Done


http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

http://gerrit.cloudera.org:8080/#/c/12248/4/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@62
PS4, Line 62:   protected static ImpaladTestCatalog catalog_ = feFixture_.catalog();
> Delete commented-out code?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jan 2019 00:38:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 4:

Fixed check style issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 22:24:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 2:

Rebased on latest master. A catalog_.close() line was added to the CleanUp method in FrontendTestBase. Moved this line to CleanUp in the new FrontendFixture class.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 20:49:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12248

to look at the new patch set (#4).

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................

IMPALA-8095: Detailed expression cardinality tests

Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.

Until this patch, no detailed unit tests exist for this topic. Tests
here include:

* Table cardinality, NDV values and null count in metadata retrieved from
  HMS.
* Table cardinality, NDV values and null counts in metadata presented to
  the query.
* Expression NDV and selectivity values (which derive from table
  cardinality and column NDV.)

The test illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.

Since none of the existing "functional" tables provide the level of
detail needed for these tests, added a new test table specifically for
this task.

This set of tests was a good time to extend the test "fixture" framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previoius SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.

As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.

Testing: This patch includes only tests: no "production" code was
changed.

Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
D fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/common/AbstractFrontendTest.java
A fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
A fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
18 files changed, 1,688 insertions(+), 535 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/12248/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 7:

Passed pre-review tests: https://jenkins.impala.io/job/pre-review-test/297/


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Feb 2019 07:51:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3744/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 8
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Feb 2019 22:57:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@356
PS2, Line 356:   private void verifyInequalitySel(String table, String col, String value) throws ImpalaException {
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@462
PS2, Line 462:     verifySelectExpr("alltypes", "int_col in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)", 3, 1);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@490
PS2, Line 490:     verifySelectExpr("alltypes", "int_col not in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)", 3, 0);
line too long (97 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
File fe/src/test/java/org/apache/impala/common/FrontendFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/common/FrontendFixture.java@340
PS2, Line 340:     return ctx.analyzeAndAuthorize(parsedStmt, stmtTableCache, frontend_.getAuthzChecker());
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@106
PS2, Line 106:     verifyCardinality("SELECT null_int FROM functional.nullrows WHERE group_str = 'x'", 4);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12248/2/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java@110
PS2, Line 110:     //verifyCardinality("SELECT null_int FROM functional.nullrows WHERE null_str = 'x'", 26);
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Jan 2019 06:59:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1840/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Jan 2019 07:34:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Fredy Wijaya, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12248

to look at the new patch set (#5).

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................

IMPALA-8095: Detailed expression cardinality tests

Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.

Until this patch, no detailed unit tests exist for this topic. Tests
here include:

* Table cardinality, NDV values and null count in metadata retrieved from
  HMS.
* Table cardinality, NDV values and null counts in metadata presented to
  the query.
* Expression NDV and selectivity values (which derive from table
  cardinality and column NDV.)

The test illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.

Since none of the existing "functional" tables provide the level of
detail needed for these tests, added a new test table specifically for
this task.

This set of tests was a good time to extend the test "fixture" framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previoius SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.

As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.

Testing: This patch includes only tests: no "production" code was
changed.

Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
---
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
D fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/common/AbstractFrontendTest.java
A fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
A fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
18 files changed, 1,689 insertions(+), 536 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/12248/5
-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1871/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jan 2019 01:40:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12248/7/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/7/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@122
PS7, Line 122:    // Bug: NDV of partition columns is -1 though it is listed as
             :     // 2 in the shell with: SHOW COLUMN STATS alltypes
             :     //verifyTableCol(allTypes, "year", 2, 0);
             :     // Bug: When tests are run in Eclipse we get the result above.
             :     // But, when the same test is run using maven from the command line,
             :     // we get the result shown below.
             :     // Unit test in Eclipse see the above, unit tests run from the
             :     // Disabling both to avoid a flaky test,
             :     // Same issue for the next three tests.
             :     //verifyTableCol(allTypes, "year", -1, -1);
             :     //verifyTableCol(allTypes, "month", 12, 0);
             :     //verifyTableCol(allTypes, "month", -1, -1);
> (This and multiple places below)
We should. We should also dig into the many bugs here.

That said, the purpose of this patch is to create the test baseline, pointing out the many things to fix, including this one.

OK to create the baseline first, then do the fixes as separate patches?



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Feb 2019 22:30:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................


Patch Set 5:

(5 comments)

Just some clarifying questions. Overall makes sense to me. Since this is test-only code, we can merge this and keep improving it as and when we find new bugs or need more fixture specific improvements for writing new tests.

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@41
PS5, Line 41: See
            :  * also {@link ExprNdvTest}, {@link CardinalityTest}.
There seems to be overlap in all of these. Should we consolidate them? Others might find it confusing while figuring out where to add new tests.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@110
PS5, Line 110: Bug
Does it make sense to tag the jiras for all the bugs here? I know you raised a bunch of them, probably we should maintain  an epic (cardinality mis-estimates or something) and link them there.

Many of them make good fe ramp-up tasks :-)

** I know that is a lot of work, don't want that to block this patch, just some thought.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@111
PS5, Line 111: // 2 in the shell with SHOW COLUMN STATS alltypes
             :     //verifyTableCol(allTypes, "year", 2, 0);
             :     // Bug: Unit test in Eclipse see the above, unit tests run from the
             :     // command line see the below. Disabling to avoid a flaky test,
             :     // here and below
Not super clear what is happening here, clarify may be?


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@164
PS5, Line 164: After IMPALA-7659, Impala computes a null count,
             :    * when gathering stats, but the NDV does not include nulls (except for Boolean
             :    * columns) if stats are computed by IMpala, but does include nulls if stats are
             :    * computed by Hive.
Haha, this is indeed bizarre.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
File fe/src/test/java/org/apache/impala/common/FrontendFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java@164
PS5, Line 164:  protected void clearTestDbs() {
             :     for (Db testDb: testDbs_) {
             :       catalog_.removeDb(testDb.getName());
             :     }
             :   }
I did something similar in the testcase patch, except that this is backed by a temporary HMS created from scratch and we totally discard the HMS instance after test. That way we don't need to fake all the HMS table structures like in the methods below.



-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:25:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
......................................................................

IMPALA-8095: Detailed expression cardinality tests

Cardinality is a critical input to the query planning process,
especially join planning. Impala has many high-level end-to-end tests
that implicitly test cardinality at the "wholesale" level: A test will
produce a wrong result if the cardinality is badly wrong.

This patch adds detailed unit tests for cardinality:

* Table cardinality, NDV values and null count in metadata retrieved from
  HMS.
* Table cardinality, NDV values and null counts in metadata presented to
  the query.
* Expression NDV and selectivity values (which derive from table
  cardinality and column NDV.)

The test illustrate a number of bugs. This patch simply identifies the
bugs, comments out the tests that fail because of the bugs, and
substitutes tests that pass with the current, incorrect, behavior.
Future patches will fix the bugs. Reviewers can note the difference
between the original, incorrect behavior shown here, and the revised
behavior in those additional patches.

Since none of the existing "functional" tables provide the level of
detail needed for these tests, added a new test table specifically for
this task.

This set of tests was a good time to extend the test "fixture" framework
created earlier. The FrontendTestBase class was refactored to use a new
FrontendFixture which represents a (simulated) Impala and HMS cluster.
The previous SessionFixture represents a single user session (with
session options) and the QueryFixture represents a single query.

As part of this refactoring, the fixture classes moved into "common"
alongside FrontendTestBase.

Testing: This patch includes only tests: no "production" code was
changed.

Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Reviewed-on: http://gerrit.cloudera.org:8080/12248
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
D fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
A fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/common/AbstractFrontendTest.java
A fe/src/test/java/org/apache/impala/common/AnalysisSessionFixture.java
A fe/src/test/java/org/apache/impala/common/FrontendFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
A fe/src/test/java/org/apache/impala/common/QueryFixture.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
17 files changed, 1,704 insertions(+), 533 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 9
Gerrit-Owner: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>