You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2018/10/02 20:52:32 UTC

[Impala-ASF-CR] Optimize expression to collect NULLs count

piotr.findeisen@gmail.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11565


Change subject: Optimize expression to collect NULLs count
......................................................................

Optimize expression to collect NULLs count

Simpler expression should be OK to count unconditionally.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
1 file changed, 2 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: piotr.findeisen@gmail.com

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#7) to the change originally created by piotr.findeisen@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests: Updated the affected tests to include the null counts.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
13 files changed, 1,289 insertions(+), 1,304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
Can we add a couple of tests that verify that cardinality estimates for output of "IS NULL" and "IS NOT NULL" are correct?

It would also be good to do a sanity check of the compute stats performance change on a largish table. I'd expect the difference will be small but we should confirm.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Dec 2018 18:11:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
piotr.findeisen@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Simplify expression to collect NULLs count
......................................................................


Patch Set 1:

> I can reproduce the difference you're seeing - the aggregation *is* dramatically slower.

I am not sure which aggregation you mean -- technically, `COUNT(*)`, `COUNT(symbol)`, `COUNT(IF(...))` are all aggregations.

Do you mean that my proposed version turned out to be slower in your testing?


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 07:33:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6:

Reviewers, please hold off until I confirm that the core/exhaustive builds are green. If not, I'll probably have to patch a few more tests in next PS.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Nov 2018 22:38:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3271/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:46:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1511/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Dec 2018 17:49:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#6) to the change originally created by piotr.findeisen@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests: Updated the affected tests to include the null counts.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
13 files changed, 1,291 insertions(+), 1,306 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests:
=====

- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
  estimation.

Perf note:
=========

I reran the compute stats child query with null counts included on the
store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22
non-partitioned columns (on which null counts were computed) and ~2.8B
rows. This experiment showed around 7-8% perf drop compared to the same
child query without null counts for these columns.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Reviewed-on: http://gerrit.cloudera.org:8080/11565
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
14 files changed, 1,295 insertions(+), 1,304 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

(2 comments)

A few comments.

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG@10
PS5, Line 10: IMPALA-7655.
This fix would be helpful for IMPALA-7310: if NDV=0, we need to know if nullCount > 0 to know if we should adjust NDV upward by one. (The planner includes nulls in NDV, statistics don't.)


http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS5, Line 251:           " IS NULL THEN 1 ELSE NULL END)");
If this expression is faster than an if() function (IMPALA-7655), then does it makes sense to optimize the if() function in the planner so it benefits all queries rather than just changing this one?



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:25:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#8) to the change originally created by piotr.findeisen@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests: Updated the affected tests to include the null counts.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
14 files changed, 1,295 insertions(+), 1,304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 8
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6:

(2 comments)

I fixed the tests based on the failures I noticed in a core run. I'll trigger another test run to see if it comes out green. Meanwhile, I'm also running an exhaustive run so that there are no surprises later.

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG@10
PS5, Line 10: re-enables it since a lot of codegen improvements have happened since
> This fix would be helpful for IMPALA-7310: if NDV=0, we need to know if nul
Agreed.


http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS5, Line 251:           " IS NULL THEN 1 ELSE NULL END)");
> If this expression is faster than an if() function (IMPALA-7655), then does
Agree with you. I think it's better to do it as a separate change.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Nov 2018 22:31:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3271/


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Fri, 05 Oct 2018 02:29:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 3:

Yeah, I included an example in IMPALA-7655


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 3
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 19:35:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 11: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Sat, 08 Dec 2018 00:21:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Simplify expression to collect NULLs count
......................................................................


Patch Set 2:

It sounds like you did some performance experiment - what were the two queries you ran and how fast did they run? I'm curious...


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 2
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 00:50:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11565

to look at the new patch set (#4).

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................

IMPALA-7659: Simplify expression to collect NULLs count

For reasoning behind replacing COUNT(IF(...)) with COUNT(CASE...) see
IMPALA-7655.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
1 file changed, 2 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 4
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com

[Impala-ASF-CR] Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Simplify expression to collect NULLs count
......................................................................


Patch Set 1:

I can reproduce the difference you're seeing - the aggregation *is* dramatically slower. I think this isn't expected though, we should be compiling the first version to something efficient - I'd expect that to ultimately be more efficient if we get it right since it's only a single aggregate function.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 00:58:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

The change in expression makes sense to me. I think we need someone who knows the planner better to evaluate whether we need more testing here. I suspect that tests will need modification to reflect the behaviour change (otherwise we have a testing gap).


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:46:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#9) to the change originally created by piotr.findeisen@gmail.com. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................

IMPALA-7659: Populate NULL count while computing column stats

It was disabled for performance reasons (IMPALA-1003) and this patch
re-enables it since a lot of codegen improvements have happened since
then.

This patch switches the aggregation to use the CASE conditional instead
of IF since the former has proper codegen support (IMPALA-7655).

Tests:
=====

- Updated the affected tests to include the null counts.
- Added unit tests that verify IS [NOT] NULL predicates' cardinality
  estimation.

Perf note:
=========

I reran the compute stats child query with null counts included on the
store_sales table from 1000 SF (1TB) tpcds dataset. The table had 22
non-partitioned columns (on which null counts were computed) and ~2.8B
rows. This experiment showed around 7-8% perf drop compared to the same
child query without null counts for these columns.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-decimal.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats-keywords.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
14 files changed, 1,295 insertions(+), 1,304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/9
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/941/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 3
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:03:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/943/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:26:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Thanks, Piotr for getting back. I'll take this one forward and submit a new PS that addresses the comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Nov 2018 00:26:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Sat, 01 Dec 2018 02:22:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
piotr.findeisen@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Thanks Tim for your help


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 19:51:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3539/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Dec 2018 20:36:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6:

(1 comment)

Any reviews on this one? Planning to get this one in before others keep updating these tests.

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:           " IS NULL THEN 1 ELSE NULL END)");
> I thought you were doing it in a separate change? https://gerrit.cloudera.o
Was discussing this with Paul offline. We thought that adjusting the NDV before writing the stats to HMS is problematic since it might affect other SQL engines (like Presto/Hive etc). We don't know if they expect the null count to be adjusted already.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Dec 2018 16:32:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 9: Code-Review+1

LGTM


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Dec 2018 00:14:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251: 
> Agreed. Let's get this in, then tackle the NDV=0 as a separate issue.
IMPALA-1003 doesn't include many breadcrumbs. I do wonder if there were other, more severe, issues back when it was disabled - e.g. codegen was disabled for the whole aggregation or something.

I just quickly tried an experiment measuring the time added to the aggregation for each additional aggregate function of this nature and the delta from each added function is very small.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Dec 2018 01:43:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251:           " IS NULL THEN 1 ELSE NULL END)");
If we compute the null count, can we also solve the NDV=0 issue? Before saving the NDV, add one if the null count is greater than zero. This will ensure that the NDV count here matches the planner's expectation of NDV-including-nulls.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Sat, 01 Dec 2018 00:29:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 9: Code-Review+2

Carrying +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Dec 2018 20:36:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11565

to look at the new patch set (#3).

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................

IMPALA-7659: Simplify expression to collect NULLs count

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
1 file changed, 2 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 3
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/942/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 4
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:18:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1484/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Nov 2018 23:02:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3538/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Dec 2018 06:50:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11565

to look at the new patch set (#5).

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................

IMPALA-7659: Simplify expression to collect NULLs count

For reasoning behind replacing COUNT(IF(...)) with COUNT(CASE...) see
IMPALA-7655.

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
1 file changed, 3 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7:

(1 comment)

Rebased. Core and exhaustive tests passed. PS7 is ready for review.

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251: 
> If we compute the null count, can we also solve the NDV=0 issue? Before sav
I thought you were doing it in a separate change? https://gerrit.cloudera.org/#/c/11528/

I think it is nice to do it as a separate change so that it is easy to revert if something blows up.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Mon, 03 Dec 2018 17:16:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3509/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 6
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Nov 2018 22:31:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Optimize expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
piotr.findeisen@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Optimize expression to collect NULLs count
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@250
PS1, Line 250:       columnStatsSelectList.add("COUNT(*) - COUNT(" + colRefSql + ")");
> It's not obvious to me that this should be faster after we do code generati
I haven't. I naively assume there must be something to do with COUNT over IF expression that caused slow down. Otherwise I couldn't understand why would counting NULLs be ever slower than NDV or SAMPLED_NDV used for counting distinct values.

I don't have an Impala installation that I could use for experiments. For now, I will just remove possibly misleading suggestions from the commit message. Would you be able to do some experiment with this?



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Wed, 03 Oct 2018 09:20:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 9:

Tim/Paul: Mind taking another quick look at the updated test and commit message? Thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Dec 2018 21:24:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

I'll start a dry run just to see what tests fail.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:46:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Paul Rogers (Code Review)" <ge...@cloudera.org>.
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7: Code-Review+1

(2 comments)

My vote is to get this in, then do three things:

1. Use IMPALA-7842 to add cardinality tests based on this feature.
2. Do some refresh metadata performance runs to check performance impact.
3. Tackle the NDV-does-or-does-not-include-nulls issue.

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
> Can we add a couple of tests that verify that cardinality estimates for out
FWIW, IMPALA-7842 provides a starter set of cardinality tests based on exposing the pre-Thrift plan tree. We can build on those if that patch goes in before this one.


http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251: 
> Was discussing this with Paul offline. We thought that adjusting the NDV be
Agreed. Let's get this in, then tackle the NDV=0 as a separate issue.

I wonder, do we have any data about the original issue: any performance slowness when adding this additional calculation? If a table has many columns, and we add a null count for each, how much impact is there on refresh metadata performance?



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Dec 2018 21:18:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7:

It'd still be good to do a sanity check for performance, just on your dev machine, if possible, but that shouldn't block getting this in.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Dec 2018 01:45:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Simplify expression to collect NULLs count
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/2//COMMIT_MSG@7
PS2, Line 7: Simplify expression to collect NULLs count
Any chance you could file a JIRA to track the work and mention it in the commit message? https://issues.apache.org/jira/secure/Dashboard.jspa

It would be interesting to understand how you stumbled on this since it seems like your motivation is to emit the null count as part of stats.



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 2
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 18:45:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11565

to look at the new patch set (#2).

Change subject: Simplify expression to collect NULLs count
......................................................................

Simplify expression to collect NULLs count

Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
---
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
1 file changed, 2 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/11565/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 2
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com

[Impala-ASF-CR] Optimize expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Optimize expression to collect NULLs count
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@250
PS1, Line 250:       columnStatsSelectList.add("COUNT(*) - COUNT(" + colRefSql + ")");
It's not obvious to me that this should be faster after we do code generation since the generated code for the case statement should be basically the same as for counting non-null values except with the condition inverted. Have you done any experiments?



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Oct 2018 20:59:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] Optimize expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Optimize expression to collect NULLs count
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/906/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 1
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Oct 2018 21:24:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
piotr.findeisen@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Hi Bharath,
sorry for being stucked on this. It will be probably better if you take it from here.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Sun, 18 Nov 2018 21:58:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11565/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/4/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@250
PS4, Line 250:       columnStatsSelectList.add("COUNT(CASE WHEN " + colRefSql + " IS NULL THEN 1 ELSE NULL END)");
line too long (99 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 4
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 19:47:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
piotr.findeisen@gmail.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 3:

> If you rewrite with a case expression instead of if you get
 > something thats much faster again.

You something like:
COUNT(CASE WHEN symbol IS NULL THEN NULL ELSE 1 END)
?


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 3
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 19:30:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Looks like a bunch of tests failed because of the difference in stats: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3310/#showFailuresLink


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Mon, 08 Oct 2018 21:26:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Populate NULL count while computing column stats

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Populate NULL count while computing column stats
......................................................................


Patch Set 7: Code-Review-1

(2 comments)

Tim, you raised perfectly valid concerns. I'll wait for https://gerrit.cloudera.org/#/c/11920/ to be merged since it adds the necessary unit-test infrastructure.  Please find the perf test result in the comment. (Parking it as -1 while the other GVO is running).

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/7//COMMIT_MSG@16
PS7, Line 16: Tests: Updated the affected tests to include the null counts.
> FWIW, IMPALA-7842 provides a starter set of cardinality tests based on expo
Tim, I agree we need a unit test for this. Thanks for pointing it out. I'll wait for IMPALA-7842 to go in, rebase on top of it and add it there since it provides the necessary plumbing.


http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/6/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS6, Line 251: 
> IMPALA-1003 doesn't include many breadcrumbs. I do wonder if there were oth
I tried this on the store_sales table from 1TB tpcds/parquet dataset. The table has 22 non-partitioned columns and ~2.8B rows. 

I ran the child query with and without null count and I noticed 7-8% slowdown. (I warmed up the cluster by running the queries until their runtime stabilized). ~64s without null count / ~70s with null count.

I don't see why "refresh" would be affected? (unless I'm missing something)



-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Dec 2018 04:03:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] Simplify expression to collect NULLs count

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: Simplify expression to collect NULLs count
......................................................................


Patch Set 2:

Your proposed version is faster than the current version, but that's because of a limitation in the if() function support: https://issues.apache.org/jira/browse/IMPALA-7655

If you rewrite with a case expression instead of if you get something thats much faster again.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 2
Gerrit-Owner: piotr.findeisen@gmail.com
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: piotr.findeisen@gmail.com
Gerrit-Comment-Date: Thu, 04 Oct 2018 18:28:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
......................................................................


Patch Set 5:

Hi Piotr, Any updates? If you are busy and don't have time to look into this, please let us know and one of us can pick it up. Thanks.


-- 
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Anonymous Coward <pi...@gmail.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pa...@yahoo.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <ve...@cloudera.com>
Gerrit-Comment-Date: Sun, 11 Nov 2018 03:00:26 +0000
Gerrit-HasComments: No