You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Pooja Nilangekar (Code Review)" <ge...@cloudera.org> on 2018/11/02 00:18:00 UTC

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Pooja Nilangekar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11854


Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................

IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Previously, the AggregationNode calculated the estimated number
of rows based on input cardinality without accounting for the
division of input data across multiple fragment instances. This
bloated up the memory estimates for the node. After this change,
the AggregationNode to accounts for the number of fragment
instances while estimating the number of rows per instance. A skew
factor of 1.5 was added to account for data skew among multiple
fragment instances. This number was derived using empirical
analysis of real-world and benchmark (tpch, tpcds) queries.

Testing:
Tested queries with changed estimates to avoid cases of
significant underestimation of memory.
Ran front-end and end-to-end tests affected by this change.

Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 97 insertions(+), 82 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/11854/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 1
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3437/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Nov 2018 17:48:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Nov 2018 17:48:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Pooja Nilangekar (Code Review)" <ge...@cloudera.org>.
Pooja Nilangekar has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 2:

> Patch Set 2:
> 
> have you filed a jira for changing behavior between grouping and non-grouping aggs?

Yes.


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Nov 2018 00:17:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Nov 2018 22:00:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................

IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Previously, the AggregationNode calculated the estimated number
of rows based on input cardinality without accounting for the
division of input data across multiple fragment instances. This
bloated up the memory estimates for the node. After this change,
the AggregationNode accounts for the number of fragment instances
while estimating the number of rows per instance. A skew factor of
1.5 was added to account for data skew among multiple fragment
instances. This number was derived using empirical analysis of
real-world and benchmark (tpch, tpcds) queries.

Testing:
Tested queries with changed estimates to avoid cases of
significant underestimation of memory.
Ran front-end and end-to-end tests affected by this change.

Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Reviewed-on: http://gerrit.cloudera.org:8080/11854
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 97 insertions(+), 82 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 4
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11854/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11854/1//COMMIT_MSG@13
PS1, Line 13:  to
nit: extra word


http://gerrit.cloudera.org:8080/#/c/11854/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/11854/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@474
PS1, Line 474: perInstanceCardinality == -1
does this (getPerInstanceNdv returning -1) mean that we dont have the necessary data to cap the estimate based on input cardinality?



-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 1
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 00:49:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 2: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 23:00:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1259/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 1
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 00:55:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Nov 2018 01:37:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1264/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 18:26:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Pooja Nilangekar (Code Review)" <ge...@cloudera.org>.
Pooja Nilangekar has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................

IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Previously, the AggregationNode calculated the estimated number
of rows based on input cardinality without accounting for the
division of input data across multiple fragment instances. This
bloated up the memory estimates for the node. After this change,
the AggregationNode accounts for the number of fragment instances
while estimating the number of rows per instance. A skew factor of
1.5 was added to account for data skew among multiple fragment
instances. This number was derived using empirical analysis of
real-world and benchmark (tpch, tpcds) queries.

Testing:
Tested queries with changed estimates to avoid cases of
significant underestimation of memory.
Ran front-end and end-to-end tests affected by this change.

Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 97 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/11854/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-7791: Compute AggregationNode's estimated rows using # instances

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11854 )

Change subject: IMPALA-7791: Compute AggregationNode's estimated rows using # instances
......................................................................


Patch Set 2:

have you filed a jira for changing behavior between grouping and non-grouping aggs?


-- 
To view, visit http://gerrit.cloudera.org:8080/11854
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cb9746fafa3e5952e28caa952837e285bcc22ac
Gerrit-Change-Number: 11854
Gerrit-PatchSet: 2
Gerrit-Owner: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 23:01:38 +0000
Gerrit-HasComments: No