Posted to reviews@impala.apache.org by "Aman Sinha (Code Review)" <ge...@cloudera.org> on 2019/11/16 20:11:42 UTC

[Impala-ASF-CR] IMPALA-9146: Add a configurable limit for the size of broadcast input.

Aman Sinha has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/14690 )

Change subject: IMPALA-9146: Add a configurable limit for the size of broadcast input.
......................................................................

IMPALA-9146: Add a configurable limit for the size of broadcast input.

Impala's DistributedPlanner may occasionally choose broadcast distribution for inputs
that are larger than the destination executor's total memory. This can happen if the
cluster membership is not accurately known and the planner's comparison of broadcastCost
vs. partitionCost happens to favor broadcast. The result is spilling, which severely
degrades performance. Although the DistributedPlanner does a mem_limit check before
picking broadcast, the mem_limit is not an accurate reflection of the executor's memory
since it is assigned during admission control.

As a safety measure, we introduce an explicit configurable limit, broadcast_bytes_limit,
on the size of the broadcast input, with a default of 32GB. The default was chosen
based on analysis of existing benchmark queries and representative workloads. If the
estimated input size on the build side is greater than this limit, the
DistributedPlanner falls back to a partitioned distribution.
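
The fallback rule described above can be sketched roughly as follows. This is an
illustration only, not the code in this patch: the class BroadcastLimitSketch, the
Distribution enum, and the chooseDistribution() signature are hypothetical, 32GB is
assumed to mean 32 * 2^30 bytes, and resolving a cost tie in favor of broadcast is an
assumption as well.

```java
// Hypothetical sketch of the broadcast size check; not the actual DistributedPlanner code.
public class BroadcastLimitSketch {
  enum Distribution { BROADCAST, PARTITIONED }

  // Assumed interpretation of the 32GB default: 32 * 2^30 bytes.
  static final long DEFAULT_BROADCAST_BYTES_LIMIT = 32L * 1024 * 1024 * 1024;

  /**
   * Picks a join distribution from the two cost estimates, then overrides a
   * broadcast choice when the estimated build-side size exceeds the limit.
   */
  static Distribution chooseDistribution(long buildSideBytes, long broadcastCost,
      long partitionCost, long broadcastBytesLimit) {
    // Cost-based choice first. Favoring broadcast on a tie is an assumption.
    boolean costFavorsBroadcast = broadcastCost <= partitionCost;
    if (!costFavorsBroadcast) return Distribution.PARTITIONED;

    // Safety check: fall back to partitioning if the build side is larger
    // than the configured limit (strictly greater, as in the description).
    if (buildSideBytes > broadcastBytesLimit) return Distribution.PARTITIONED;

    return Distribution.BROADCAST;
  }

  public static void main(String[] args) {
    long oneGb = 1024L * 1024 * 1024;
    // Costs favor broadcast, but a 64GB build side exceeds the 32GB limit.
    System.out.println(
        chooseDistribution(64 * oneGb, 1, 10, DEFAULT_BROADCAST_BYTES_LIMIT));
    // Same costs with a 1GB build side: broadcast is kept.
    System.out.println(
        chooseDistribution(oneGb, 1, 10, DEFAULT_BROADCAST_BYTES_LIMIT));
  }
}
```

In this sketch the size check only ever overrides a broadcast choice; a plan that the
cost comparison already partitions is left unchanged.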

Testing:
 - Ran all regression tests on Jenkins successfully
 - Added a new unit test in PlannerTest that sets the broadcast_bytes_limit to a
   small value and checks whether the distributed plan does hash partitioning on
   the build side instead of broadcast.

Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
A testdata/workloads/functional-planner/queries/PlannerTest/broadcast-bytes-limit.test
9 files changed, 75 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/14690/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe5639ca38acb72e0194aa80bc6ebb6cafb2acd9
Gerrit-Change-Number: 14690
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha <am...@cloudera.com>