You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Riza Suminto (Code Review)" <ge...@cloudera.org> on 2021/03/09 20:15:44 UTC

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17166


Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................

IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit

IMPALA-9856 enables result spooling by default. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch validation that when result spooling is
enabled, max_spilled_result_spooling_mem <= scratch_limit.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit
  and test_with_low_scratch_limit.
- Toggle off spool_query_results in
  test_with_zero_scratch_limit_no_memory_limit to ensure that spill will
  not happen.
- Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem.
- Add be test QueryOptions.ResultSpoolingWithScratchLimit.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M tests/query_test/test_scratch_limit.py
3 files changed, 120 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 7: Verified+1 Code-Review+2

Carry prior +2 for code review and +1 for gerrit-verify-dryrun.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:35:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#7).

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 3:

Following up on private discussion, we decide that it is better to override result spooling configuration if scratch_limit is too low.

Patch set 3 implement this strategy. I will rerun an exhaustive tests over this patch set later tonight.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 00:00:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

> Patch Set 6:
> 
> > Patch Set 6: Verified-1
> > 
> > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
> 
> Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559).

Yeah, that is one failure..but I also see for the dockerized tests bunch of other failures: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it seems the failures are all related to lost connection.
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
E    MESSAGE: TSocket read 0 bytes


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 16:58:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8355/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 04:17:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 12:46:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................


Patch Set 2:

(2 comments)

Patch set 2 disable result spooling in tests that set scratch_limit > -1.

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1111
PS1, Line 1111: un
> Agree. I will rephrase into this:
Done


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != -1) {
> Good point. Quick git grep show some more places where scratch_limit is set
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 00:53:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Reviewed-on: http://gerrit.cloudera.org:8080/17166
Reviewed-by: Aman Sinha <am...@cloudera.com>
Tested-by: Aman Sinha <am...@cloudera.com>
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)

Approvals:
  Aman Sinha: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120: 
> ACK.
Done


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:           queryOptions.setSpool_query_results(false);
> Yeah, that will make the code cleaner.  Agree about the first point about u
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 03:59:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#6).

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:53:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 5: Code-Review+1

(2 comments)

Couple of nits. Rest of it LGTM.

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123
PS5, Line 123:      // If scratch_limit < maxAllowedScratchLimit,
nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ?


http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125
PS5, Line 125: If scratch_limit < maxAllowedScratchLimit
nit: Similar update needed here ?



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:39:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

> Patch Set 6:
> 
> > Patch Set 6:
> > 
> > > Patch Set 6:
> > > 
> > > > Patch Set 6: Verified-1
> > > > 
> > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
> > > 
> > > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559).
> > 
> > Yeah, that is one failure..but I also see for the dockerized tests bunch of other failures: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it seems the failures are all related to lost connection.
> > E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> > E    INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> > E    MESSAGE: TSocket read 0 bytes
> 
> TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad due to DCHECK violation in reservation-tracker.cc:436. It is a violation when BufferedTupleStream is spilling and subsequently allocating new page for read.
> 
> It is unrelated with this patch. But the combination of sort query + result spooling in this test seems to reveal another bug in BufferedTupleStream.
> At this point, I think we should disable result spooling for all test in TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it separately. What do you think?

Yes, let's do that. I presume you will explicitly enable result_spooling only for the following test? 
def test_result_spooling_and_varying_scratch_limit(self, vector):


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:30:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 7:

Patch set 7 is a rebase of patch set 6.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:32:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104
PS3, Line 1104:   // (a value of 0 or -1 means memory is unbounded).
I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query option is accepted as indicator for infinite memory and translated into 0 return value. So testing against -1 here is not necessary. I will revert this change.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:       long maxMemReservationBytes = Math.max(
             :           queryOptions.getMax_result_spooling_mem(), minMemReservationBytes);
This maxMemReservationBytes does not seems to take account of max_result_spooling_mem = 0 (unbounded), which can peg maxMemReservationBytes = minMemReservationBytes. Im not sure what to set maxMemReservationBytes in this case.

Similarly, SpillableRowBatchQueue does not seem to recognize max_spilled_result_spooling_mem = 0 as unbounded.



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 05:00:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120: scratch_limit < minMemReservationBytes
> Update this and the one below to account for the extra maxRowBufferSize
ACK.


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:           maxMemReservationBytes = scratchLimit - maxRowBufferSize;
> For this adjustment for maxRowBufferSize, can we not just do it up front (o
If scratch_limit is unbounded, the maxMemReservationBytes calculation in line 114 is OK. Little overspill will not fail the query.
In contrary, if scratch_limit is bounded, just a little overspill will terminate the query because scratch_limit is strictly enforced.

What if I tidy up the comparison a bit so that it looks simpler? We define

  long maxAllowedScratchLimit = scratchLimit - maxRowBufferSize;

Instead of comparing against scratchLimit, these should compare against maxAllowedScratchLimit;



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 01:47:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#5).

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 143 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8334/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 01:10:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1111
PS1, Line 1111: $1
> slightly misleading,does not necessarily have to be the same, could be 0 or
Agree. I will rephrase into this:
"If max_result_spooling_mem is set to unbounded ($0) max_spilled_result_spooling_mem must be set to unbounded (0 or -1) as well."


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != -1) {
> since now we have spilling on by default, spool_query_results is set to 1GB
Good point. Quick git grep show some more places where scratch_limit is set to 0, like some in spilling-naaj-no-deny-reservation.test. I will also set spool_query_results=0 in these places.



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 23:28:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77
PS3, Line 77:    * SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a min/max resevation,
Some of the method level comment should be updated to reflect the behavior now that an empty resource profile is created in some situations when spool_query_results is true.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92
PS3, Line 92: wither
nit: typo ?


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:       long maxMemReservationBytes = Math.max(
             :           queryOptions.getMax_result_spooling_mem(), minMemReservationBytes);
> This maxMemReservationBytes does not seems to take account of max_result_sp
It sounds like an existing bug.  If you can create a test case for it can you file a separate JIRA ?


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@122
PS3, Line 122:       if (scratchLimit > -1) {
Should this check be scratchLimit > 0 since -1 or 0 mean unbounded right ?


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126: increase
nit: 'increasing'


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126: to
Suggest rewording:  'to >=' minMemReservationBytes 
(since one can bump it up to a much higher value)


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS3, Line 142:           queryOptions.setMax_spilled_result_spooling_mem(scratchLimit);
Would be useful to add a trace level log message here as well.


http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test:

http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2
PS3, Line 2: ---- QUERY
Could you add 1 tests with empty scratch dirs ?



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 07:58:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 4:

(8 comments)

My exhaustive test run last night reveal that scratch_limit might still get violated if maxMemReservationBytes is equal to scratch_limit. This is because the content of SpillableRowBatchQueue can be slightly higher than maxMemReservationBytes when it decide to spill.

To anticipate that, I lower the spooling mem config a little further here in Patch Set 4.

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104
PS3, Line 1104:   // max_spilled_result_spooling_mem (a value of 0 means memory is unbounded).
> I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query op
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77
PS3, Line 77:    * If SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a min/max resevation,
> Some of the method level comment should be updated to reflect the behavior 
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92
PS3, Line 92: 
> nit: typo ?
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110
PS3, Line 110:       long bufferSize = queryOptions.getDefault_spillable_buffer_size();
             :       long maxRowBufferSize = PlanNode.computeMaxSpillableBufferSize(
> It sounds like an existing bug.  If you can create a test case for it can y
I filed IMPALA-10583. Will work on that next.


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126: 
> Suggest rewording:  'to >=' minMemReservationBytes 
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126
PS3, Line 126: 
> nit: 'increasing'
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS3, Line 142:           maxMemReservationBytes = scratchLimit - maxRowBufferSize;
> Would be useful to add a trace level log message here as well.
Done


http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test:

http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2
PS3, Line 2: ---- QUERY
> Could you add 1 tests with empty scratch dirs ?
Since scratch_dirs is a backend flag, I piggy back the test under TestScratchDir::test_no_dirs



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 20:00:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120
PS4, Line 120: scratch_limit < minMemReservationBytes
Update this and the one below to account for the extra maxRowBufferSize


http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:           maxMemReservationBytes = scratchLimit - maxRowBufferSize;
For this adjustment for maxRowBufferSize, can we not just do it up front (on line 114) since we know that maxMemReservationBytes should always be conservative such that it leaves a cushion for maxRowBufferSize. It should simplify the logic and presumably not cause other side effects (unless I am missing something).



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 01:14:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#2).

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................

IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit

IMPALA-9856 enables result spooling by default. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch validation that when result spooling is
enabled, max_spilled_result_spooling_mem <= scratch_limit.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit
  and test_with_low_scratch_limit.
- Toggle off spool_query_results in tests that set scratch_limit=0 to
  ensure that result spilling will not happen.
- Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem.
- Add be test QueryOptions.ResultSpoolingWithScratchLimit.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj-no-deny-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test
M testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test
M tests/query_test/test_scratch_limit.py
7 files changed, 129 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8344/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 00:13:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142
PS4, Line 142:           maxMemReservationBytes = scratchLimit - maxRowBufferSize;
> If scratch_limit is unbounded, the maxMemReservationBytes calculation in li
Yeah, that will make the code cleaner.  Agree about the first point about unbounded scratch_limit.



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 02:05:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6966/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 18:02:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#4).

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Verify that spool_query_results query option is disabled in
  TestScratchDir::test_no_dirs
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
8 files changed, 140 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8323/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Mar 2021 20:36:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

> Patch Set 6:
> 
> > Patch Set 6:
> > 
> > > Patch Set 6: Verified-1
> > > 
> > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/
> > 
> > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559).
> 
> Yeah, that is one failure..but I also see for the dockerized tests bunch of other failures: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it seems the failures are all related to lost connection.
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
> E    MESSAGE: TSocket read 0 bytes

TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad due to DCHECK violation in reservation-tracker.cc:436. It is a violation when BufferedTupleStream is spilling and subsequently allocating new page for read.

It is unrelated with this patch. But the combination of sort query + result spooling in this test seems to reveal another bug in BufferedTupleStream.
At this point, I think we should disable result spooling for all test in TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it separately. What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:10:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123
PS5, Line 123:      // If maxAllowedScratchLimit < minMemReservat
> nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ?
Thanks! Sorry for missing this.


http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125
PS5, Line 125: If maxAllowedScratchLimit < maxMemReserva
> nit: Similar update needed here ?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:44:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1111
PS1, Line 1111: $1
slightly misleading,does not necessarily have to be the same, could be 0 or -1, right?


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != -1) {
since now we have spilling on by default, spool_query_results is set to 1GB, i wonder if this check will break older workloads that set the scratch limit to something and dont expect it to throw an error.



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 21:54:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 23:42:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8357/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 07:04:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#3).

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................

IMPALA-10565: Adjust result spooling memory based on scratch_limit

IMPALA-9856 enables result spooling by default. Result spooling depends
on the ability to spill its entire BufferedTupleStream to disk once it
hits maximum memory reservation. However, if the query option
scratch_limit is set lower than max_spilled_result_spooling_mem, the
query might fail in the middle of execution due to insufficient scratch
space. This patch adds planner change to consider scratch_limit and
scratch_dirs query option when computing resource used by result
spooling. The algorithm is as follow:

* If scratch_dirs is empty or scratch_limit < minMemReservationBytes
  required to use BufferedPlanRootSink, we set spool_query_results to
  false and fallback to use BlockingPlanRootSink.

* If scratch_limit > minMemReservationBytes but still fairly low, we
  lower the max_result_spooling_mem (default is 100MB) and
  max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit.

* if scratch_limit > max_spilled_result_spooling_mem, do nothing.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit
- Remove spool_query_results query option in
  custom_cluster/test_scratch_disk.py
- Change be test QueryOptions.ResultSpooling to also test -1 as
  unbounded.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test
M tests/custom_cluster/test_scratch_disk.py
M tests/query_test/test_scratch_limit.py
9 files changed, 140 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

> Yes, let's do that. I presume you will explicitly enable result_spooling only for the following test? 
> def test_result_spooling_and_varying_scratch_limit(self, vector):

Yes. My plan is to disable result spooling as a separate patch for IMPALA-10559.
This patch then only have result spooling enabled for test_result_spooling_and_varying_scratch_limit.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 17:36:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

> Patch Set 6:
> 
> > Yes, let's do that. I presume you will explicitly enable result_spooling only for the following test? 
> > def test_result_spooling_and_varying_scratch_limit(self, vector):
> 
> Yes. My plan is to disable result spooling as a separate patch for IMPALA-10559.
> This patch then only have result spooling enabled for test_result_spooling_and_varying_scratch_limit.

Will restart a run.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 18:02:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8359/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Mar 2021 03:49:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@122
PS3, Line 122:       if (scratchLimit > -1) {
> Should this check be scratchLimit > 0 since -1 or 0 mean unbounded right ?
Ignore this comment.  I  was probably thinking about the memory setting (side effects of late night review).



-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 15:37:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 06:55:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 6:

> Patch Set 6: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/

Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559).


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 15:49:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8351/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 20:12:32 +0000
Gerrit-HasComments: No