You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sahil Takiar (Code Review)" <ge...@cloudera.org> on 2019/08/08 23:08:48 UTC

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14039


Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran core tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
44 files changed, 994 insertions(+), 129 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#7).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 1,006 insertions(+), 273 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

Not super familiar with the resource profile part of the logic so Tim or Bikram may want to double check.

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc@81
PS8, Line 81:       // The pin should be stream at this point.
            :       DCHECK(batch_queue_->is_pinned());
It makes me wonder if the logic will be simpler if we just always use  an unpinned stream. For the most part, if the result row fits into a single buffer, this is the same as having a pinned stream.



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 22:16:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4345/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 17:29:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#10).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 981 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 7:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/exec/exec-node.h@196
PS7, Line 196:   const std::string label() const;
> I think the const is unnecessary for a return by value (if you're returning
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream-test.cc@323
PS7, Line 323: "-1"
> nit: use the name of the test instead. Same below.
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream.h@370
PS7, Line 370:   int64_t BytesUnpinned() const {
> I'd probably just call it bytes_unpinned() cause it's a plain accessor (the
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h
File be/src/runtime/spillable-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h@71
PS7, Line 71:   /// successfully added, returns an error Status if the queue is full, has already been
> Why not just make it invalid to append to the queue when it's full? We coul
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h@79
PS7, Line 79:   /// Returns and removes the RowBatch at the head of the queue. Returns Status::OK() if
> Same thing - if it's not valid to call GetBatch() on an empty queue, just m
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h@100
PS7, Line 100: BufferedTupleStream* batch_queue_;
> Shouldn't this use unique_ptr ? May be I am missing something but as the co
Yeah, thanks for catching that, it was leaking. Made it a unique_ptr.


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@53
PS7, Line 53: batch_queue_ = new
> Use unique_ptr
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@67
PS7, Line 67:   if (UNLIKELY(closed_)) return Status("SpillableRowBatchQueue is already closed.");
> If it's a bug to call this when it's closed, let's just make it a DCHECK.
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@112
PS7, Line 112:   if (UNLIKELY(closed_)) return false;
> I think it would be a bug to call this when it's closed, so we could docume
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@130
PS7, Line 130:   if (batch_queue_ != nullptr)
> Missing braces for multi-line if
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc@907
PS7, Line 907: query_options->max_pinned_result_spooling_memory 
> nits: code seems more legible if we factor out these two options into local
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc@921
PS7, Line 921:   if (query_options->max_pinned_result_spooling_memory == 0
             :       && query_options->max_unpinned_result_spooling_memory != 0)
> May be I mis-read it but isn't this the same check as the one on line 907 ?
Yeah, not sure what I was thinking here. Removed the extra checks and re-factored the Status messages a bit. Tests all still pass.


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc@928
PS7, Line 928:   if (query_options->max_unpinned_result_spooling_memory != 0
             :       && query_options->max_unpinned_result_spooling_memory
             :           < query_options->max_pinned_result_spooling_memory)
> Same question: isn't this the same as the one at line 912 ?
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift@415
PS7, Line 415:   // The maximum amount of pinned memory used when spooling query results. If this value
> It's probably better to avoid renumbering things in thrift files. I think i
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift@419
PS7, Line 419:   MAX_PINNED_RESULT_SPOOLING_MEMORY = 86
> Also, please specify the behavior when setting this to 0. Same for any valu
Done


http://gerrit.cloudera.org:8080/#/c/14039/7/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/14039/7/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@88
PS7, Line 88:       maxMemReservationBytes = Math.max(maxMemReservationBytes, minMemReservationBytes);
> Move this up to l74 so that the calculation is all in one place?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 21:11:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4834/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 21:37:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 11: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4834/


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 01:47:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4323/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Aug 2019 19:29:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4336/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 20:43:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 12: Code-Review+2

Carrying the +2, not sure how result-spooling.test was passing earlier, but I updated the mem-estimates and they look correct.

Waiting to submit this until https://gerrit.cloudera.org/#/c/14015/ is merged since they conflict.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 16:51:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4349/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 20:31:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/blocking-plan-root-sink.cc
File be/src/exec/blocking-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/blocking-plan-root-sink.cc@39
PS1, Line 39: Status BlockingPlanRootSink::Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/buffered-plan-root-sink.cc@33
PS1, Line 33: Status BufferedPlanRootSink::Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/plan-root-sink.h@62
PS1, Line 62:   virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) override = 0;
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/runtime/spillable-row-batch-queue.cc@52
PS1, Line 52:   batch_queue_ = new BufferedTupleStream(state_, row_desc_, reservation_manager_.buffer_pool_client(),
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/runtime/spillable-row-batch-queue.cc@128
PS1, Line 128:   if (batch_queue_ != nullptr) batch_queue_->Close(nullptr, RowBatch::FlushMode::NO_FLUSH_RESOURCES);
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/14039/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/14039/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@61
PS1, Line 61:    * of the associated fragment's root node. If the cardinality or the average row size are
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 23:09:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream-test.cc@323
PS7, Line 323: "-1"
nit: use the name of the test instead. Same below.


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h
File be/src/runtime/spillable-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h@100
PS7, Line 100: BufferedTupleStream* batch_queue_;
Shouldn't this use unique_ptr ? May be I am missing something but as the code stands now, aren't we leaking 'batch_queue_' ?


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@53
PS7, Line 53: batch_queue_ = new
Use unique_ptr


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc@907
PS7, Line 907: query_options->max_pinned_result_spooling_memory 
nits: code seems more legible if we factor out these two options into local variables:

  int64_t max_pinned_mem = ...
  int64_t max_unpinned_mem = ...


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc@921
PS7, Line 921:   if (query_options->max_pinned_result_spooling_memory == 0
             :       && query_options->max_unpinned_result_spooling_memory != 0)
May be I mis-read it but isn't this the same check as the one on line 907 ?


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/service/query-options.cc@928
PS7, Line 928:   if (query_options->max_unpinned_result_spooling_memory != 0
             :       && query_options->max_unpinned_result_spooling_memory
             :           < query_options->max_pinned_result_spooling_memory)
Same question: isn't this the same as the one at line 912 ?


http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift@419
PS7, Line 419:   MAX_PINNED_RESULT_SPOOLING_MEMORY = 86
Also, please specify the behavior when setting this to 0. Same for any value < 0.



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 00:49:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#4).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
44 files changed, 1,000 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 3:

> Uploaded patch set 3.

Fixed a few tests failures.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Aug 2019 17:12:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 10: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc@80
PS8, Line 80:       RETURN_IF_ERROR(state_->StartSpilling(mem_tracker_));
> The Buffer Pool already seems to include some metrics to indicate that spil
That's a fair point. Most spilling operators have some single counter or ExecOption string with the word "Spilled" in it. I *think* this is useful for diagnosis to quickly see if something spilled without digging into BufferPool counters, but in some ways it would be better to have the BufferPOol counters that are consistent across operators. So yeah, I could go either way, ok to leave it out.

We also have some grepping in the stress test to identify queries that spilled based on specific strings appearing: https://github.com/apache/impala/blob/master/tests/stress/query_runner.py#L69


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java@78
PS8, Line 78:     Preconditions.checkState(memEstimateBytes_ >= 0, "Mem estimate must be set");
> I added that check to the ResourceProfile constructor, should I add it here
Oh I missed that. Including it in the constructor seems sufficient.



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 06:12:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#13).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 995 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/13
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 13
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Reviewed-on: http://gerrit.cloudera.org:8080/14039
Reviewed-by: Sahil Takiar <st...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 995 insertions(+), 268 deletions(-)

Approvals:
  Sahil Takiar: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 15
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4193/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 23:48:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4210/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Aug 2019 17:54:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4348/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 13
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 20:29:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4301/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 19:24:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#11).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 995 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 14: Code-Review+2

Carrying +2. Rebased on top of IMPALA-8691. Ran all modified tests locally and they pass. Once pre-commit checks pass, will kickoff a gerrit-verify-dryrun.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 19:55:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#6).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
44 files changed, 1,006 insertions(+), 135 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#5).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
44 files changed, 1,006 insertions(+), 135 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 9:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc@80
PS8, Line 80:       RETURN_IF_ERROR(state_->StartSpilling(mem_tracker_));
> Add something to the profile to indicate it spilled?
The Buffer Pool already seems to include some metrics to indicate that spilling occurred, do you think that is enough?

       PLAN_ROOT_SINK:(Total: 4s471ms, non-child: 4s471ms, % non-child: 100.00%)
         - PeakMemoryUsage: 40.00 KB (40960)
         - RowBatchGetWaitTime: 0.000ns
         - RowBatchSendWaitTime: 0.000ns
        Buffer pool:
           - AllocTime: 0.000ns
           - CumulativeAllocationBytes: 136.00 KB (139264)
           - CumulativeAllocations: 17 (17)
           - PeakReservation: 32.00 KB (32768)
           - PeakUnpinnedBytes: 128.00 KB (131072)
           - PeakUsedReservation: 32.00 KB (32768)
           - ReadIoBytes: 0
           - ReadIoOps: 0 (0)
           - ReadIoWaitTime: 0.000ns
           - ReservationLimit: 32.00 KB (32768)
           - SystemAllocTime: 0.000ns
           - WriteIoBytes: 128.00 KB (131072)
           - WriteIoOps: 16 (16)
           - WriteIoWaitTime: 0.000ns


http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/service/query-options.cc@907
PS8, Line 907:   int64_t max_mem = query_options->max_result_spooling_mem;
> I'd suggest just making this max_result_spooling_mem or max_result_spooling
Changed to 'max_result_spooling_mem' and 'max_spilled_result_spooling_mem'


http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/service/query-options.cc@911
PS8, Line 911: max_spilled_result_spooling_mem mus
> We should use "spilled" instead of "unpinned" in user-facing terminology to
Done


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@84
PS8, Line 84:         memEstimateBytes = (long) Math.ceil(inputCardinality * inputNode.getAvgRowSize());
> There will only ever be one instance of PLAN_ROOT_SINK, so we can remote th
Done


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87
PS8, Line 87: 
> We should cap this at the max result spooling memory, otherwise it could be
Moved the cap logic from the ResourceProfile constructor to this method.


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java@76
PS8, Line 76:     minMemReservationBytes_ = minMemReservationBytes;
> This isn't generally correct for all plan nodes, the memory estimate can in
Done


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java@78
PS8, Line 78:     Preconditions.checkState(memEstimateBytes_ >= 0, "Mem estimate must be set");
> Maybe check that maxMemReservationBytes_ >= minMemReservationBytes_
I added that check to the ResourceProfile constructor, should I add it here as well, or move it?



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Aug 2019 18:51:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4299/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 18:27:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#12).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 995 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/12
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 12
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4326/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 10
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Aug 2019 22:04:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaInternalService.thrift@385
PS13, Line 385:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaService.thrift@435
PS13, Line 435:   
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 13
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 19:50:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#3).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran core tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
44 files changed, 1,000 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4839/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 20:37:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.h@49
PS4, Line 49: Intializes 
> typo
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.cc@19
PS4, Line 19: #include "runtime/deque-row-batch-queue.h"
> Can be deleted ?
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/exec-node.cc@373
PS4, Line 373: node_type_name + " (id=" + std::to_string(id_) + ")"
> Use substitute for better legibility.
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.h@223
PS4, Line 223: const std::string
> Did you mean to pass-by-value so the compiler will do copy-elison ?
Changed to pass-by-reference to avoid an additional copy, but I think there is still a copy happening during the assignment:

 caller_label_ = caller_label

Not sure if there is a way to avoid that?


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.cc@212
PS4, Line 212: string caller_label
> Doesn't match the signature in header
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/deque-row-batch-queue.h
File be/src/runtime/deque-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/deque-row-batch-queue.h@33
PS4, Line 33: class DequeRowBatchQueue {
> Can this class and files be deleted after this patch ?
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/sorter.h@210
PS4, Line 210: //
> ///
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.h
File be/src/runtime/spillable-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.h@57
PS4, Line 57: std::string &name
> nit: std::string&
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.h@107
PS4, Line 107: std::string &n
> nit: std::string& name
Done


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.cc@101
PS4, Line 101: DCHECK((eos && IsEmpty()) || (!eos && !IsEmpty()));
> DCHECK_EQ(eos, IsEmpty());
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 18:54:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 14: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Aug 2019 00:38:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 20:52:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4194/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 23:56:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc@80
PS8, Line 80:       RETURN_IF_ERROR(state_->StartSpilling(mem_tracker_));
> That's a fair point. Most spilling operators have some single counter or Ex
Added: "profile_->AppendExecOption("Spilled");"

This seems to be what some of the other nodes do as well. It basically adds "ExecOption: Spilled" to the PLAN_ROOT_SINK section of the profile. Updated the Python test to assert the ExecOption string is in the profile.


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java@78
PS8, Line 78:     Preconditions.checkState(memEstimateBytes_ >= 0, "Mem estimate must be set");
> Oh I missed that. Including it in the constructor seems sufficient.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 20:03:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/blocking-plan-root-sink.cc
File be/src/exec/blocking-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/blocking-plan-root-sink.cc@39
PS1, Line 39: Status BlockingPlanRootSink::Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/buffered-plan-root-sink.cc@33
PS1, Line 33: Status BufferedPlanRootSink::Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/plan-root-sink.h
File be/src/exec/plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/exec/plan-root-sink.h@62
PS1, Line 62:   virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) override = 0;
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/runtime/spillable-row-batch-queue.cc@52
PS1, Line 52:   batch_queue_ = new BufferedTupleStream(state_, row_desc_, reservation_manager_.buffer_pool_client(),
> line too long (102 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14039/1/be/src/runtime/spillable-row-batch-queue.cc@128
PS1, Line 128:   if (batch_queue_ != nullptr) batch_queue_->Close(nullptr, RowBatch::FlushMode::NO_FLUSH_RESOURCES);
> line too long (101 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14039/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/14039/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@61
PS1, Line 61:    * of the associated fragment's root node. If the cardinality or the average row size are
> line too long (91 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 23:21:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4219/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Aug 2019 17:22:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 8:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4312/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 22:09:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4338/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 11
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Aug 2019 21:24:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4302/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 19:35:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4315/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Aug 2019 22:53:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#9).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 983 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 9
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#8).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 990 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#14).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran exhaustive tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
D be/src/runtime/deque-row-batch-queue.cc
D be/src/runtime/deque-row-batch-queue.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
46 files changed, 995 insertions(+), 268 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/14
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 8:

(8 comments)

I took a look at the resource management part of this

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc@80
PS8, Line 80:       RETURN_IF_ERROR(state_->StartSpilling(mem_tracker_));
Add something to the profile to indicate it spilled?


http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/runtime/spillable-row-batch-queue.cc@81
PS8, Line 81:       // The pin should be stream at this point.
            :       DCHECK(batch_queue_->is_pinned());
> It makes me wonder if the logic will be simpler if we just always use  an u
This wouldn't work with the current spilling infrastructure, which really assumes that you don't unpin things until you actually need to spill them. There's a few things:

* Spilling can be disabled and we need to call StartSpilling() to check whether it's ok to spill things
* Data is flushed to disk eagerly, even if there's plenty of memory
* Errors in writing to disk (e.g. hitting a scratch limit, no configured scratch, etc) will result in a query failure

You could add a mode where unpinned things don't get flushed until there is memory pressure (and errors from spilling being disabled are propagated in a different way), but it would be additional work and complexity in a different part of the code.


http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/service/query-options.cc@907
PS8, Line 907:   int64_t max_pinned_mem = query_options->max_pinned_result_spooling_memory;
I'd suggest just making this max_result_spooling_mem or max_result_spooling_memory. We haven't exposed the pinned/unpinned concept in user-facing options or documentation for the most part, and I think it would be confusing. For accounting purposes unpinned pages don't count against a query's memory limits.


http://gerrit.cloudera.org:8080/#/c/14039/8/be/src/service/query-options.cc@911
PS8, Line 911: max_unpinned_result_spooling_memory
We should use "spilled" instead of "unpinned" in user-facing terminology to be consistent with elsewhere.


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@84
PS8, Line 84:         long perInstanceInputCardinality =
There will only ever be one instance of PLAN_ROOT_SINK, so we can remote the numInstances calculations.


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87
PS8, Line 87:             (long) Math.ceil(perInstanceInputCardinality * inputNode.getAvgRowSize());
We should cap this at the max result spooling memory, otherwise it could be a huge overestimate.


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfile.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfile.java@76
PS8, Line 76:     memEstimateBytes_ = (maxMemReservationBytes > 0) ?
This isn't generally correct for all plan nodes, the memory estimate can include non-reserved memory. You need to do this calculation in PlanRootSink since it depends on the node how much non-reserved memory there might be.


http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
File fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java:

http://gerrit.cloudera.org:8080/#/c/14039/8/fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java@78
PS8, Line 78:     Preconditions.checkState(memEstimateBytes_ >= 0, "Mem estimate must be set");
Maybe check that maxMemReservationBytes_ >= minMemReservationBytes_



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 8
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Aug 2019 00:47:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14039

to look at the new patch set (#2).

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................

IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Replaces DequeRowBatchQueue with SpillableRowBatchQueue in
BufferedPlanRootSink. A few changes to BufferedPlanRootSink were
necessary for it to work with the spillable queue, however, all the
synchronization logic is the same.

SpillableRowBatchQueue is a wrapper around a BufferedTupleStream and
a ReservationManager. It takes in a TBackendResourceProfile that
specifies the max / min memory reservation the BufferedTupleStream can
use to buffer rows. The 'max_unpinned_bytes' parameter limits the max
number of bytes that can be unpinned in the BufferedTupleStream. The
limit is a 'soft' limit because calls to AddBatch may push the amount of
unpinned memory over the limit. The queue is non-blocking and not thread
safe. It provides AddBatch and GetBatch methods. Calls to AddBatch spill
if the BufferedTupleStream does not have enough reservation to fit the
entire RowBatch.

Adds two new query options: 'MAX_PINNED_RESULT_SPOOLING_MEMORY' and
'MAX_UNPINNED_RESULT_SPOOLING_MEMORY', which bound the amount of pinned
and unpinned memory that a query can use for spooling, respectively.
MAX_PINNED_RESULT_SPOOLING_MEMORY must be <=
MAX_UNPINNED_RESULT_SPOOLING_MEMORY in order to allow all the pinned
data in the BufferedTupleStream to be unpinned. This is enforced in a
new method in QueryOptions called 'ValidateQueryOptions'.

Planner Changes:

PlanRootSink.java now computes a full ResourceProfile if result spooling
is enabled. The min mem reservation is bounded by the size of the read and
write pages used by the BufferedTupleStream. The max mem reservation is
bounded by 'MAX_PINNED_RESULT_SPOOLING_MEMORY'. The mem estimate is
computed by estimating the size of the result set using stats.

BufferedTupleStream Re-Factoring:

For the most part, using a BufferedTupleStream outside an ExecNode works
properly. However, some changes were necessary:
* The message for the MAX_ROW_SIZE error is ExecNode specific. In order to
fix this, this patch introduces the concept of an ExecNode 'label' which
is a more generic version of an ExecNode 'id'.
* The definition of TBackendResourceProfile lived in PlanNodes.thrift,
it was moved to its own file so it can be used by DataSinks.thrift.
* Modified BufferedTupleStream so it internally tracks how many bytes
are unpinned (necessary for 'MAX_UNPINNED_RESULT_SPOOLING_MEMORY').

Metrics:
* Added a few of the metrics mentioned in IMPALA-8825 to
BufferedPlanRootSink. Specifically, added timers to track how much time
is spent waiting in the BufferedPlanRootSink 'Send' and 'GetNext'
methods.
* The BufferedTupleStream in the SpillableRowBatchQueue exposes several
BufferPool metrics such as number of reserved and unpinned bytes.

Bug Fixes:
* Fixed a bug in BufferedPlanRootSink where the MemPool used by the
expression evaluators was not being cleared incrementally.
* Fixed a bug where the inactive timer was not being properly updated in
BufferedPlanRootSink.
* Fixed a bug where RowBatch memory was not freed if
BufferedPlanRootSink::GetNext terminated early because it could not
handle requests where num_results < BATCH_SIZE.

Testing:
* Added new tests to test_result_spooling.py.
* Updated errors thrown in spilling-large-rows.test.
* Ran core tests.

Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
---
M be/generated-sources/gen-cpp/CMakeLists.txt
M be/src/exec/analytic-eval-node.cc
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/blocking-plan-root-sink.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.h
M be/src/exec/data-sink.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/grouping-aggregator-partition.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/plan-root-sink.h
M be/src/exec/sort-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
A be/src/runtime/spillable-row-batch-queue.cc
A be/src/runtime/spillable-row-batch-queue.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CMakeLists.txt
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
A common/thrift/ResourceProfile.thrift
M common/thrift/generate_error_codes.py
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/ResourceProfileBuilder.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
M tests/query_test/test_result_spooling.py
44 files changed, 999 insertions(+), 129 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/14039/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaInternalService.thrift@385
PS13, Line 385:   
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/13/common/thrift/ImpalaService.thrift@435
PS13, Line 435:   
> line has trailing whitespace
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 13
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Aug 2019 19:53:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 7:

(9 comments)

Nice! Had some relatively minor comments.

I think a few of them come from a philosophical thing about error handling - I think I'd prefer if SpillableRowBatchQueue's invariants were stricter about when it was valid to call different methods in the lifecycle - currently there are a few runtime checks that would only be triggered if there was a bug in the calling code.

There's a couple of reasons I prefer it this way (and why we've generally done it this way in Impala). First, if the error-handling code isn't actually reachable, then we can't really test that it works correctly. Second, too many defensive checks add complexity and runtime overhead if the pattern is repeated - and it does actually get confusing whether something is an invariant of the system or a runtime error.

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/exec/exec-node.h@196
PS7, Line 196:   const std::string label() const;
I think the const is unnecessary for a return by value (if you're returning by value, the caller gets its own copy anyway, so can choose whether it wants to mutate it).


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/buffered-tuple-stream.h@370
PS7, Line 370:   int64_t BytesUnpinned() const {
I'd probably just call it bytes_unpinned() cause it's a plain accessor (there's a bit of a grey area in naming conventions here, but plain accessors usually have are lower case with underscores).


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h
File be/src/runtime/spillable-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h@71
PS7, Line 71:   /// successfully added, returns an error Status if the queue is full, has already been
Why not just make it invalid to append to the queue when it's full? We could then make it a DCHECK instead of a runtime check


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.h@79
PS7, Line 79:   /// Returns and removes the RowBatch at the head of the queue. Returns Status::OK() if
Same thing - if it's not valid to call GetBatch() on an empty queue, just make it part of the contract and use a DCHECK instead of the runtime check.


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@67
PS7, Line 67:   if (UNLIKELY(closed_)) return Status("SpillableRowBatchQueue is already closed.");
If it's a bug to call this when it's closed, let's just make it a DCHECK.


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@112
PS7, Line 112:   if (UNLIKELY(closed_)) return false;
I think it would be a bug to call this when it's closed, so we could document it as a precondition and make it a DCHECK


http://gerrit.cloudera.org:8080/#/c/14039/7/be/src/runtime/spillable-row-batch-queue.cc@130
PS7, Line 130:   if (batch_queue_ != nullptr)
Missing braces for multi-line if


http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/14039/7/common/thrift/ImpalaService.thrift@415
PS7, Line 415:   // The maximum amount of pinned memory used when spooling query results. If this value
It's probably better to avoid renumbering things in thrift files. I think it's ok in this case since it's not really exposed in public APIs, but it's probably a good habit to avoid doing it.


http://gerrit.cloudera.org:8080/#/c/14039/7/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java:

http://gerrit.cloudera.org:8080/#/c/14039/7/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@88
PS7, Line 88:       maxMemReservationBytes = Math.max(maxMemReservationBytes, minMemReservationBytes);
Move this up to l74 so that the calculation is all in one place?



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Aug 2019 21:26:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8818: Replace deque with spillable queue in BufferedPRS

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/14039 )

Change subject: IMPALA-8818: Replace deque with spillable queue in BufferedPRS
......................................................................


Patch Set 4:

(10 comments)

LGTM. Mostly minor comments for the first pass. Will do a second pass again today.

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.h
File be/src/exec/buffered-plan-root-sink.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.h@49
PS4, Line 49: Intializes 
typo


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/buffered-plan-root-sink.cc@19
PS4, Line 19: #include "runtime/deque-row-batch-queue.h"
Can be deleted ?


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/exec-node.cc
File be/src/exec/exec-node.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/exec/exec-node.cc@373
PS4, Line 373: node_type_name + " (id=" + std::to_string(id_) + ")"
Use substitute for better legibility.


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.h
File be/src/runtime/buffered-tuple-stream.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.h@223
PS4, Line 223: const std::string
Did you mean to pass-by-value so the compiler will do copy-elison ?


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/buffered-tuple-stream.cc@212
PS4, Line 212: string caller_label
Doesn't match the signature in header


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/deque-row-batch-queue.h
File be/src/runtime/deque-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/deque-row-batch-queue.h@33
PS4, Line 33: class DequeRowBatchQueue {
Can this class and files be deleted after this patch ?


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/sorter.h@210
PS4, Line 210: //
///


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.h
File be/src/runtime/spillable-row-batch-queue.h:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.h@57
PS4, Line 57: std::string &name
nit: std::string&


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.h@107
PS4, Line 107: std::string &n
nit: std::string& name


http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.cc
File be/src/runtime/spillable-row-batch-queue.cc:

http://gerrit.cloudera.org:8080/#/c/14039/4/be/src/runtime/spillable-row-batch-queue.cc@101
PS4, Line 101: DCHECK((eos && IsEmpty()) || (!eos && !IsEmpty()));
DCHECK_EQ(eos, IsEmpty());



-- 
To view, visit http://gerrit.cloudera.org:8080/14039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10f9e72374cdf9501c0e5e2c5b39c13688ae65a9
Gerrit-Change-Number: 14039
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Aug 2019 18:47:25 +0000
Gerrit-HasComments: Yes