Posted to reviews@impala.apache.org by "Omid Shahidi (Code Review)" <ge...@cloudera.org> on 2022/07/29 22:08:42 UTC

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Omid Shahidi has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18798 )


Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untracked memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.
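
As a rough illustration of the idea described above (not the code in this
patch set), an RAII buffer that charges a tracker when it is created and
credits it when it is destroyed could look like the sketch below.
ByteCounter and ScopedTrackedBuffer are hypothetical names, and std::malloc
stands in for the free pool; the actual change allocates from Impala's free
pool and reports to the sender's memory tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a memory tracker: it only counts live bytes.
struct ByteCounter {
  int64_t bytes = 0;
};

// RAII owner of a char buffer. The buffer's size is added to the counter on
// construction and subtracted again on destruction.
class ScopedTrackedBuffer {
 public:
  ScopedTrackedBuffer(ByteCounter* counter, int64_t size)
    : counter_(counter),
      size_(size),
      data_(static_cast<char*>(std::malloc(static_cast<std::size_t>(size)))) {
    if (data_ == nullptr) size_ = 0;  // Allocation failed: track nothing.
    counter_->bytes += size_;
  }

  ~ScopedTrackedBuffer() {
    std::free(data_);
    counter_->bytes -= size_;
  }

  // The buffer has a single owner, so copying is disabled.
  ScopedTrackedBuffer(const ScopedTrackedBuffer&) = delete;
  ScopedTrackedBuffer& operator=(const ScopedTrackedBuffer&) = delete;

  char* data() { return data_; }
  int64_t size() const { return size_; }

 private:
  ByteCounter* counter_;
  int64_t size_;
  char* data_;
};

// Returns the number of bytes the counter reports while a buffer is alive.
int64_t BytesTrackedInsideScope(ByteCounter* counter, int64_t size) {
  ScopedTrackedBuffer buf(counter, size);
  return counter->bytes;
}
```

The counter returns to its previous value once the buffer goes out of
scope, which is the property that makes the memory trackable.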

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 257 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 1
Gerrit-Owner: Omid Shahidi <om...@gmail.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untracked memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
Passed core tests and ran a single node benchmark which shows no
regression. Ported row-batch-serialize-test and
row-batch-serialize-benchmark to test the new BE using KRPC. Collected
query-profile, heap growth, and memory usage log showing untracked
memory decreased by 1/2.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
6 files changed, 501 insertions(+), 168 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11091/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 00:44:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 17: Code-Review+1

(2 comments)

looks good, only a few minor comments

http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h@493
PS17, Line 493:   /// This implementation is added to separate the logical path for protobuf and Thrift
              :   /// serialization
              :   ///
              :   /// 'full_dedup': true if full deduplication is used.
              :   /// 'tuple_offsets': Updated to contain offsets of all tuples into 'tuple_data' upon
              :   ///                  return. There are a total of num_rows * num_tuples_per_row offsets.
              :   ///                  An offset of -1 records a NULL.
              :   /// 'tuple_data': Updated to hold the serialized tuples' data. If 'is_compressed'
              :   ///               is true, this is LZ4 compressed.
              :   /// 'uncompressed_size': Updated with the uncompressed size of 'tuple_data'.
              :   /// 'is_compressed': true if compression is applied on 'tuple_data'.
              :   ///
              :   /// Returns error status if serialization failed. Returns OK otherwise.
              :   ///
              :   /// TODO: clean this up once the thrift RPC implementation is removed.
              :   Status SerializeThrift(bool full_dedup, vector<int32_t>* tuple_offsets,
              :       string* tuple_data, int64_t* uncompressed_size, bool* is_compressed);
remove
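
For readers unfamiliar with the layout described in the quoted comment,
here is a minimal sketch of how a receiver could turn 'tuple_offsets' back
into tuple pointers, with -1 mapped to NULL. ResolveTupleOffsets is a
hypothetical helper, not a function from row-batch.h, and it assumes
'tuple_data' has already been decompressed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Maps each serialized offset to a pointer into 'tuple_data'.
// An offset of -1 records a NULL tuple and becomes nullptr.
// The returned pointers are only valid while 'tuple_data' is alive.
std::vector<const char*> ResolveTupleOffsets(
    const std::vector<int32_t>& tuple_offsets, const std::string& tuple_data) {
  std::vector<const char*> tuples;
  tuples.reserve(tuple_offsets.size());
  for (int32_t offset : tuple_offsets) {
    tuples.push_back(offset == -1 ? nullptr : tuple_data.data() + offset);
  }
  return tuples;
}
```

With num_rows * num_tuples_per_row offsets, entry (row * num_tuples_per_row
+ i) would correspond to the i-th tuple of that row under this layout.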


http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h@511
PS17, Line 511: Shared implementation between thrift and protobuf to deserialize a row batch.
remove




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 17
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Oct 2022 20:17:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#18) to the change originally created by Omid Shahidi. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using MemTrackerAllocator, we can allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution replaces the type for tuple data
and compression scratch from std::string to TrackedString, a
std::basic_string with MemTrackerAllocator as the custom allocator.

This patch adds memory estimation in DataStreamSink.java to account
for OutboundRowBatch memory allocation. This patch also removes the
thrift-based serialization because the thrift RPC has been removed
in the prior commit.
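
To illustrate the allocator-based approach in general terms, the sketch
below plugs a custom allocator into std::basic_string so that every
allocation is reported to a tracker. ByteTracker, TrackingAllocator and
TrackedStringSketch are hypothetical stand-ins written for this example;
they are not the MemTracker, MemTrackerAllocator or TrackedString
definitions from this patch, whose exact interfaces may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical stand-in for a memory tracker: it only counts live bytes.
class ByteTracker {
 public:
  void Consume(int64_t bytes) { consumption_ += bytes; }
  void Release(int64_t bytes) { consumption_ -= bytes; }
  int64_t consumption() const { return consumption_; }

 private:
  int64_t consumption_ = 0;
};

// Minimal allocator that reports every allocation and deallocation.
template <typename T>
class TrackingAllocator {
 public:
  using value_type = T;

  explicit TrackingAllocator(ByteTracker* tracker) : tracker_(tracker) {}
  template <typename U>
  TrackingAllocator(const TrackingAllocator<U>& other)
    : tracker_(other.tracker()) {}

  T* allocate(std::size_t n) {
    T* p = static_cast<T*>(::operator new(n * sizeof(T)));
    tracker_->Consume(static_cast<int64_t>(n * sizeof(T)));
    return p;
  }

  void deallocate(T* p, std::size_t n) {
    ::operator delete(p);
    tracker_->Release(static_cast<int64_t>(n * sizeof(T)));
  }

  ByteTracker* tracker() const { return tracker_; }

 private:
  ByteTracker* tracker_;
};

// Two allocators are interchangeable if they report to the same tracker.
template <typename T, typename U>
bool operator==(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return a.tracker() == b.tracker();
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return !(a == b);
}

using TrackedStringSketch =
    std::basic_string<char, std::char_traits<char>, TrackingAllocator<char>>;

// Returns the tracked byte count while a string of 'len' chars is alive.
int64_t TrackedBytesWhileAlive(ByteTracker* tracker, std::size_t len) {
  TrackingAllocator<char> alloc(tracker);
  TrackedStringSketch s(alloc);
  s.resize(len);
  return tracker->consumption();
}
```

One appeal of this design is that code using the buffer keeps the familiar
string interface (resize, data, size), while the accounting happens inside
the allocator. Small strings may stay in the string's inline buffer and
never reach the allocator, so only heap allocations are counted.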

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark
   to test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.
 - Print test line number in PlannerTestBase.java

New row-batch serialization benchmark:

Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:            10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
   ser_no_dups_base  19.4  19.5  19.6      1X      1X      1X
        ser_no_dups  24.7  24.8    25   1.27X   1.27X   1.27X
   ser_no_dups_full  18.2  18.4  18.5  0.936X  0.943X  0.943X

  ser_adj_dups_base  28.8  29.1  29.1      1X      1X      1X
       ser_adj_dups  98.1  98.8  99.6   3.41X   3.39X   3.42X
  ser_adj_dups_full  74.9  75.3  75.9    2.6X   2.59X   2.61X

      ser_dups_base  20.9  21.1  21.3      1X      1X      1X
           ser_dups  26.4  26.5  26.8   1.27X   1.26X   1.26X
      ser_dups_full  48.4  48.9  49.2   2.32X   2.32X   2.31X

deserialize:          10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
 deser_no_dups_base  58.1  58.5  58.9      1X      1X      1X
      deser_no_dups  58.8  59.4  59.9   1.01X   1.02X   1.02X

deser_adj_dups_base   128   129   131      1X      1X      1X
     deser_adj_dups   203   206   207   1.58X   1.59X   1.59X

    deser_dups_base   145   147   148      1X      1X      1X
         deser_dups   258   260   263   1.77X   1.77X   1.77X
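
The (rel) columns above appear to be each row's measurement divided by the
baseline row of its group. A minimal sketch of that arithmetic follows;
RelativeToBaseline is a hypothetical helper, and because the table prints
rounded values, recomputing from them only matches the printed ratios
approximately (for example 24.8 / 19.5 is about 1.27).

```cpp
#include <cassert>
#include <cmath>

// Ratio of one benchmark measurement to its group's baseline measurement.
double RelativeToBaseline(double value, double baseline) {
  assert(baseline > 0.0);
  return value / baseline;
}
```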

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M common/thrift/Results.thrift
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
123 files changed, 2,921 insertions(+), 2,807 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/18

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 18: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8698/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Oct 2022 07:34:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8468/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 22:32:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 19:

I couldn't figure out what the bug is, so I reverted to a separate method, just like in patch set 17.

I got a chance to run the exhaustive tests last night. One test case in test_spilling_large_rows failed because it hit mem_limit, so I raised the limit to 1.4GB to account for KRPC memory.
I also removed the redundant uncompressed_size variable.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 04:38:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11192/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 20:30:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untracked memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

serialize:
Func                          10%ile  50%ile  90%ile  10%ile  50%ile  90%ile
                                                       (rel)   (rel)   (rel)
----------------------------------------------------------------------------
ser_no_dups_baseline            8.36     8.6     8.7      1X      1X      1X
ser_no_dups                     6.73    6.85    6.93  0.804X  0.796X  0.796X
ser_no_dups_full                5.28    5.38    5.55  0.631X  0.625X  0.637X

ser_adjacent_dups_baseline      12.9    13.2    13.4      1X      1X      1X
ser_adjacent_dups               23.2    23.7    24.1    1.8X    1.8X    1.8X
ser_adjacent_dups_full          19.9    20.3    20.7   1.54X   1.54X   1.55X

ser_dups_baseline               9.17    9.54    9.72      1X      1X      1X
ser_dups                        7.45    7.69    7.86  0.812X  0.806X  0.809X
ser_dups_full                   14.6      15    15.3    1.6X   1.57X   1.57X

deserialize:
Func                          10%ile  50%ile  90%ile  10%ile  50%ile  90%ile
                                                       (rel)   (rel)   (rel)
----------------------------------------------------------------------------
deser_no_dups_baseline          32.6    33.5      34      1X      1X      1X
deser_no_dups                   32.5    33.1    33.7  0.999X   0.99X  0.992X

deser_adjacent_dups_baseline    53.1      54    54.7      1X      1X      1X
deser_adjacent_dups             80.3    81.6    82.5   1.51X   1.51X   1.51X

deser_dups_baseline             52.4      54    54.7      1X      1X      1X
deser_dups                      86.8    88.4    89.7   1.66X   1.64X   1.64X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/functional-query/queries/datastream-sender.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
10 files changed, 657 insertions(+), 173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11170/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Aug 2022 04:28:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11103/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 01:58:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11107/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 20:20:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 6:

(8 comments)

Add an end-to-end unit test.

http://gerrit.cloudera.org:8080/#/c/18798/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/6//COMMIT_MSG@22
PS6, Line 22: Passed core tests and ran a single node benchmark which shows no
            : regression. Ported row-batch-serialize-test and
            : row-batch-serialize-benchmark to test the new BE using KRPC. Collected
            : query-profile, heap growth, and memory usage log showing untracked
            : memory decreased by 1/2.
Separate the tasks into a list:
Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression. 
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log showing untracked memory decreased by 1/2.


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc@437
PS6, Line 437: argc, argv, true
add impala::TestInfo::BE_TEST


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@105
PS6, Line 105: mem_allocator_->Free(reinterpret_cast<uint8_t*>(tuple_data_))
Should we check if tuple_data_ is not nullptr before calling Free()?
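
A minimal sketch of the guarded Free(), assuming a simplified allocator. CountingAllocator and TupleDataHolder are hypothetical stand-ins and not the classes in this patch; only tuple_data_ and the Free(reinterpret_cast<uint8_t*>(...)) call are taken from the quoted line.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the allocator behind mem_allocator_. It only
// counts live allocations so the behaviour can be checked.
struct CountingAllocator {
  int live = 0;

  uint8_t* Allocate(int64_t size) {
    uint8_t* p = static_cast<uint8_t*>(std::malloc(static_cast<size_t>(size)));
    if (p != nullptr) ++live;
    return p;
  }

  void Free(uint8_t* p) {
    assert(p != nullptr);  // This sketch treats Free(nullptr) as a caller bug.
    std::free(p);
    --live;
  }
};

// Hypothetical owner of the tuple data buffer; releases it at most once.
class TupleDataHolder {
 public:
  explicit TupleDataHolder(CountingAllocator* allocator) : allocator_(allocator) {}
  ~TupleDataHolder() { Reset(); }
  TupleDataHolder(const TupleDataHolder&) = delete;
  TupleDataHolder& operator=(const TupleDataHolder&) = delete;

  // Releases any previous buffer, then allocates a new one of 'size' bytes.
  bool Allocate(int64_t size) {
    Reset();
    tuple_data_ = reinterpret_cast<char*>(allocator_->Allocate(size));
    return tuple_data_ != nullptr;
  }

  // Safe to call when nothing was allocated: the nullptr check skips Free().
  void Reset() {
    if (tuple_data_ == nullptr) return;
    allocator_->Free(reinterpret_cast<uint8_t*>(tuple_data_));
    tuple_data_ = nullptr;
  }

 private:
  CountingAllocator* allocator_;
  char* tuple_data_ = nullptr;
};
```

With the check in Reset(), destroying a holder that never allocated does not reach Free() at all, and resetting the pointer to nullptr afterwards prevents a double free.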


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@114
PS6, Line 114: inline Status AllocateTraceableBuffer
Add a new header file, row-batch.inline.h, and move the body of this inline function into it.


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@115
PS6, Line 115:  char** buffer_ptr, int64_t* buffer_length, int64_t* buffer_capacity
Define two allocation functions to replace AllocateTraceableBuffer(), one for tuple_data, another for compression_scratch so that we don't need to pass parameters except 'size'.
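
A rough sketch of what the two functions could look like. The class name OutboundBuffers, the bool return type (standing in for Status), and the accessor names are assumptions for illustration; memory tracking is left out to keep the shape visible.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical holder for the two buffers of an outbound row batch.
class OutboundBuffers {
 public:
  // Callers pass only 'size'; each function knows which members it updates.
  bool AllocateTupleData(int64_t size) { return Resize(&tuple_data_, size); }
  bool AllocateCompressionScratch(int64_t size) {
    return Resize(&compression_scratch_, size);
  }

  int64_t tuple_data_length() const { return tuple_data_.length; }
  int64_t tuple_data_capacity() const { return tuple_data_.capacity; }
  int64_t compression_scratch_capacity() const {
    return compression_scratch_.capacity;
  }

 private:
  struct Buffer {
    std::unique_ptr<char[]> data;
    int64_t length = 0;
    int64_t capacity = 0;
  };

  // Shared helper: reallocates only when the request exceeds the capacity.
  // Existing contents are not preserved when the buffer grows.
  static bool Resize(Buffer* buf, int64_t size) {
    if (size < 0) return false;
    if (size > buf->capacity) {
      buf->data.reset(new char[size]);
      buf->capacity = size;
    }
    buf->length = size;
    return true;
  }

  Buffer tuple_data_;
  Buffer compression_scratch_;
};
```

The pointer, length, and capacity stay private, so call sites shrink to a single argument and cannot pass mismatched members.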


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@306
PS6, Line 306: // bool full_dedup = UseFullDedup();
delete


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@322
PS6, Line 322: &output_batch->tuple_data_,
             :       &output_batch->tuple_data_length_, &output_batch->tuple_data_capacity_
Change the function signature so that we don't need to pass member variables as parameters.


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@393
PS6, Line 393: Thrift implementation for Serialization using TRowBatch.
             : /// Benchmarks (be/src/benchmarks/row-batch-serialize-benchmark.cc) and tests
             : /// (be/src/runtime/row-batch-serialize-test.cc) for serialization use TRowBatch and
             : /// Thrift so we need to keep the old implementation so they don't fail.
You already replaced TRowBatch in those two files, so update the comments accordingly. We can clean up the Thrift-related code later.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Aug 2022 04:03:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11076/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Aug 2022 19:56:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

> Patch Set 13:
> 
> (2 comments)

I will investigate the mem_limit test failure this weekend before I have to return the work machine. Currently trying to set up a docker environment.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 23:16:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc@138
PS6, Line 138:       uint8_t* input = const_cast<uint8_t*>(
const_cast shouldn't be needed here


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc@140
PS6, Line 140:       uint8_t* compressed_output = const_cast<uint8_t*>(
const_cast shouldn't be needed here
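
To illustrate the point, a sketch assuming the compressor takes a const input pointer and a mutable output pointer. InputBytes, OutputBytes, and CopyBytes are hypothetical helpers and not functions from the benchmark; CopyBytes stands in for the compression call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Read-only view of the input: keep the pointer const instead of casting
// the constness away.
inline const uint8_t* InputBytes(const std::string& s) {
  return reinterpret_cast<const uint8_t*>(s.data());
}

// Writable view of the output: indexing a non-const string already yields a
// non-const char, so only the char -> uint8_t reinterpret_cast is needed.
inline uint8_t* OutputBytes(std::string* s) {
  return s->empty() ? nullptr : reinterpret_cast<uint8_t*>(&(*s)[0]);
}

// Stand-in for a compressor call that reads 'input' and writes 'output'.
inline void CopyBytes(const uint8_t* input, size_t len, uint8_t* output) {
  std::memcpy(output, input, len);
}
```

If the callee's input parameter is not const-qualified, the const_cast would still be needed at the call site; in that case the cleaner fix is probably in the callee's signature.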


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/krpc-data-stream-sender.h@30
PS6, Line 30: #include "exec/data-sink.h"
Move this back where it was before


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@71
PS6, Line 71:   std::unique_ptr<FreePool> free_pool_;
Is unique_lock actually required here? It seems the locking here is fairly simple, and using unique_lock may be wasteful.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Aug 2022 19:33:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 15:

Added KRPC memory estimation, DataStreamSink.estimateOutboundRowBatchBuffers(), at patch set 15.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 15
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Sep 2022 04:23:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@713
PS2, Line 713:   for (auto batch : outbound_batches_) {
> Done for auto&
Depends on whether we actually need to copy OutBoundRowBatch anywhere. Otherwise, leaving the copy constructor in place can lead to these kinds of efficiency issues, or worse if the copy constructor is not implemented correctly.
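
A small sketch of the alternative, assuming copies are never needed. Batch is a hypothetical stand-in for OutBoundRowBatch, and TotalBytes is an illustrative loop, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the outbound batch: copying is disabled, moving
// is still allowed so the type can live in a std::vector.
struct Batch {
  std::string tuple_data;

  explicit Batch(std::string data) : tuple_data(std::move(data)) {}
  Batch(const Batch&) = delete;
  Batch& operator=(const Batch&) = delete;
  Batch(Batch&&) = default;
  Batch& operator=(Batch&&) = default;
};

// Iterates by reference. Writing `for (auto batch : batches)` here would fail
// to compile, because it needs the deleted copy constructor.
inline size_t TotalBytes(const std::vector<Batch>& batches) {
  size_t total = 0;
  for (const auto& batch : batches) total += batch.tuple_data.size();
  return total;
}
```

Deleting the copy operations turns an accidental per-iteration copy into a compile error rather than a silent cost.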


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@124
PS2, Line 124:   /// TODO: this can probably be a struct
> in your opinion, which one would be nicer? having a struct or leaving it as
I don't think the struct will simplify the code since these are not passed as arguments anywhere.


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc@340
PS2, Line 340: Status RowBatch::Serialize(bool full_dedup, DedupMap* distinct_tuples,
> Add comment explaining why SerializeThrift was added.
Please reply on this. The code here appears to have some duplication, so we should have an explanation of why there are two versions.


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc@471
PS2, Line 471:     vector<int32_t>* tuple_offsets, char* tuple_data) {
> Maybe less casting if you use uint8_t* here
Did you try changing these and it made things worse?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 17:33:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#20) to the change originally created by Omid Shahidi. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using MemTrackerAllocator, we can allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution changes the type of the tuple data
and compression scratch from std::string to TrackedString, a
std::basic_string with MemTrackerAllocator as the custom allocator.
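
The general technique (a std::basic_string whose allocator reports to a
tracker) can be sketched as below. This is an illustration only:
ByteCounter, CountingAllocator, and CountedString are hypothetical names
and do not reproduce the MemTracker, MemTrackerAllocator, or TrackedString
interfaces in Impala.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical byte counter standing in for a memory tracker.
struct ByteCounter {
  int64_t consumption = 0;
};

// Minimal allocator that reports each allocation and release to a
// ByteCounter. A null counter means the memory is simply not counted.
template <typename T>
class CountingAllocator {
 public:
  using value_type = T;

  explicit CountingAllocator(ByteCounter* counter = nullptr) : counter_(counter) {}
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other) : counter_(other.counter()) {}

  T* allocate(std::size_t n) {
    T* p = static_cast<T*>(::operator new(n * sizeof(T)));
    if (counter_ != nullptr) counter_->consumption += static_cast<int64_t>(n * sizeof(T));
    return p;
  }

  void deallocate(T* p, std::size_t n) {
    ::operator delete(p);
    if (counter_ != nullptr) counter_->consumption -= static_cast<int64_t>(n * sizeof(T));
  }

  ByteCounter* counter() const { return counter_; }

 private:
  ByteCounter* counter_;
};

// Allocators compare equal when they report to the same counter.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return a.counter() == b.counter();
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return !(a == b);
}

// A string type whose heap allocations are visible to the counter.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;
```

Note that short strings may be stored inline by the standard library
(small string optimization) and then never reach the allocator, so only
heap-backed contents show up in the counter.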

This patch adds memory estimation in DataStreamSink.java to account
for OutboundRowBatch memory allocation. This patch also removes the
thrift-based serialization because the thrift RPC has been removed
in the prior commit.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark
   to test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added test_datastream_sender.py to verify the peak memory of the
   EXCHANGE SENDER node.
 - Raised mem_limit in two of the test_spilling_large_rows test cases.
 - Print the test line number in PlannerTestBase.java.

New row-batch serialization benchmark:

Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:            10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
   ser_no_dups_base  18.6  18.8  18.9      1X      1X      1X
        ser_no_dups  18.5  18.5  18.8  0.998X  0.988X  0.991X
   ser_no_dups_full  14.7  14.8  14.8  0.793X   0.79X  0.783X

  ser_adj_dups_base  28.2  28.4  28.8      1X      1X      1X
       ser_adj_dups  68.9  69.1  69.8   2.44X   2.43X   2.43X
  ser_adj_dups_full  56.2  56.7  57.1   1.99X      2X   1.99X

      ser_dups_base  20.7  20.9  20.9      1X      1X      1X
           ser_dups  20.6  20.8  20.9  0.994X  0.995X      1X
      ser_dups_full  39.8    40  40.5   1.93X   1.92X   1.94X

deserialize:          10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
 deser_no_dups_base  75.9  76.6    77      1X      1X      1X
      deser_no_dups  74.9  75.6    76  0.987X  0.987X  0.987X

deser_adj_dups_base   127   128   129      1X      1X      1X
     deser_adj_dups   179   193   195   1.41X   1.51X   1.51X

    deser_dups_base   128   128   129      1X      1X      1X
         deser_dups   165   190   193   1.29X   1.48X   1.49X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M common/thrift/Results.thrift
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
124 files changed, 2,870 insertions(+), 2,743 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/20

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 20
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11644/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 20
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Oct 2022 16:11:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 20:

Patch set 20 increases mem_limit for tests that failed in the dockerised environment.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 20
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Oct 2022 15:52:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 21: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 21
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Oct 2022 21:59:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
Passed core tests and ran a single-node benchmark, which showed no
regression.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 296 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/krpc-data-stream-sender.h@271
PS1, Line 271:   /// Memory tracker from Parent Memory Tracker for tracking memory of OutBoundRowBatch serialization
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/krpc-data-stream-sender.cc@782
PS1, Line 782:   outbound_rb_mem_tracker_.reset(new MemTracker(-1, "Memory tracker for OutBoundRowBatch serialization", parent_mem_tracker, false));
line too long (133 > 90)


http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/krpc-data-stream-sender.cc@1108
PS1, Line 1108:     RETURN_IF_ERROR(src->Serialize(dest, outbound_rb_free_pool_.get(), krpc_tuple_data_bytes_, krpc_compression_scratch_bytes_));
line too long (129 > 90)


http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/row-batch.h@108
PS1, Line 108:     pool->Free(const_cast<uint8_t*>(reinterpret_cast<const uint8_t*>(compression_scratch_)));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/row-batch.cc@414
PS1, Line 414:     RETURN_IF_ERROR(SerializeInternal(size, &distinct_tuples, tuple_offsets, tuple_data_ptr));
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/18798/1/be/src/runtime/row-batch.cc@519
PS1, Line 519:   
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 1
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 22:09:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

We can probably follow the memory estimate computation in ExchangeNode.java as an example:
https://github.com/apache/impala/blob/d74d6994cc25ed2090886d6b406cf477a6ccf6b4/fe/src/main/java/org/apache/impala/planner/ExchangeNode.java#L208-L221

I can help look into this.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Sep 2022 08:09:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#19) to the change originally created by Omid Shahidi. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using MemTrackerAllocator, we can allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution replaces the type for tuple data
and compression scratch from std::string to TrackedString, a
std::basic_string with MemTrackerAllocator as the custom allocator.

This patch adds memory estimation in DataStreamSink.java to account
for OutboundRowBatch memory allocation. This patch also removes the
thrift-based serialization because the thrift RPC has been removed
in the prior commit.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark
   to test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.
 - Raised mem_limit in one of the test_spilling_large_rows test cases.
 - Made PlannerTestBase.java print the test line number.

New row-batch serialization benchmark:

Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:            10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
   ser_no_dups_base  18.6  18.8  18.9      1X      1X      1X
        ser_no_dups  18.5  18.5  18.8  0.998X  0.988X  0.991X
   ser_no_dups_full  14.7  14.8  14.8  0.793X   0.79X  0.783X

  ser_adj_dups_base  28.2  28.4  28.8      1X      1X      1X
       ser_adj_dups  68.9  69.1  69.8   2.44X   2.43X   2.43X
  ser_adj_dups_full  56.2  56.7  57.1   1.99X      2X   1.99X

      ser_dups_base  20.7  20.9  20.9      1X      1X      1X
           ser_dups  20.6  20.8  20.9  0.994X  0.995X      1X
      ser_dups_full  39.8    40  40.5   1.93X   1.92X   1.94X

deserialize:          10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
 deser_no_dups_base  75.9  76.6    77      1X      1X      1X
      deser_no_dups  74.9  75.6    76  0.987X  0.987X  0.987X

deser_adj_dups_base   127   128   129      1X      1X      1X
     deser_adj_dups   179   193   195   1.41X   1.51X   1.51X

    deser_dups_base   128   128   129      1X      1X      1X
         deser_dups   165   190   193   1.29X   1.48X   1.49X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M common/thrift/Results.thrift
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
124 files changed, 2,868 insertions(+), 2,743 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/19
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8703/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 05:49:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11621/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Oct 2022 05:55:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 16:

(19 comments)

Patch set 16 switches the tracking implementation from MemPool to a custom MemTrackerAllocator.

The MemTrackerAllocator is slightly more efficient, as shown by the relative measurements from row-batch-serialize-benchmark.
I tested both methods using single_node_perf_run.py against TPCH-30 data, and both show similar performance without major regression.
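
For readers unfamiliar with the allocator-based approach, here is a minimal sketch of the general technique: a std::basic_string whose allocator reports every heap allocation to a counter. This is not the MemTrackerAllocator from the patch. ByteCounter, CountingAllocator and CountedString are hypothetical names used only for illustration, and the sketch ignores thread safety and memory limits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical stand-in for a memory tracker; it only counts bytes and is
// not thread-safe.
struct ByteCounter {
  std::int64_t consumption = 0;  // bytes currently allocated
  std::int64_t peak = 0;         // highest value consumption has reached

  void Consume(std::int64_t bytes) {
    consumption += bytes;
    if (consumption > peak) peak = consumption;
  }
  void Release(std::int64_t bytes) { consumption -= bytes; }
};

// Minimal C++11 allocator that reports each allocation and deallocation
// to a ByteCounter (hypothetical, for illustration only).
template <typename T>
class CountingAllocator {
 public:
  using value_type = T;

  explicit CountingAllocator(ByteCounter* counter) noexcept
    : counter_(counter) {
    assert(counter_ != nullptr);
  }

  // Converting constructor so that containers can rebind the allocator.
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other) noexcept
    : counter_(other.counter()) {}

  T* allocate(std::size_t n) {
    // Allocate first, so a failed allocation is never counted.
    T* p = static_cast<T*>(::operator new(n * sizeof(T)));
    counter_->Consume(static_cast<std::int64_t>(n * sizeof(T)));
    return p;
  }

  void deallocate(T* p, std::size_t n) noexcept {
    counter_->Release(static_cast<std::int64_t>(n * sizeof(T)));
    ::operator delete(p);
  }

  ByteCounter* counter() const noexcept { return counter_; }

 private:
  ByteCounter* counter_;
};

// Allocators compare equal when they report to the same counter.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a,
                const CountingAllocator<U>& b) noexcept {
  return a.counter() == b.counter();
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a,
                const CountingAllocator<U>& b) noexcept {
  return !(a == b);
}

// A string type whose heap buffer is accounted for in a ByteCounter.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;
```

A CountedString is constructed from an allocator instance, for example `CountingAllocator<char> alloc(&counter); CountedString s(alloc);`. Short strings may be stored inline by the standard library (small-string optimization) and then never reach the allocator, so only heap-backed buffers show up in the counter.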

http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG@18
PS15, Line 18: 
             : This patch also remove the thrift based s
> Should we remove those code? They are not used anymore.
Done


http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG@20
PS15, Line 20: thrift RPC has been removed in prior commit.
> Also mention the code change in Planner.
Will mention the planner change in the next patch set.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.h@262
PS15, Line 262: .
> nit: bytes
This counter is removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.h@265
PS15, Line 265: ackin
> nit: bytes
This counter is removed.


http://gerrit.cloudera.org:8080/#/c/18798/12/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/12/be/src/runtime/krpc-data-stream-sender.cc@1101
PS12, Line 1101:     outbound_rb_mem_tracker_->Close();
               :   }
> my hypothesis is that since we set the tracker to nullptr here, when all tr
Patch set 15 fixed this.


http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc@1095
PS13, Line 1095: for (int i = 0; i < channels_.size(); ++i) 
> ok
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc@1103
PS13, Line 1103: 
> that will be safe
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc@780
PS15, Line 780:   eos_sent_counter_ = ADD_COUNTER(profile(), "EosSent", TUnit::UNIT);
              :   uncompressed_bytes_counter_ =
              :       ADD_COUNTER(profile(), "UncompressedRowBatchSize", TUnit::BYTES);
              :   total_sent_rows_counter_ = ADD_COUNTER(profile(), "RowsSent", TUnit::UNIT);
> I was suggesting this counter addition for research purpose. Wonder if we s
This counter is removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc@786
PS15, Line 786: new MemTracker(-1, "RowBatchSerialization", mem_tracker_.get()));
> We should nest MemTracker under the KrpcDataStreamSender's mem_tracker_.get
Done


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc@1116
PS15, Line 1116:   COUNTER_ADD(uncompressed_bytes_counter_, unc
> Can be set during Prepare and Init method rather than here.
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@70
PS15, Line 70: kudu::Slice TupleDataAsSl
> not necessary to define the pointer as unique_ptr, regular pointer should b
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@101
PS15, Line 101: 
> nit: serialization
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@137
PS15, Line 137: c:
> If tuple_data_length_ > tuple_data_capacity_, we need to reallocate data bu
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@142
PS15, Line 142: 
> serialization.
Done


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@143
PS15, Line 143: _FLUSH_RESOURCES,
> The compression_scratch_, its length and capacity will be swapped
Removed.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.cc@321
PS15, Line 321: }
> In the original code, we throw TErrorCode::ROW_BATCH_TOO_LARGE if size is l
Done


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.cc@365
PS15, Line 365:   tuple->DeepCopy(**desc, &tuple_data, &offset, /* convert_ptrs */ true);
> In the original code, we do not resize here if current length is longer tha
Done


http://gerrit.cloudera.org:8080/#/c/18798/15/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java:

http://gerrit.cloudera.org:8080/#/c/18798/15/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@64
PS15, Line 64: th
> nit: are
Done


http://gerrit.cloudera.org:8080/#/c/18798/15/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@75
PS15, Line 75:     queryOptions.batch_size :
             :         PlanNode.DEFAULT_ROWBATCH_S
> Could you add a comment why set these two variables as 2?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 16
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Oct 2022 04:08:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#15) to the change originally created by Omid Shahidi. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

New row-batch serialization benchmark:

serialize:
Func                           10%   50%   90%     10%     50%     90%
                                                 (rel)   (rel)   (rel)
----------------------------------------------------------------------
ser_no_dups_baseline          8.36   8.6   8.7      1X      1X      1X
ser_no_dups                   6.73  6.85  6.93  0.804X  0.796X  0.796X
ser_no_dups_full              5.28  5.38  5.55  0.631X  0.625X  0.637X

ser_adjacent_dups_baseline    12.9  13.2  13.4      1X      1X      1X
ser_adjacent_dups             23.2  23.7  24.1    1.8X    1.8X    1.8X
ser_adjacent_dups_full        19.9  20.3  20.7   1.54X   1.54X   1.55X

ser_dups_baseline             9.17  9.54  9.72      1X      1X      1X
ser_dups                      7.45  7.69  7.86  0.812X  0.806X  0.809X
ser_dups_full                 14.6    15  15.3    1.6X   1.57X   1.57X

deserialize:
Func                           10%   50%   90%     10%     50%     90%
                                                 (rel)   (rel)   (rel)
----------------------------------------------------------------------
deser_no_dups_baseline        32.6  33.5    34      1X      1X      1X
deser_no_dups                 32.5  33.1  33.7  0.999X   0.99X  0.992X

deser_adjacent_dups_baseline  53.1    54  54.7      1X      1X      1X
deser_adjacent_dups           80.3  81.6  82.5   1.51X   1.51X   1.51X

deser_dups_baseline           52.4    54  54.7      1X      1X      1X
deser_dups                    86.8  88.4  89.7   1.66X   1.64X   1.64X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
121 files changed, 3,132 insertions(+), 2,670 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 15
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 15:

(7 comments)

Should we clean up the Thrift RPC related code that uses TRowBatch?

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.h@262
PS15, Line 262: B
nit: bytes


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.h@265
PS15, Line 265: Bytes
nit: bytes


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@70
PS15, Line 70: std::unique_ptr<FreePool>
It is not necessary to define the pointer as a unique_ptr; a regular pointer should be fine.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@101
PS15, Line 101: Serialize()
nit: serialization


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@137
PS15, Line 137: if tuple_data_length_ > tuple_data_capacity_ new allocation will be necessary
If tuple_data_length_ > tuple_data_capacity_, we need to reallocate data buffer.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@142
PS15, Line 142: Serialize()
serialization.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.h@143
PS15, Line 143: The compression_scratch_ and its length and capacity will be std::swap
The compression_scratch_, its length and capacity will be swapped
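
The two comment rewordings above (reallocate when the length exceeds the
capacity, and swap the scratch buffer together with its length and capacity)
can be sketched as follows. This is a simplified, hypothetical struct: only the
field names are taken from the review comments, and the real class differs.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical, simplified holder. Field names follow the review comments;
// everything else is an assumption made for illustration.
struct OutboundBuffers {
  char* tuple_data = nullptr;
  int64_t tuple_data_length = 0;
  int64_t tuple_data_capacity = 0;

  char* compression_scratch = nullptr;
  int64_t compression_scratch_length = 0;
  int64_t compression_scratch_capacity = 0;

  // True when the tuple data no longer fits and a new buffer is required.
  bool NeedsReallocation() const {
    return tuple_data_length > tuple_data_capacity;
  }

  // After compressing into the scratch buffer, exchange the pointer together
  // with its length and capacity so the three always describe the same memory.
  void SwapWithScratch() {
    std::swap(tuple_data, compression_scratch);
    std::swap(tuple_data_length, compression_scratch_length);
    std::swap(tuple_data_capacity, compression_scratch_capacity);
  }
};
```

Swapping all three fields in one place avoids a state where the pointer refers
to one buffer while the length or capacity still describes the other.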




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 15
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Oct 2022 17:09:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc@400
PS4, Line 400: Benchmarks (be/src/benchmarks/row-batch-serialize-benchmark.cc) and tests
             : /// (be/src/runtime/row-batch-serialize-test.cc) for serialization use TRowBatch and
             : /// Thrift so we need to keep the old implementation so they don't fail.
Change the benchmark and unit-test code to use your new RowBatch::Serialize().




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 17:56:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 17:

(1 comment)

Also remove TRowBatch from common/thrift/Results.thrift

http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h@46
PS17, Line 46: class TRowBatch;
remove




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 17
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Oct 2022 21:30:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8698/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Oct 2022 06:13:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 20: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 20
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Oct 2022 16:54:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 21: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 21
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Oct 2022 16:54:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11061/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 1
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Jul 2022 22:29:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/11150/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Aug 2022 21:03:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for them in the memory tracker of the
KrpcDataStreamSender. This solution creates an RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The Thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query profiles, heap growth, and memory usage logs
   showing that untracked memory decreased by half.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

New row-batch serialization benchmark:

serialize:
Func                          10%ile  50%ile  90%ile  10%ile  50%ile  90%ile
                                                      (rel)   (rel)   (rel)
----------------------------------------------------------------------------
ser_no_dups_baseline          8.36    8.6     8.7     1X      1X      1X
ser_no_dups                   6.73    6.85    6.93    0.804X  0.796X  0.796X
ser_no_dups_full              5.28    5.38    5.55    0.631X  0.625X  0.637X

ser_adjacent_dups_baseline    12.9    13.2    13.4    1X      1X      1X
ser_adjacent_dups             23.2    23.7    24.1    1.8X    1.8X    1.8X
ser_adjacent_dups_full        19.9    20.3    20.7    1.54X   1.54X   1.55X

ser_dups_baseline             9.17    9.54    9.72    1X      1X      1X
ser_dups                      7.45    7.69    7.86    0.812X  0.806X  0.809X
ser_dups_full                 14.6    15      15.3    1.6X    1.57X   1.57X

deserialize:
Func                          10%ile  50%ile  90%ile  10%ile  50%ile  90%ile
                                                      (rel)   (rel)   (rel)
----------------------------------------------------------------------------
deser_no_dups_baseline        32.6    33.5    34      1X      1X      1X
deser_no_dups                 32.5    33.1    33.7    0.999X  0.99X   0.992X

deser_adjacent_dups_baseline  53.1    54      54.7    1X      1X      1X
deser_adjacent_dups           80.3    81.6    82.5    1.51X   1.51X   1.51X

deser_dups_baseline           52.4    54      54.7    1X      1X      1X
deser_dups                    86.8    88.4    89.7    1.66X   1.64X   1.64X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
9 files changed, 656 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/11

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc@1095
PS13, Line 1095: if (outbound_rb_mem_pool_.get() != nullptr)
> possibly change to if (UNLIKELY(outbound_rb_mem_pool_.get() != nullptr))
ok
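
For readers unfamiliar with branch hints, the suggestion above can be sketched
as below. SKETCH_UNLIKELY is a hypothetical macro name used here for
illustration, and the definition assumes GCC or Clang, where __builtin_expect
is available; Pool is a stand-in type, not a class from this patch.

```cpp
#include <memory>

// Hypothetical macro for illustration. On GCC/Clang it tells the compiler the
// condition is expected to be false; elsewhere it is a plain pass-through.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#else
#define SKETCH_UNLIKELY(expr) (expr)
#endif

// Stand-in pool type that only records whether FreeAll() was called.
struct Pool {
  bool freed = false;
  void FreeAll() { freed = true; }
};

// Frees the pool if one exists. The hint affects code layout only; the
// result is the same with or without it.
bool FreeIfPresent(const std::unique_ptr<Pool>& pool) {
  if (SKETCH_UNLIKELY(pool.get() != nullptr)) {
    pool->FreeAll();
    return true;
  }
  return false;
}
```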


http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc@1103
PS13, Line 1103: delete outbound_rb_free_pool_;
> check if outbound_rb_free_pool_ != nullptr
that will be safe
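
On the null check: in C++, applying delete to a null pointer is defined to have
no effect, so the guard is optional rather than required. A minimal sketch with
hypothetical Pool and Sender types (not the classes in this patch):

```cpp
// Counts live Pool objects so the effect of Close() can be observed.
int g_live_pools = 0;

// Hypothetical pool type used only for illustration.
struct Pool {
  Pool() { ++g_live_pools; }
  ~Pool() { --g_live_pools; }
};

// Hypothetical owner holding the pool through a raw pointer.
class Sender {
 public:
  Sender() = default;
  Sender(const Sender&) = delete;
  Sender& operator=(const Sender&) = delete;
  ~Sender() { Close(); }

  void Open() {
    if (pool_ == nullptr) pool_ = new Pool();
  }

  // Safe to call before Open() and more than once: 'delete' on a null
  // pointer does nothing, and the pointer is reset after each call.
  void Close() {
    delete pool_;
    pool_ = nullptr;
  }

 private:
  Pool* pool_ = nullptr;
};
```

Resetting the pointer to nullptr after the delete is what makes a second
Close() harmless; the explicit check mainly documents intent.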




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Aug 2022 02:39:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

There are 4 end-to-end test failures caused by the memory limit being exceeded for large rows. This is expected behavior, since we now allocate more memory from the trackable buffer pool, which especially affects tests with large rows.
To fix these failures, increase mem_limit for these test cases.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Aug 2022 02:38:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8476/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 20:49:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@713
PS2, Line 713:   for (auto& batch : outbound_batches_) {
> Use auto& to avoid copying OutboundRowBatch. Should probably delete the cop
Done for auto&

Not sure why the copy ctor should be deleted. Are we talking about the copy ctor for OutboundRowBatch or the copy on line 718? Can you elaborate? Thanks!
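
One possible reading of the reviewer's point, sketched with a hypothetical
Batch type standing in for OutboundRowBatch: once the copy constructor is
deleted, an accidental by-value loop variable stops compiling, so the compiler
enforces the use of a reference.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical move-only batch (assumption: stands in for OutboundRowBatch).
class Batch {
 public:
  explicit Batch(size_t bytes) : bytes_(bytes) {}
  Batch(const Batch&) = delete;             // Copies are compile errors.
  Batch& operator=(const Batch&) = delete;
  Batch(Batch&&) = default;                 // Moves stay available.
  Batch& operator=(Batch&&) = default;
  size_t bytes() const { return bytes_; }

 private:
  size_t bytes_;
};

// Sums the sizes without copying any batch.
size_t TotalBytes(const std::vector<Batch>& batches) {
  size_t total = 0;
  // 'for (auto batch : batches)' would not compile here, because it would
  // need the deleted copy constructor.
  for (const auto& batch : batches) total += batch.bytes();
  return total;
}
```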


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@1092
PS2, Line 1092:   if (outbound_rb_mem_pool_ != nullptr) {
> Missing {}
Done


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@1093
PS2, Line 1093:     outbound_rb_mem_pool_->FreeAll();
> Missing {}
Done


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@1105
PS2, Line 1105: Status KrpcDataStreamSender::SerializeBatch(
> Probably better to not add these inside the serialize timer scope
Done


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@124
PS2, Line 124:   /// if tuple_data_length_ > tuple_data_capacity_ new allocation will be necessary
> Either add the struct or remove the comment.
In your opinion, which one would be nicer: having a struct or leaving it as is?


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@584
PS2, Line 584: 
> Maybe less casting if you use uint8_t* here
The majority of the casting happens before and after serialization, when the buffer needs to change because of allocation. Do you mean using uint8_t* instead of char* for tuple_data everywhere, or just for the parameter of this function? I'm not sure how it would reduce casting.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 17:19:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.cc@417
PS5, Line 417: if (size > numeric_limits<int32_t>::max()) {
             :       return Status(
             :           TErrorCode::ROW_BATCH_TOO_LARGE, size, numeric_limits<int32_t>::max());
             :     }
You only check the size when full_dedup is true, but the size can also exceed the maximum int32 value when full_dedup is false. This happens when running row-batch-serialize-test.
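
A minimal sketch of the idea, assuming the check can be moved after the branch so that both paths are covered. The helper names and the bool return value are hypothetical; the real code returns Status(TErrorCode::ROW_BATCH_TOO_LARGE, ...).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a serialized size fits in an int32 length.
inline bool FitsInInt32(int64_t size) {
  return size >= 0 && size <= std::numeric_limits<int32_t>::max();
}

// Hypothetical sketch of the control flow. The size is computed on either
// branch and then validated once, so the non-dedup path is checked too.
inline bool CheckSerializedSize(
    bool full_dedup, int64_t dedup_size, int64_t plain_size) {
  const int64_t size = full_dedup ? dedup_size : plain_size;
  return FitsInInt32(size);  // checked after the branch, not inside it
}
```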



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Aug 2022 21:25:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#16) to the change originally created by Omid Shahidi. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untracked memory from the krpc data stream
sender. By using MemTrackerAllocator, we are able to allocate tuple
data and compression scratch and account for it in the memory tracker
of the KrpcDataStreamSender. This solution replaces the type for tuple
data and compression scratch from std::string to TrackedString, a
std::basic_string with MemTrackerAllocator as the custom allocator.
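
The general technique of a string type whose allocator reports to a tracker can be sketched as follows. This is not the patch's MemTrackerAllocator: ByteCounter, CountingAllocator and CountedString are hypothetical names, and a plain byte counter stands in for the real memory tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for a memory tracker: just a byte counter.
struct ByteCounter {
  int64_t bytes = 0;
};

// Minimal allocator that reports every allocation and release to a counter.
template <typename T>
class CountingAllocator {
 public:
  using value_type = T;

  explicit CountingAllocator(ByteCounter* counter) : counter_(counter) {}
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other)
    : counter_(other.counter()) {}

  T* allocate(std::size_t n) {
    counter_->bytes += static_cast<int64_t>(n * sizeof(T));
    return std::allocator<T>().allocate(n);
  }
  void deallocate(T* p, std::size_t n) {
    std::allocator<T>().deallocate(p, n);
    counter_->bytes -= static_cast<int64_t>(n * sizeof(T));
  }
  ByteCounter* counter() const { return counter_; }

 private:
  ByteCounter* counter_;
};

// Allocators compare equal when they report to the same counter.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return a.counter() == b.counter();
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return !(a == b);
}

// A std::basic_string whose heap usage is visible through the counter.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;
```

With this shape the string keeps its usual interface, and the counter returns to zero once every string using it has been destroyed.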

This patch also removes the thrift-based serialization because the
thrift RPC has been removed in a prior commit.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.
 - Changed PlannerTestBase.java to print the test line number.

New row-batch serialization benchmark:

Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:            10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
   ser_no_dups_base  18.6  18.8  18.9      1X      1X      1X
        ser_no_dups  18.5  18.5  18.8  0.998X  0.988X  0.991X
   ser_no_dups_full  14.7  14.8  14.8  0.793X   0.79X  0.783X

  ser_adj_dups_base  28.2  28.4  28.8      1X      1X      1X
       ser_adj_dups  68.9  69.1  69.8   2.44X   2.43X   2.43X
  ser_adj_dups_full  56.2  56.7  57.1   1.99X      2X   1.99X

      ser_dups_base  20.7  20.9  20.9      1X      1X      1X
           ser_dups  20.6  20.8  20.9  0.994X  0.995X      1X
      ser_dups_full  39.8    40  40.5   1.93X   1.92X   1.94X

deserialize:          10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
 deser_no_dups_base  75.9  76.6    77      1X      1X      1X
      deser_no_dups  74.9  75.6    76  0.987X  0.987X  0.987X

deser_adj_dups_base   127   128   129      1X      1X      1X
     deser_adj_dups   179   193   195   1.41X   1.51X   1.51X

    deser_dups_base   128   128   129      1X      1X      1X
         deser_dups   165   190   193   1.29X   1.48X   1.49X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
122 files changed, 2,875 insertions(+), 2,703 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/16

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 16
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 18:

> Patch Set 18: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8698/

Simply merging the two methods seems to introduce a bug. I'll tinker a bit more with this.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Oct 2022 11:06:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG@20
PS15, Line 20: for OutboundRowBatch memory allocation. This patch also removes the
> Will mention the planner change in the next patch set.
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 17
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Oct 2022 04:23:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8703/


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 11:00:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11187/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 21:52:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/10//COMMIT_MSG@29
PS10, Line 29: 
Add a title here



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 22:01:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8476/


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 20 Aug 2022 01:31:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/benchmarks/row-batch-serialize-benchmark.cc@38
PS7, Line 38: 
            : // Benchmark to measure how quickly we can serialize and deserialize row batches. More
            : // specifically, this benchmark was developed to measure the overhead of deduplication.
            : // The benchmarks are divided into serialization and deserialization benchmarks.
            : // The serialization benchmarks test different serialization methods (the new default of
            : // adjacent deduplication vs. the baseline of no deduplication) on row batches with
            : // different patterns of duplication: no_dups and adjacent_dups.
            : // For all benchmarks we use (int, string) tuples to exercise both variable-length and
            : // fixed-length slot handling. The small tuples with few slots emphasizes per-tuple
            : // dedup performance rather than per-slot serialization/deserialization performance.
            : //
            : // serialize:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //          ser_no_dups_baseline               17.43                  1X
            : //                   ser_no_dups               17.33             0.9944X
            : //              ser_no_dups_full                14.1             0.8092X
            : //
            : //    ser_adjacent_dups_baseline               26.65                  1X
            : //             ser_adjacent_dups               63.98                2.4X
            : //        ser_adjacent_dups_full               55.88              2.096X
            : //
            : //             ser_dups_baseline               19.26                  1X
            : //                      ser_dups               19.55              1.015X
            : //                 ser_dups_full                32.4              1.682X
            : //
            : // deserialize:          Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //        deser_no_dups_baseline               64.94                  1X
            : //                 deser_no_dups               69.24              1.066X
            : //
            : //  deser_adjacent_dups_baseline                 112                  1X
            : //           deser_adjacent_dups               207.4              1.852X
            : //
            : //           deser_dups_baseline               114.8                  1X
            : //                    deser_dups               208.5              1.817X
            : //
            : // Earlier results with LossyHashTable
            : // serialize:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //             ser_no_dups_lossy               15.93             0.9139X
            : //       ser_adjacent_dups_lossy               58.21              2.184X
            : //                ser_dups_lossy               50.46               2.62X
            : //
            : // Earlier results with boost::unordered_map
            : // serialize:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //              ser_no_dups_full                8.73             0.5582X
            : //
            : //        ser_adjacent_dups_full                38.7              1.634X
            : //
            : //                 ser_dups_full                27.5               1.54X
> Should this be removed and updated with the current benchmark scores?
Yes, please update. The benchmark scores should not change significantly.


http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test
File testdata/workloads/tpch/queries/datastream-sender.test:

http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test@1
PS7, Line 1: ====
> Created a query for functional-query workload and a query for tpch workload
Remove the functional-query workload since it's too simple and is already covered by the tpch workload.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 20:33:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc@30
PS10, Line 30: kudu/kudu-util.h"
> Why make this change? There is no exec/kudu directory.
I had not updated my local tree. Maybe other commits moved kudu-util.h to a new directory.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 21:47:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 6:

> Patch Set 6:
> 
> (12 comments)
> 
> > Patch Set 6:
> > 
> > (8 comments)
> > 
> > add end-end unit test
> 
> added end-to-end test, but it currently fails. Will investigate it and bring it to code review again.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Aug 2022 03:54:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.
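
For illustration only, the RAII idea described above could be sketched
as follows. TrackedPool and ScopedBuffer are hypothetical stand-ins
invented for this note; they are not Impala's FreePool or MemTracker,
nor the class added by this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a pool whose allocations are counted by a tracker.
class TrackedPool {
 public:
  uint8_t* Allocate(std::size_t len) {
    uint8_t* p = static_cast<uint8_t*>(std::malloc(len));
    if (p != nullptr) { bytes_in_use_ += len; }
    return p;
  }
  void Free(uint8_t* p, std::size_t len) {
    if (p == nullptr) { return; }
    std::free(p);
    bytes_in_use_ -= len;
  }
  std::size_t bytes_in_use() const { return bytes_in_use_; }

 private:
  std::size_t bytes_in_use_ = 0;
};

// RAII owner: takes the buffer from the pool in the constructor and returns
// it in the destructor, so the tracked byte count cannot leak.
class ScopedBuffer {
 public:
  ScopedBuffer(TrackedPool* pool, std::size_t len)
    : pool_(pool), len_(len), data_(pool->Allocate(len)) {}
  ~ScopedBuffer() { pool_->Free(data_, len_); }
  ScopedBuffer(const ScopedBuffer&) = delete;
  ScopedBuffer& operator=(const ScopedBuffer&) = delete;

  char* data() { return reinterpret_cast<char*>(data_); }
  std::size_t size() const { return len_; }

 private:
  TrackedPool* pool_;
  std::size_t len_;
  uint8_t* data_;
};

// Returns the bytes the pool reports while a 1 KiB buffer is alive.
std::size_t BytesWhileAlive(TrackedPool* pool) {
  ScopedBuffer buf(pool, 1024);
  assert(buf.data() != nullptr);
  return pool->bytes_in_use();
}
```

Once the ScopedBuffer goes out of scope, the pool's byte count returns
to its previous value.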

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

New row-batch serialization benchmark:

serialize:
Func                           10%   50%   90%     10%     50%     90%
                                                 (rel)   (rel)   (rel)
----------------------------------------------------------------------
ser_no_dups_baseline          8.36   8.6   8.7      1X      1X      1X
ser_no_dups                   6.73  6.85  6.93  0.804X  0.796X  0.796X
ser_no_dups_full              5.28  5.38  5.55  0.631X  0.625X  0.637X

ser_adjacent_dups_baseline    12.9  13.2  13.4      1X      1X      1X
ser_adjacent_dups             23.2  23.7  24.1    1.8X    1.8X    1.8X
ser_adjacent_dups_full        19.9  20.3  20.7   1.54X   1.54X   1.55X

ser_dups_baseline             9.17  9.54  9.72      1X      1X      1X
ser_dups                      7.45  7.69  7.86  0.812X  0.806X  0.809X
ser_dups_full                 14.6    15  15.3    1.6X   1.57X   1.57X

deserialize:
Func                           10%   50%   90%     10%     50%     90%
                                                 (rel)   (rel)   (rel)
----------------------------------------------------------------------
deser_no_dups_baseline        32.6  33.5    34      1X      1X      1X
deser_no_dups                 32.5  33.1  33.7  0.999X   0.99X  0.992X

deser_adjacent_dups_baseline  53.1    54  54.7      1X      1X      1X
deser_adjacent_dups           80.3  81.6  82.5   1.51X   1.51X   1.51X

deser_dups_baseline           52.4    54  54.7      1X      1X      1X
deser_dups                    86.8  88.4  89.7   1.66X   1.64X   1.64X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
9 files changed, 655 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/13
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11175/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 8
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 03:40:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#17) to the change originally created by Omid Shahidi. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using MemTrackerAllocator, we can allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution replaces the type for tuple data
and compression scratch from std::string to TrackedString, an
std::basic_string with MemTrackerAllocator as the custom allocator.
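
As a rough illustration of the allocator-based approach, the sketch
below builds a string type whose heap usage is reported to a counter.
ByteCounter, CountingAllocator, and CountedString are hypothetical
names made up for this note; the patch's actual MemTrackerAllocator
and TrackedString may differ in interface and behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical byte counter standing in for a memory tracker.
struct ByteCounter {
  std::size_t consumed = 0;
};

// Minimal stateful allocator that reports every allocation to a ByteCounter.
template <typename T>
class CountingAllocator {
 public:
  using value_type = T;

  explicit CountingAllocator(ByteCounter* counter) : counter_(counter) {}
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other)
    : counter_(other.counter()) {}

  T* allocate(std::size_t n) {
    T* p = static_cast<T*>(::operator new(n * sizeof(T)));
    counter_->consumed += n * sizeof(T);
    return p;
  }
  void deallocate(T* p, std::size_t n) {
    counter_->consumed -= n * sizeof(T);
    ::operator delete(p);
  }
  ByteCounter* counter() const { return counter_; }

 private:
  ByteCounter* counter_;
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return a.counter() == b.counter();
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return !(a == b);
}

// A std::basic_string whose heap usage shows up in the ByteCounter.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Returns the bytes counted while a 4 KiB string is alive.
std::size_t CountedWhileAlive(ByteCounter* counter) {
  CountingAllocator<char> alloc(counter);
  CountedString s(alloc);
  s.resize(4096);  // Larger than any small-string buffer, so it allocates.
  assert(s.size() == 4096);
  return counter->consumed;
}
```

The counted value is at least the string's size because the string
may reserve extra capacity, for example for a terminating null.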

This patch adds memory estimation in DataStreamSink.java to account
for OutboundRowBatch memory allocation. This patch also removes the
thrift-based serialization because the thrift RPC was removed in a
prior commit.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark
   to test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.
 - Print test line number in PlannerTestBase.java

New row-batch serialization benchmark:

Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:            10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
   ser_no_dups_base  18.6  18.8  18.9      1X      1X      1X
        ser_no_dups  18.5  18.5  18.8  0.998X  0.988X  0.991X
   ser_no_dups_full  14.7  14.8  14.8  0.793X   0.79X  0.783X

  ser_adj_dups_base  28.2  28.4  28.8      1X      1X      1X
       ser_adj_dups  68.9  69.1  69.8   2.44X   2.43X   2.43X
  ser_adj_dups_full  56.2  56.7  57.1   1.99X      2X   1.99X

      ser_dups_base  20.7  20.9  20.9      1X      1X      1X
           ser_dups  20.6  20.8  20.9  0.994X  0.995X      1X
      ser_dups_full  39.8    40  40.5   1.93X   1.92X   1.94X

deserialize:          10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
 deser_no_dups_base  75.9  76.6    77      1X      1X      1X
      deser_no_dups  74.9  75.6    76  0.987X  0.987X  0.987X

deser_adj_dups_base   127   128   129      1X      1X      1X
     deser_adj_dups   179   193   195   1.41X   1.51X   1.51X

    deser_dups_base   128   128   129      1X      1X      1X
         deser_dups   165   190   193   1.29X   1.48X   1.49X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
122 files changed, 2,875 insertions(+), 2,703 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/17
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 17
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8711/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 21
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 17 Oct 2022 16:54:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/18798/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/2//COMMIT_MSG@10
PS2, Line 10: which will hold the compressed tuple data for  an outbound row batch.
nit: double space "for  an"


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.h@24
PS2, Line 24: #include <vector>
switch with line 22 to reduce diffs.


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@713
PS2, Line 713:   for (auto batch : outbound_batches_) {
Use auto& to avoid copying OutboundRowBatch. Should probably delete the copy ctor or make it private too.
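
A minimal sketch of this suggestion, using a hypothetical Batch type
in place of OutboundRowBatch: with the copy operations deleted,
"for (auto batch : ...)" no longer compiles, and iterating by
reference avoids the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for OutboundRowBatch. Copying is disabled so an
// accidental by-value loop variable becomes a compile error.
struct Batch {
  explicit Batch(std::size_t n) : tuple_data(n, 'x') {}
  Batch(const Batch&) = delete;
  Batch& operator=(const Batch&) = delete;
  Batch(Batch&&) = default;
  Batch& operator=(Batch&&) = default;

  std::string tuple_data;
};

// Iterates by const reference, so no Batch is copied.
std::size_t TotalBytes(const std::vector<Batch>& batches) {
  std::size_t total = 0;
  for (const auto& batch : batches) {
    total += batch.tuple_data.size();
  }
  return total;
}

// Builds two batches in place and sums their payload sizes.
std::size_t Demo() {
  std::vector<Batch> batches;
  batches.emplace_back(10);
  batches.emplace_back(32);
  assert(batches.size() == 2);
  return TotalBytes(batches);
}
```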


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@1092
PS2, Line 1092:   if (outbound_rb_mem_pool_ != nullptr) outbound_rb_mem_pool_->FreeAll();
Missing {}


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@1093
PS2, Line 1093:   if (outbound_rb_mem_tracker_ != nullptr) outbound_rb_mem_tracker_->Close();
Missing {}


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/krpc-data-stream-sender.cc@1105
PS2, Line 1105:     krpc_tuple_data_bytes_ =
Probably better to not add these inside the serialize timer scope
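
One way to read this suggestion, sketched with hypothetical names
(ScopedTimer, SenderStats, and SerializeAndCount are invented here
and are not Impala's timer macros or counters): keep only the
serialization work inside the timer's scope and update the byte
counters after that scope ends.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical scoped timer: adds the elapsed nanoseconds to *total_ns
// when it is destroyed.
class ScopedTimer {
 public:
  explicit ScopedTimer(int64_t* total_ns)
    : total_ns_(total_ns), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    *total_ns_ +=
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  int64_t* total_ns_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical counters for the sender.
struct SenderStats {
  int64_t serialize_ns = 0;
  int64_t tuple_data_bytes = 0;
};

std::string SerializeAndCount(const std::string& rows, SenderStats* stats) {
  std::string out;
  {
    ScopedTimer timer(&stats->serialize_ns);  // Times only the work below.
    out = rows;  // Placeholder for the real serialization.
  }
  // Counter bookkeeping runs after the timer has stopped.
  stats->tuple_data_bytes += static_cast<int64_t>(out.size());
  assert(stats->serialize_ns >= 0);
  return out;
}
```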


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@93
PS2, Line 93:         const_cast<uint8_t*>(reinterpret_cast<const uint8_t*>(tuple_data_)),
shouldn't need const here since tuple_data_ not const


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@106
PS2, Line 106:     pool->Free(const_cast<uint8_t*>(reinterpret_cast<const uint8_t*>(tuple_data_)));
shouldn't need const here since tuple_data_ not const
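
To illustrate the point about the casts: when the pointer is already
a non-const char*, a single reinterpret_cast is sufficient and no
const_cast is needed. AsBytes is a hypothetical helper name used only
for this example.

```cpp
#include <cassert>
#include <cstdint>

// tuple_data is a non-const char*, so one reinterpret_cast is enough.
inline uint8_t* AsBytes(char* tuple_data) {
  return reinterpret_cast<uint8_t*>(tuple_data);
}

// Writes through the converted pointer and reads the value back as char.
inline char RoundTrip(char* buf) {
  assert(buf != nullptr);
  AsBytes(buf)[0] = 65;  // ASCII 'A'
  return buf[0];
}
```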


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@124
PS2, Line 124:   /// TODO: this can probably be a struct
Either add the struct or remove the comment.


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.h@584
PS2, Line 584:       vector<int32_t>* tuple_offsets, char* tuple_data);
Maybe less casting if you use uint8_t* here


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc@340
PS2, Line 340: Status RowBatch::Serialize(bool full_dedup, DedupMap* distinct_tuples,
Add comment explaining why SerializeThrift was added.


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc@396
PS2, Line 396: Status RowBatch::Serialize_Thrift(bool full_dedup, vector<int32_t>* tuple_offsets,
Rename to SerializeThrift


http://gerrit.cloudera.org:8080/#/c/18798/2/be/src/runtime/row-batch.cc@471
PS2, Line 471:     vector<int32_t>* tuple_offsets, char* tuple_data) {
Maybe less casting if you use uint8_t* here



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Aug 2022 22:03:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 291 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for  an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 278 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 2
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11481/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 15
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Sep 2022 04:41:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

serialize:
Func                           10%   50%   90%     10%     50%     90%
                                                 (rel)   (rel)   (rel)
----------------------------------------------------------------------
ser_no_dups_baseline          8.36   8.6   8.7      1X      1X      1X
ser_no_dups                   6.73  6.85  6.93  0.804X  0.796X  0.796X
ser_no_dups_full              5.28  5.38  5.55  0.631X  0.625X  0.637X

ser_adjacent_dups_baseline    12.9  13.2  13.4      1X      1X      1X
ser_adjacent_dups             23.2  23.7  24.1    1.8X    1.8X    1.8X
ser_adjacent_dups_full        19.9  20.3  20.7   1.54X   1.54X   1.55X

ser_dups_baseline             9.17  9.54  9.72      1X      1X      1X
ser_dups                      7.45  7.69  7.86  0.812X  0.806X  0.809X
ser_dups_full                 14.6    15  15.3    1.6X   1.57X   1.57X

deserialize:
Func                           10%   50%   90%     10%     50%     90%
                                                 (rel)   (rel)   (rel)
----------------------------------------------------------------------
deser_no_dups_baseline        32.6  33.5    34      1X      1X      1X
deser_no_dups                 32.5  33.1  33.7  0.999X   0.99X  0.992X

deser_adjacent_dups_baseline  53.1    54  54.7      1X      1X      1X
deser_adjacent_dups           80.3  81.6  82.5   1.51X   1.51X   1.51X

deser_dups_baseline           52.4    54  54.7      1X      1X      1X
deser_dups                    86.8  88.4  89.7   1.66X   1.64X   1.64X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
9 files changed, 656 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc@30
PS10, Line 30: kudu/kudu-util.h"
> I did not update my local tree. May be other commits made change and move t
Ignore this comment. The directory structure was tidied up by IMPALA-10800.



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 22:00:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11178/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 9
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 04:14:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/runtime/row-batch.inline.h
File be/src/runtime/row-batch.inline.h:

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/runtime/row-batch.inline.h@22
PS7, Line 22: 
extra line


http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/runtime/row-batch.inline.h@34
PS7, Line 34: for_compression
These three lines all check for_compression; you can restructure the code as:
if (for_compression) {
...
} else {
...
}


http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test
File testdata/workloads/tpch/queries/datastream-sender.test:

http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test@1
PS7, Line 1: ====
This file should be put in testdata/workloads/functional-query/queries


http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test@49
PS7, Line 49:   
extra space



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Aug 2022 06:37:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11188/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 22:46:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 11: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8468/


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 02:44:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11611/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 16
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Oct 2022 04:11:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc@753
PS4, Line 753:   // on some channel
> I think it will make more sense to do this in Close()
It works.


http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h@422
PS5, Line 422: us Serialize(OutboundRowBatch* output_batch, LockingFreePool* perm_free_pool,
             :       RuntimeProfile::SummaryStatsCounter* tuple_data_stats_counter,
We can return the counters as integers and then update the profile counters in the caller. This will make the unit-test code in row-batch-serialize-test.cc simpler.
The same comment applies to line #531.
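
For illustration only (this is not code from the patch): a minimal sketch of the suggestion above, in which the serializer reports sizes through plain integers and only the caller updates the profile counters. SummaryStatsSketch, SerializeSketch and SendSketch are hypothetical names, and the "serialization" is a placeholder copy.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a runtime-profile summary counter.
struct SummaryStatsSketch {
  int64_t samples = 0;
  int64_t total = 0;
  void UpdateCounter(int64_t value) {
    ++samples;
    total += value;
  }
};

// Hypothetical serializer. It knows nothing about profiles and reports sizes
// through plain integers owned by the caller.
inline void SerializeSketch(const std::string& rows, std::string* out,
    int64_t* tuple_data_bytes, int64_t* compression_scratch_bytes) {
  *out = rows;  // Placeholder for the real serialization work.
  *tuple_data_bytes = static_cast<int64_t>(out->size());
  *compression_scratch_bytes = 0;  // No compression in this sketch.
}

// Caller side: the only place that touches the profile counters.
inline void SendSketch(const std::string& rows, SummaryStatsSketch* tuple_stats,
    SummaryStatsSketch* scratch_stats) {
  std::string out;
  int64_t tuple_bytes = 0;
  int64_t scratch_bytes = 0;
  SerializeSketch(rows, &out, &tuple_bytes, &scratch_bytes);
  tuple_stats->UpdateCounter(tuple_bytes);
  scratch_stats->UpdateCounter(scratch_bytes);
}
```

With this shape, a unit test can call the serializer directly and check the two integers without building a runtime profile.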


http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h@577
PS5, Line 577:   /// row batch). If the distinct_tuples argument is non-null, full deduplication is
nit: extra line



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 3
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 23:51:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 15:

(5 comments)

Agree that we should clean up Thrift RPC related code at this point.

I had a discussion with Kurt. He has an alternative idea: use std::basic_string with a custom tracked allocator. This is potentially lighter than using BufferPool. An example of such an allocator is MemTrackerAllocator from be/src/kudu/util/mem_tracker.h. I'm currently investigating this solution.

In the meantime, I'm leaving some notes on things to fix.
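
To make the idea concrete, here is a rough sketch of a std::basic_string backed by a byte-counting allocator. It is not Kudu's MemTrackerAllocator and not the code under review; ByteTracker, TrackingAllocator and TrackedString are hypothetical names, and the tracker is just an atomic counter rather than a real MemTracker.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical tracker: a bare byte counter.
struct ByteTracker {
  std::atomic<int64_t> consumption{0};
};

// Minimal allocator that reports every allocation and release to a tracker.
template <typename T>
class TrackingAllocator {
 public:
  using value_type = T;

  explicit TrackingAllocator(std::shared_ptr<ByteTracker> tracker)
    : tracker_(std::move(tracker)) {}

  // Converting constructor, needed so the allocator can be rebound.
  template <typename U>
  TrackingAllocator(const TrackingAllocator<U>& other) noexcept
    : tracker_(other.tracker()) {}

  T* allocate(std::size_t n) {
    T* p = std::allocator<T>().allocate(n);
    tracker_->consumption += static_cast<int64_t>(n * sizeof(T));
    return p;
  }

  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>().deallocate(p, n);
    tracker_->consumption -= static_cast<int64_t>(n * sizeof(T));
  }

  const std::shared_ptr<ByteTracker>& tracker() const { return tracker_; }

 private:
  std::shared_ptr<ByteTracker> tracker_;
};

// Allocators compare equal when they report to the same tracker.
template <typename T, typename U>
bool operator==(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return a.tracker() == b.tracker();
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return !(a == b);
}

using TrackedString =
    std::basic_string<char, std::char_traits<char>, TrackingAllocator<char>>;
```

One caveat with this approach: short strings may live in the string's inline buffer and never reach the allocator, so only heap-backed contents are counted.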

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc@780
PS15, Line 780:   krpc_tuple_data_bytes_ =
              :       ADD_SUMMARY_STATS_COUNTER(profile(), "TupleDataBytes", TUnit::BYTES);
              :   krpc_compression_scratch_bytes_ =
              :       ADD_SUMMARY_STATS_COUNTER(profile(), "CompressionScratchBytes", TUnit::BYTES);
I suggested adding these counters for research purposes. I wonder if we should drop them now. TupleDataBytes seems to overlap with UncompressedRowBatchSize.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc@786
PS15, Line 786: "Memory tracker for OutBoundRowBatch serialization", parent_mem_tracker
We should nest this MemTracker under the KrpcDataStreamSender's mem_tracker_.get() here. We can also use a shorter MemTracker name.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/krpc-data-stream-sender.cc@1116
PS15, Line 1116: dest->SetMemAllocator(outbound_rb_free_pool_);
This can be set in the Prepare and Init methods rather than here.


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.cc@321
PS15, Line 321: RETURN_IF_ERROR(output_batch->AllocateTraceableBuffer(size, false));
In the original code, we throw TErrorCode::ROW_BATCH_TOO_LARGE if size is larger than numeric_limits<int32_t>::max().
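
A minimal sketch of the kind of guard being referred to, assuming a simplified status type. StatusSketch and CheckSerializedSize are hypothetical names; the real code reports TErrorCode::ROW_BATCH_TOO_LARGE through Impala's own Status class.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical, simplified status type; Impala's real Status class is richer.
struct StatusSketch {
  bool ok;
  std::string msg;
};

// Rejects sizes that do not fit in an int32_t before any buffer is allocated.
inline StatusSketch CheckSerializedSize(int64_t size) {
  const int64_t kMaxSize = std::numeric_limits<int32_t>::max();
  if (size > kMaxSize) {
    return {false, "row batch too large: " + std::to_string(size) + " bytes"};
  }
  return {true, ""};
}
```

Running the check before the allocation call keeps the size limit enforced regardless of which allocator backs the buffer.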


http://gerrit.cloudera.org:8080/#/c/18798/15/be/src/runtime/row-batch.cc@365
PS15, Line 365: RETURN_IF_ERROR(output_batch->AllocateTraceableBuffer(compressed_size, true));
In the original code, we do not resize here if current length is longer than expected compressed_size.
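
The grow-only behavior described above could look roughly like this. It is a sketch using std::vector as the scratch buffer; EnsureScratchSize is a hypothetical helper, not a function from row-batch.cc.

```cpp
#include <cstddef>
#include <vector>

// Grows 'scratch' only when it is smaller than 'needed'; never shrinks it.
// Returns true if a resize (and therefore a possible reallocation) happened.
inline bool EnsureScratchSize(std::vector<char>* scratch, std::size_t needed) {
  if (scratch->size() >= needed) return false;
  scratch->resize(needed);
  return true;
}
```

Reusing a buffer that is already large enough avoids a free/allocate cycle for every row batch.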



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 15
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Oct 2022 23:51:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 18: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Oct 2022 06:12:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 11: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 22:28:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/11/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/11/be/src/runtime/row-batch.h@122
PS11, Line 122:     }
add DCHECK(mem_allocator_ == locking_free_pool);



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 15:15:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h@104
PS5, Line 104: Free(LockingFreePool* pool)
Could we change the static function AllocateTraceableBuffer(), which is defined in row-batch.cc, into a member function of this class? It would make the code easier to read.
Also, define a member variable to save the LockingFreePool pointer when allocating the buffer, so that we don't need to pass the pool to the Free() function.
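
As a rough illustration of that shape (not the code under review): a buffer owner that remembers its pool, so that Free() and the destructor need no pool argument. PoolSketch stands in for LockingFreePool and is backed by malloc/free here; both class names are hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical pool; counts live allocations so leaks are easy to spot.
class PoolSketch {
 public:
  uint8_t* Allocate(int64_t size) {
    uint8_t* p = static_cast<uint8_t*>(std::malloc(static_cast<std::size_t>(size)));
    if (p != nullptr) ++live_;
    return p;
  }
  // Freeing a null pointer is a no-op, so callers need no separate check.
  void Free(uint8_t* p) {
    if (p == nullptr) return;
    std::free(p);
    --live_;
  }
  int live() const { return live_; }

 private:
  int live_ = 0;
};

// RAII owner: allocation is a member function and the pool pointer is stored.
class TrackedBufferSketch {
 public:
  explicit TrackedBufferSketch(PoolSketch* pool) : pool_(pool) {}
  ~TrackedBufferSketch() { Free(); }
  TrackedBufferSketch(const TrackedBufferSketch&) = delete;
  TrackedBufferSketch& operator=(const TrackedBufferSketch&) = delete;

  // Releases any previous buffer, then allocates 'size' bytes from the pool.
  bool Allocate(int64_t size) {
    Free();
    data_ = pool_->Allocate(size);
    if (data_ == nullptr) return false;
    length_ = size;
    return true;
  }

  void Free() {
    pool_->Free(data_);
    data_ = nullptr;
    length_ = 0;
  }

  uint8_t* data() const { return data_; }
  int64_t length() const { return length_; }

 private:
  PoolSketch* const pool_;
  uint8_t* data_ = nullptr;
  int64_t length_ = 0;
};
```

Because the owner keeps the pool pointer, a buffer cannot be returned to the wrong pool by mistake.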



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Aug 2022 21:35:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
4 files changed, 302 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@299
PS6, Line 299:     int64_t* tuple_data_stats_counter, int64_t* compression_scratch_stats_counter, bool full_dedup) {
line too long (101 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Aug 2022 21:00:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG@18
PS15, Line 18: The thrift implementation is left unchanged and
             : the protobuf implementation is seperated.
Should we remove that code? It is not used anymore.


http://gerrit.cloudera.org:8080/#/c/18798/15//COMMIT_MSG@20
PS15, Line 20: 
Also mention the code change in Planner.


http://gerrit.cloudera.org:8080/#/c/18798/15/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java:

http://gerrit.cloudera.org:8080/#/c/18798/15/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@64
PS15, Line 64: is
nit: are


http://gerrit.cloudera.org:8080/#/c/18798/15/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java@75
PS15, Line 75: int outboundBatchesPerReceiver = 2;
             :     int bufferPerOutboundBatch = 2;
Could you add a comment explaining why these two variables are set to 2?



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 15
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Oct 2022 18:51:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 6:

(12 comments)

> Patch Set 6:
> 
> (8 comments)
> 
> add end-end unit test

Added an end-to-end test.

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc@138
PS6, Line 138:       uint8_t* input = const_cast<uint8_t*>(
> const_cast shouldn't be needed here
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc@140
PS6, Line 140:       uint8_t* compressed_output = const_cast<uint8_t*>(
> const_cast shouldn't be needed here
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/benchmarks/row-batch-serialize-benchmark.cc@437
PS6, Line 437: argc, argv, true
> add impala::TestInfo::BE_TEST
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/krpc-data-stream-sender.h@30
PS6, Line 30: #include "exec/data-sink.h"
> Move this back where it was before
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@71
PS6, Line 71:   std::unique_ptr<FreePool> free_pool_;
> Is unique_lock actually required here? Seem the locking here is fairly simp
Per our offline discussion, we can use lock_guard, which has no extra overhead compared to unique_lock and only maintains one state.
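
For reference, a small sketch of the lock_guard pattern in question. std::lock_guard holds only a reference to the mutex and always unlocks in its destructor, whereas std::unique_lock also tracks whether it currently owns the lock. LockedCounterSketch is a hypothetical class, not the LockingFreePool from the patch.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical wrapper that serializes access to a single value.
class LockedCounterSketch {
 public:
  void Add(int64_t delta) {
    // Locks on construction and unlocks when 'l' goes out of scope.
    std::lock_guard<std::mutex> l(lock_);
    value_ += delta;
  }

  int64_t value() const {
    std::lock_guard<std::mutex> l(lock_);
    return value_;
  }

 private:
  mutable std::mutex lock_;
  int64_t value_ = 0;
};
```

unique_lock is only needed when the lock must be released early, moved, or handed to a condition variable; none of that applies to a simple scoped critical section.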


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@105
PS6, Line 105: mem_allocator_->Free(reinterpret_cast<uint8_t*>(tuple_data_))
> Should we check if tuple_data_ is not nullptr before calling Free()?
free-pool.h:99 already checks this for us in the FreePool Free() function.


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@114
PS6, Line 114: inline Status AllocateTraceableBuffer
> Add a new header file row-batch.inline.h and put this inline function body 
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.h@115
PS6, Line 115:  char** buffer_ptr, int64_t* buffer_length, int64_t* buffer_capacity
> Define two allocation functions to replace AllocateTraceableBuffer(), one f
Changed the function signature: it now takes the size and a boolean flag (for_compression), and updates the correct buffer and its length and capacity accordingly.
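
A simplified sketch of that signature, assuming std::vector buffers in place of the pool-backed char* buffers used in the patch. OutboundBatchSketch and AllocateBuffer are hypothetical names; the point is the single if/else on for_compression that selects which buffer is resized.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical outbound batch with separate tuple-data and scratch buffers.
struct OutboundBatchSketch {
  std::vector<char> tuple_data;
  std::vector<char> compression_scratch;

  // Sizes exactly one of the two buffers, chosen by 'for_compression'.
  // Returns false for a negative size and leaves both buffers untouched.
  bool AllocateBuffer(int64_t size, bool for_compression) {
    if (size < 0) return false;
    if (for_compression) {
      compression_scratch.resize(static_cast<std::size_t>(size));
    } else {
      tuple_data.resize(static_cast<std::size_t>(size));
    }
    return true;
  }
};
```

Callers then pass only the size and the flag, rather than pointers to the batch's own member variables.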


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@299
PS6, Line 299:     int64_t* tuple_data_stats_counter, int64_t* compression_scratch_stats_counter, bool full_dedup) {
> line too long (101 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@306
PS6, Line 306: // bool full_dedup = UseFullDedup();
> delete
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@322
PS6, Line 322: &output_batch->tuple_data_,
             :       &output_batch->tuple_data_length_, &output_batch->tuple_data_capacity_
> change function signature so that we don't need to pass member variables as
Done


http://gerrit.cloudera.org:8080/#/c/18798/6/be/src/runtime/row-batch.cc@393
PS6, Line 393: Thrift implementation for Serialization using TRowBatch.
             : /// Benchmarks (be/src/benchmarks/row-batch-serialize-benchmark.cc) and tests
             : /// (be/src/runtime/row-batch-serialize-test.cc) for serialization use TRowBatch and
             : /// Thrift so we need to keep the old implementation so they don't fail.
> You already replace TRowBatch in those two files, update the comments accor
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 6
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Aug 2022 03:01:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 7:

(5 comments)

> Patch Set 7:
> 
> (4 comments)

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/benchmarks/row-batch-serialize-benchmark.cc@38
PS7, Line 38: 
            : // Benchmark to measure how quickly we can serialize and deserialize row batches. More
            : // specifically, this benchmark was developed to measure the overhead of deduplication.
            : // The benchmarks are divided into serialization and deserialization benchmarks.
            : // The serialization benchmarks test different serialization methods (the new default of
            : // adjacent deduplication vs. the baseline of no deduplication) on row batches with
            : // different patterns of duplication: no_dups and adjacent_dups.
            : // For all benchmarks we use (int, string) tuples to exercise both variable-length and
            : // fixed-length slot handling. The small tuples with few slots emphasizes per-tuple
            : // dedup performance rather than per-slot serialization/deserialization performance.
            : //
            : // serialize:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //          ser_no_dups_baseline               17.43                  1X
            : //                   ser_no_dups               17.33             0.9944X
            : //              ser_no_dups_full                14.1             0.8092X
            : //
            : //    ser_adjacent_dups_baseline               26.65                  1X
            : //             ser_adjacent_dups               63.98                2.4X
            : //        ser_adjacent_dups_full               55.88              2.096X
            : //
            : //             ser_dups_baseline               19.26                  1X
            : //                      ser_dups               19.55              1.015X
            : //                 ser_dups_full                32.4              1.682X
            : //
            : // deserialize:          Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //        deser_no_dups_baseline               64.94                  1X
            : //                 deser_no_dups               69.24              1.066X
            : //
            : //  deser_adjacent_dups_baseline                 112                  1X
            : //           deser_adjacent_dups               207.4              1.852X
            : //
            : //           deser_dups_baseline               114.8                  1X
            : //                    deser_dups               208.5              1.817X
            : //
            : // Earlier results with LossyHashTable
            : // serialize:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //             ser_no_dups_lossy               15.93             0.9139X
            : //       ser_adjacent_dups_lossy               58.21              2.184X
            : //                ser_dups_lossy               50.46               2.62X
            : //
            : // Earlier results with boost::unordered_map
            : // serialize:            Function     Rate (iters/ms)          Comparison
            : // ----------------------------------------------------------------------
            : //              ser_no_dups_full                8.73             0.5582X
            : //
            : //        ser_adjacent_dups_full                38.7              1.634X
            : //
            : //                 ser_dups_full                27.5               1.54X
Should this be removed and updated with the current benchmark scores?


http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/runtime/row-batch.inline.h
File be/src/runtime/row-batch.inline.h:

http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/runtime/row-batch.inline.h@22
PS7, Line 22: 
> extra line
Done


http://gerrit.cloudera.org:8080/#/c/18798/7/be/src/runtime/row-batch.inline.h@34
PS7, Line 34: for_compression
> These three lines check for_compression, you can change code as:
Done


http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test
File testdata/workloads/tpch/queries/datastream-sender.test:

http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test@1
PS7, Line 1: ====
> This file should be put in testdata/workloads/functional-query/queries
Created a query for the functional-query workload and a query for the tpch workload in their respective directories.


http://gerrit.cloudera.org:8080/#/c/18798/7/testdata/workloads/tpch/queries/datastream-sender.test@49
PS7, Line 49:   
> extra space
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 01:14:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 8:

Added benchmark results to commit message for patch 9


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 8
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 03:49:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 18:

(3 comments)

Since we removed TRowBatch-based serialization, I tried to merge Serialize with SerializeInternal in patch set 18. This seems to improve the benchmark a bit.

http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h@46
PS17, Line 46: class Tuple;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h@493
PS17, Line 493:   /// Implementation for protobuf to deserialize a row batch.
              :   ///
              :   /// 'input_tuple_offsets': an int32_t array of tuples; offsets into 'input_tuple_data'.
              :   /// Used for populating the tuples in the row batch with actual pointers.
              :   ///
              :   /// 'input_tuple_data': contains pointer and size of tuples' data buffer.
              :   /// If 'is_compressed' is true, the data is compressed.
              :   ///
              :   /// 'uncompressed_size': the uncompressed size of 'input_tuple_data' if it's compressed.
              :   ///
              :   /// 'is_compressed': True if 'input_tuple_data' is compressed.
              :   ///
              :   /// 'tuple_data': buffer of 'uncompressed_size' bytes for holding tuple data.
              :   void Deserialize(const kudu::Slice& input_tuple_offsets,
              :       const kudu::Slice& input_tuple_data, int64_t uncompressed_size, bool is_compressed,
              :       uint8_t* tuple_data);
              : 
> remove
Done


http://gerrit.cloudera.org:8080/#/c/18798/17/be/src/runtime/row-batch.h@511
PS17, Line 511: string and collection data). This is the size of the row batch after removing
> remove
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 18
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Oct 2022 05:37:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using a free pool, we are able to allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution creates a RAII class responsible
for memory allocation and changes the existing code to use a char buffer
pointed to by a char* tuple_data_ instead of the previously used
std::string tuple_data_. The thrift implementation is left unchanged and
the protobuf implementation is separated.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
9 files changed, 627 insertions(+), 173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/7
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 7
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc@1095
PS13, Line 1095: if (outbound_rb_mem_pool_.get() != nullptr)
Possibly change to if (UNLIKELY(outbound_rb_mem_pool_.get() != nullptr))

same comment for line 1095
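
For readers unfamiliar with the branch hint suggested above, here is a minimal sketch of how such a macro typically behaves. UNLIKELY_SKETCH, PoolSketch and CloseSketch are hypothetical names for illustration, and the macro body is an assumed definition built on the GCC/Clang __builtin_expect builtin, not a copy of Impala's own header. The hint only influences code layout; the result of the condition is unchanged.

```cpp
#include <cassert>
#include <memory>

// Assumed definition for illustration only. GCC and Clang provide
// __builtin_expect; other compilers fall back to the plain expression.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY_SKETCH(expr) __builtin_expect(!!(expr), 0)
#else
#define UNLIKELY_SKETCH(expr) (expr)
#endif

// Hypothetical stand-in for a pool that must be drained before it is destroyed.
struct PoolSketch {
  int freed = 0;
  void FreeAll() { ++freed; }
};

// Returns true if cleanup work was actually performed.
bool CloseSketch(std::unique_ptr<PoolSketch>& pool) {
  if (UNLIKELY_SKETCH(pool.get() != nullptr)) {
    pool->FreeAll();
    pool.reset();
    assert(pool == nullptr);
    return true;
  }
  return false;
}
```

Calling CloseSketch() twice on the same pointer performs the cleanup once and then takes the hinted fast path.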


http://gerrit.cloudera.org:8080/#/c/18798/13/be/src/runtime/krpc-data-stream-sender.cc@1103
PS13, Line 1103: delete outbound_rb_free_pool_;
Check whether outbound_rb_free_pool_ != nullptr before deleting it.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 22:01:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18798/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/9//COMMIT_MSG@29
PS9, Line 29: 
Add a title here: New row-batch serialization benchmark


http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc@30
PS10, Line 30: kudu/kudu-util.h"
Why make this change? There is no exec/kudu directory.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 21:39:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18798/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/10//COMMIT_MSG@29
PS10, Line 29: 
> Add a title here
Done


http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/10/be/src/runtime/krpc-data-stream-sender.cc@30
PS10, Line 30: kudu/kudu-util.h"
> ignore this comment. The directory structure was tidied up by IMPALA-10800.
Ack



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 22:24:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 11: Code-Review+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 23:04:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11634/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 04:55:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 19: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 05:48:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 19:

> Patch Set 19: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8703/

3 e2e tests failed with mem_limit exceeded in the dockerised environment.
I will try to increase the mem_limit and retry the tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 19
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Oct 2022 11:07:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/18798/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/2//COMMIT_MSG@20
PS2, Line 20: 
Could you add a Testing section listing the tests that have been done for this patch?


http://gerrit.cloudera.org:8080/#/c/18798/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18798/4//COMMIT_MSG@20
PS4, Line 20: 
Please add a Testing section that lists the tests you ran for this patch, such as performance tests, manual real-cluster tests, precommit core tests, new unit tests, etc.


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.h
File be/src/runtime/krpc-data-stream-sender.h:

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.h@273
PS4, Line 273: boost::scoped_ptr
For new code, we prefer std::unique_ptr over boost::scoped_ptr.
The same comment applies to the next two variables.


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc@753
PS4, Line 753:   // on some channel
Reset outbound_rb_mem_tracker_, outbound_rb_mem_pool_ and outbound_rb_free_pool_ to nullptr to free the objects.
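
As a rough illustration of the reset suggested here, the sketch below holds the three objects in std::unique_ptr members and releases them explicitly in Close(). Tracked and SenderSketch are hypothetical stand-ins that only record destruction order. The release order (free pool, then mem pool, then tracker) is an assumption based on each object depending on the next; it is not taken from the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: appends its name to a log when destroyed.
struct Tracked {
  Tracked(std::string name, std::vector<std::string>* log)
    : name_(std::move(name)), log_(log) {
    assert(log_ != nullptr);
  }
  ~Tracked() { log_->push_back(name_); }
  std::string name_;
  std::vector<std::string>* log_;
};

class SenderSketch {
 public:
  explicit SenderSketch(std::vector<std::string>* log)
    : mem_tracker_(new Tracked("mem_tracker", log)),
      mem_pool_(new Tracked("mem_pool", log)),
      free_pool_(new Tracked("free_pool", log)) {}

  // Release dependents first. reset() on an empty unique_ptr is a no-op,
  // so calling Close() more than once is harmless.
  void Close() {
    free_pool_.reset();
    mem_pool_.reset();
    mem_tracker_.reset();
  }

 private:
  std::unique_ptr<Tracked> mem_tracker_;
  std::unique_ptr<Tracked> mem_pool_;
  std::unique_ptr<Tracked> free_pool_;
};

// Returns the order in which the three members were destroyed.
std::vector<std::string> CloseOrder() {
  std::vector<std::string> log;
  {
    SenderSketch sender(&log);
    sender.Close();
    sender.Close();  // Second call frees nothing.
  }
  return log;
}
```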


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/krpc-data-stream-sender.cc@1108
PS4, Line 1108:   krpc_tuple_data_bytes_ =
              :       ADD_SUMMARY_STATS_COUNTER(profile(), "TupleDataBytes", TUnit::BYTES);
              :   krpc_compression_scratch_bytes_ =
              :       ADD_SUMMARY_STATS_COUNTER(profile(), "CompressionScratchBytes", TUnit::BYTES);
Should we move these two lines to KrpcDataStreamSender::Prepare(), like the other counters?


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.h@513
PS4, Line 513: tuple_offsets
I don't see this parameter in the Serialize() signature. It is used for SerializeThrift().


http://gerrit.cloudera.org:8080/#/c/18798/3/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/3/be/src/runtime/row-batch.h@514
PS3, Line 514:                
Don't see this parameter


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc@300
PS4, Line 300: As part of the serialization process we deduplicate tuples to avoid serializing a
             :   // Tuple multiple times for the RowBatch. By default we only detect duplicate tuples
             :   // in adjacent rows only. If full deduplication is enabled, we will build a
             :   // map to detect non-adjacent duplicates. Building this map comes with significant
             :   // overhead, so is only worthwhile in the uncommon case of many non-adjacent duplicates.
This comment duplicates the one at line 347. Could you make one of them shorter?


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc@322
PS4, Line 322:  if (full_dedup) {
             :     RETURN_IF_ERROR(
             :         Serialize(full_dedup, &distinct_tuples, output_batch, &uncompressed_size,
             :             &is_compressed, size, perm_free_pool, compression_scratch_stats_counter));
             :   } else {
             :     RETURN_IF_ERROR(Serialize(full_dedup, nullptr, output_batch, &uncompressed_size,
             :         &is_compressed, size, perm_free_pool, compression_scratch_stats_counter));
             :   }
Simplify as:
RETURN_IF_ERROR(Serialize(full_dedup, full_dedup ? &distinct_tuples : nullptr, output_batch, &uncompressed_size,
&is_compressed, size, perm_free_pool, compression_scratch_stats_counter));
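
To make the equivalence concrete, here is a small self-contained sketch. SerializeStub, Branching and Simplified are hypothetical stand-ins with a reduced parameter list; only the handling of the distinct_tuples argument mirrors the snippet above.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

using DistinctTuples = std::unordered_set<int>;

// Hypothetical stand-in for the internal Serialize() overload. It reports
// which deduplication mode it was asked to run.
std::string SerializeStub(bool full_dedup, DistinctTuples* distinct_tuples) {
  // The set is expected exactly when full deduplication is requested.
  assert(full_dedup == (distinct_tuples != nullptr));
  return full_dedup ? "full-dedup" : "adjacent-dedup";
}

// Original shape: two calls that differ in a single argument.
std::string Branching(bool full_dedup) {
  DistinctTuples distinct_tuples;
  if (full_dedup) {
    return SerializeStub(full_dedup, &distinct_tuples);
  } else {
    return SerializeStub(full_dedup, nullptr);
  }
}

// Suggested shape: one call, with the differing argument chosen inline.
std::string Simplified(bool full_dedup) {
  DistinctTuples distinct_tuples;
  return SerializeStub(full_dedup, full_dedup ? &distinct_tuples : nullptr);
}
```

Both forms pass the same arguments for either value of full_dedup, so the change is purely about readability.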


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc@404
PS4, Line 404: expected to be removed once migration from Thrift RPC to KRPC is done.
I think we have already migrated from Thrift RPC to KRPC for data communication between backends, but the code may not be fully cleaned up.


http://gerrit.cloudera.org:8080/#/c/18798/4/be/src/runtime/row-batch.cc@416
PS4, Line 416: F
nit: low case



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 4
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 05:34:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.h@645
PS5, Line 645: compression_scratch_;
This variable seems to be used by SerializeThrift(), not for protobuf serialization. Please update the comment.


http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/18798/5/be/src/runtime/row-batch.cc@258
PS5, Line 258: FreePool
LockingFreePool



-- 
To view, visit http://gerrit.cloudera.org:8080/18798
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 5
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Aug 2022 21:23:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ is created
which will hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be trackable as it is responsible for
a significant portion of untrackable memory from the krpc data stream
sender. By using MemTrackerAllocator, we can allocate tuple data and
compression scratch and account for it in the memory tracker of the
KrpcDataStreamSender. This solution replaces the type for tuple data
and compression scratch from std::string to TrackedString, a
std::basic_string with MemTrackerAllocator as the custom allocator.
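
The general technique can be sketched as follows. This is not Impala's
MemTrackerAllocator or TrackedString. ByteTracker, TrackingAllocator,
TrackedStringSketch and PeakBytesForPayload are hypothetical names, and
the tracker here is a plain byte counter standing in for a real memory
tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical byte counter standing in for a memory tracker.
struct ByteTracker {
  int64_t current = 0;
  int64_t peak = 0;
  void Consume(int64_t n) {
    current += n;
    if (current > peak) peak = current;
  }
  void Release(int64_t n) { current -= n; }
};

// Minimal standard-conforming allocator that reports every allocation and
// deallocation to a ByteTracker. A null tracker means "do not track".
template <typename T>
class TrackingAllocator {
 public:
  using value_type = T;

  TrackingAllocator() = default;
  explicit TrackingAllocator(ByteTracker* tracker) : tracker_(tracker) {}
  template <typename U>
  TrackingAllocator(const TrackingAllocator<U>& other) : tracker_(other.tracker()) {}

  T* allocate(std::size_t n) {
    T* p = static_cast<T*>(::operator new(n * sizeof(T)));
    if (tracker_ != nullptr) tracker_->Consume(static_cast<int64_t>(n * sizeof(T)));
    return p;
  }

  void deallocate(T* p, std::size_t n) noexcept {
    ::operator delete(p);
    if (tracker_ != nullptr) tracker_->Release(static_cast<int64_t>(n * sizeof(T)));
  }

  ByteTracker* tracker() const { return tracker_; }

 private:
  ByteTracker* tracker_ = nullptr;
};

template <typename T, typename U>
bool operator==(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return a.tracker() == b.tracker();
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
  return !(a == b);
}

// A string whose heap buffer is accounted for by the tracker.
using TrackedStringSketch =
    std::basic_string<char, std::char_traits<char>, TrackingAllocator<char>>;

// Fills a tracked string with payload_size bytes, destroys it, and returns the
// peak number of bytes the tracker observed.
int64_t PeakBytesForPayload(std::size_t payload_size, ByteTracker* tracker) {
  assert(tracker != nullptr);
  {
    TrackingAllocator<char> alloc(tracker);
    TrackedStringSketch tuple_data(alloc);
    tuple_data.assign(payload_size, 'x');
  }
  return tracker->peak;
}
```

Small payloads may sit in the string's inline buffer and never reach the
allocator, so only heap-sized payloads show up in the tracker.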

This patch adds memory estimation in DataStreamSink.java to account
for OutboundRowBatch memory allocation. This patch also removes the
thrift-based serialization because the thrift RPC has been removed
in the prior commit.

Testing:
 - Passed core tests.
 - Ran a single node benchmark which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark
   to test the row-batch serialization used by KRPC.
 - Manually collected query-profile, heap growth, and memory usage log
   showing untracked memory decreased by 1/2.
 - Add test_datastream_sender.py to verify the peak memory of EXCHANGE
   SENDER node.
 - Raise mem_limit in two of the test_spilling_large_rows test cases.
 - Print test line number in PlannerTestBase.java

New row-batch serialization benchmark:

Machine Info: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
serialize:            10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
   ser_no_dups_base  18.6  18.8  18.9      1X      1X      1X
        ser_no_dups  18.5  18.5  18.8  0.998X  0.988X  0.991X
   ser_no_dups_full  14.7  14.8  14.8  0.793X   0.79X  0.783X

  ser_adj_dups_base  28.2  28.4  28.8      1X      1X      1X
       ser_adj_dups  68.9  69.1  69.8   2.44X   2.43X   2.43X
  ser_adj_dups_full  56.2  56.7  57.1   1.99X      2X   1.99X

      ser_dups_base  20.7  20.9  20.9      1X      1X      1X
           ser_dups  20.6  20.8  20.9  0.994X  0.995X      1X
      ser_dups_full  39.8    40  40.5   1.93X   1.92X   1.94X

deserialize:          10%   50%   90%     10%     50%     90%
                                        (rel)   (rel)   (rel)
-------------------------------------------------------------
 deser_no_dups_base  75.9  76.6    77      1X      1X      1X
      deser_no_dups  74.9  75.6    76  0.987X  0.987X  0.987X

deser_adj_dups_base   127   128   129      1X      1X      1X
     deser_adj_dups   179   193   195   1.41X   1.51X   1.51X

    deser_dups_base   128   128   129      1X      1X      1X
         deser_dups   165   190   193   1.29X   1.48X   1.49X

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Reviewed-on: http://gerrit.cloudera.org:8080/18798
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/mem-tracker.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M common/thrift/Results.thrift
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/result-spooling.test
M testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q01.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q02.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q03.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q07.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q08.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q10a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q13.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q15.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q16.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q17.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q21.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q23b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q24b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q25.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q26.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q28.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q29.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q30.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q32.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q33.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q34.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q35a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q36.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q38.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q40.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q41.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q46.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q47.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q48.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q51.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q53.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q56.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q57.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q58.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q59.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q60.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q61.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q62.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q63.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q65.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q68.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q69.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q70.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q72.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q73.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q75.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q76.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q77.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q78.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q79.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q80.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q81.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q83.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q85.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q86.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q87.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q88.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q89.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q90.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q91.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q92.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q93.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q94.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q95.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q97.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q99.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-query/queries/QueryTest/dedicated-coord-mem-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-large-rows.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
124 files changed, 2,870 insertions(+), 2,743 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18798

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 22
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-6684: Fix untracked memory in KRPC

Posted by "Omid Shahidi (Code Review)" <ge...@cloudera.org>.
Omid Shahidi has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18798 )

Change subject: IMPALA-6684: Fix untracked memory in KRPC
......................................................................

IMPALA-6684: Fix untracked memory in KRPC

During serialization of a row batch header, a tuple_data_ buffer is
created to hold the compressed tuple data for an outbound row batch.
We would like this tuple data to be tracked, as it is responsible for
a significant portion of the untracked memory in the KRPC data stream
sender. By using a free pool, we can allocate the tuple data and the
compression scratch buffer and account for both in the memory tracker
of the KrpcDataStreamSender. This solution creates an RAII class
responsible for memory allocation and changes the existing code to use
a char buffer pointed to by a char* tuple_data_ instead of the
previously used std::string tuple_data_. The Thrift implementation is
left unchanged and the protobuf implementation is separated.
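
The idea of an RAII class that owns a raw char buffer and reports its
size to a memory tracker can be illustrated with the minimal sketch
below. It is not the code from this change: SimpleMemTracker and
TrackedCharBuffer are hypothetical names, the tracker only counts bytes,
and std::malloc stands in for the free pool mentioned above. Impala's
real MemTracker and FreePool have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a memory tracker: it only counts bytes.
class SimpleMemTracker {
 public:
  void Consume(int64_t bytes) { consumption_ += bytes; }
  void Release(int64_t bytes) {
    assert(bytes <= consumption_);  // Never release more than was consumed.
    consumption_ -= bytes;
  }
  int64_t consumption() const { return consumption_; }

 private:
  int64_t consumption_ = 0;
};

// Hypothetical RAII owner of a raw char buffer. Every byte it holds is
// reported to the tracker and released again when the buffer is freed.
class TrackedCharBuffer {
 public:
  explicit TrackedCharBuffer(SimpleMemTracker* tracker) : tracker_(tracker) {}
  ~TrackedCharBuffer() { Reset(); }

  // Non-copyable, so a buffer is never freed or accounted for twice.
  TrackedCharBuffer(const TrackedCharBuffer&) = delete;
  TrackedCharBuffer& operator=(const TrackedCharBuffer&) = delete;

  // Replaces the current buffer with a new one of 'size' bytes. Returns
  // false and leaves the object empty if the allocation fails.
  bool Allocate(size_t size) {
    Reset();
    if (size == 0) return true;
    char* buf = static_cast<char*>(std::malloc(size));
    if (buf == nullptr) return false;
    data_ = buf;
    size_ = size;
    tracker_->Consume(static_cast<int64_t>(size));
    return true;
  }

  // Frees the buffer, if any, and returns its bytes to the tracker.
  void Reset() {
    if (data_ == nullptr) return;
    std::free(data_);
    tracker_->Release(static_cast<int64_t>(size_));
    data_ = nullptr;
    size_ = 0;
  }

  char* data() { return data_; }
  size_t size() const { return size_; }

 private:
  SimpleMemTracker* tracker_;
  char* data_ = nullptr;
  size_t size_ = 0;
};
```

In this sketch, calling Allocate(1024) on a TrackedCharBuffer makes the
tracker report 1024 bytes, and the tracker reports 0 again once the
buffer goes out of scope. A std::string member gives no comparable hook,
since its allocations go through the default allocator.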

Testing:
 - Passed core tests.
 - Ran a single-node benchmark, which shows no regression.
 - Updated row-batch-serialize-test and row-batch-serialize-benchmark to
   test the row-batch serialization used by KRPC.
 - Manually collected query profiles, heap growth, and memory usage logs
   showing that untracked memory decreased by half.
 - Added an end-to-end test to verify the new counters in the runtime
   profile.

Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
A be/src/runtime/row-batch.inline.h
A testdata/workloads/functional-query/queries/datastream-sender.test
A testdata/workloads/tpch/queries/datastream-sender.test
A tests/query_test/test_datastream_sender.py
10 files changed, 657 insertions(+), 173 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/18798/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18798

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ba2b907ce4f275a7a1fb8cf75453c7003eb4b82
Gerrit-Change-Number: 18798
Gerrit-PatchSet: 8
Gerrit-Owner: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Omid Shahidi <om...@gmail.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>