You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2019/03/25 23:59:23 UTC

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12848

to review the following change.


Change subject: IMPALA-7095: clean up scan node profiles
......................................................................

IMPALA-7095: clean up scan node profiles

Add counters to scan node implementations where they make sense but were
missing (e.g. row batch queue counters for multithread Kudu scans) and
remove them where they don't make sense (e.g. scanner thread counters
for non-multithreaded scans).

Refactors the multithreaded Kudu and HDFS scans to share logic via
composition (single inheritance doesn't work for this case),
which enables the same set of counters to be maintained with shared
code. The row batch queueing and thread tracking is now shared. I looked
at combining the logic around 'status_', 'lock_' and 'done_' between the
two but the details were different enough that it didn't seem worth
abstracting.

Adds a PeakScannerThreadConcurrency counter - this answers a common
question.

Fixes RowsRead for data source scans.

Fix some of the comments to be more accurate/useful.

Testing:
Ran exhaustive tests. Ran various types of scans (HDFS, Kudu, HBase,
Data source) and inspected the profile output manually.

Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Reviewed-on: http://gerrit.cloudera.org:8080/10810
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-scan-node.h
M be/src/exec/hbase-table-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/runtime/fragment-instance-state.cc
M be/src/util/blocking-queue.h
M be/src/util/thread.h
20 files changed, 496 insertions(+), 357 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/12848/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12848 )

Change subject: IMPALA-7095: clean up scan node profiles
......................................................................


Patch Set 1:

Conflicts: be/src/exec/data-source-scan-node.cc

 
334   while (true) {
335     {
336       SCOPED_TIMER(materialize_tuple_timer());
337       // copy rows until we hit the limit/capacity or until we exhaust input_batch_
338       while (!ReachedLimit() && !row_batch->AtCapacity() && InputBatchHasNext()) {
339 <<<<<<< HEAD
340         RETURN_IF_ERROR(MaterializeNextRow(tuple_pool, tuple));
341 =======
342         RETURN_IF_ERROR(MaterializeNextRow(state->local_time_zone(), tuple_pool, tuple));
343         ++rows_read;
344 >>>>>>> 5d67245... IMPALA-7095: clean up scan node profiles
345         int row_idx = row_batch->AddRow();
346         TupleRow* tuple_row = row_batch->GetRow(row_idx);
347         tuple_row->SetTuple(tuple_idx_, tuple);
348
349         if (ExecNode::EvalConjuncts(evals, num_conjuncts, tuple_row)) {
350           row_batch->CommitLastRow();
351           tuple = reinterpret_cast<Tuple*>(
352               reinterpret_cast<uint8_t*>(tuple) + tuple_desc_->byte_size());
353           ++num_rows_returned_;
354         }
355         ++next_row_idx_;
356       }
357       if (ReachedLimit() || row_batch->AtCapacity() || input_batch_->eos) {
358         *eos = ReachedLimit() || input_batch_->eos;
359         COUNTER_SET(rows_returned_counter_, num_rows_returned_);
360         COUNTER_ADD(rows_read_counter_, rows_read);
361         return Status::OK();
362       }
363     }

How to fix:
Use the old usage of "MaterializeNextRow(tuple_pool, tuple))":

 MaterializeNextRow(tuple_pool, tuple);
 ++rows_read;


-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 00:02:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12848 )

Change subject: IMPALA-7095: clean up scan node profiles
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2538/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 00:41:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12848 )

Change subject: IMPALA-7095: clean up scan node profiles
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 01:19:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12848 )

Change subject: IMPALA-7095: clean up scan node profiles
......................................................................


Patch Set 1: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 05:46:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12848 )

Change subject: IMPALA-7095: clean up scan node profiles
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3955/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 01:55:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7095: clean up scan node profiles

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12848 )

Change subject: IMPALA-7095: clean up scan node profiles
......................................................................

IMPALA-7095: clean up scan node profiles

Add counters to scan node implementations where they make sense but were
missing (e.g. row batch queue counters for multithread Kudu scans) and
remove them where they don't make sense (e.g. scanner thread counters
for non-multithreaded scans).

Refactors the multithreaded Kudu and HDFS scans to share logic via
composition (single inheritance doesn't work for this case),
which enables the same set of counters to be maintained with shared
code. The row batch queueing and thread tracking is now shared. I looked
at combining the logic around 'status_', 'lock_' and 'done_' between the
two but the details were different enough that it didn't seem worth
abstracting.

Adds a PeakScannerThreadConcurrency counter - this answers a common
question.

Fixes RowsRead for data source scans.

Fix some of the comments to be more accurate/useful.

Testing:
Ran exhaustive tests. Ran various types of scans (HDFS, Kudu, HBase,
Data source) and inspected the profile output manually.

Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Reviewed-on: http://gerrit.cloudera.org:8080/10810
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/12848
---
M be/src/exec/data-source-scan-node.cc
M be/src/exec/data-source-scan-node.h
M be/src/exec/hbase-scan-node.cc
M be/src/exec/hbase-scan-node.h
M be/src/exec/hbase-table-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-scan-node-mt.cc
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/scan-node.cc
M be/src/exec/scan-node.h
M be/src/runtime/fragment-instance-state.cc
M be/src/util/blocking-queue.h
M be/src/util/thread.h
20 files changed, 496 insertions(+), 357 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/12848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I77286282d42e7764bfdf94c7ec47cec9d743f787
Gerrit-Change-Number: 12848
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>