You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2017/03/08 01:32:14 UTC

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Joe McDonnell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6312

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version of KuduScanNode
that materializes the tuples in GetNext(). KuduScanNodeMt is enabled by the
same condition as HdfsScanNodeMt: mt_dop is greater than 1.

To share code between the two implementations, KuduScanNode and KuduScanNodeMt
are now subclasses of KuduScanNodeBase, which implements the shared code.
The KuduScanner is minimally impacted, as it already had the required
GetNext interface.

The testing for this is a modified version of kudu-scan-node.test run with
various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
15 files changed, 515 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6312

to look at the new patch set (#6).

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 558 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6312/1//COMMIT_MSG
Commit Message:

Line 11: same condition as HdfsScanNodeMt: mt_dop is greater than 1.
>= 1?


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

Line 57:     : ScanNode(pool, tnode, descs),
is this standard indentation?


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

Line 79:   /// The Kudu client and table. Scanners share these instances.
how much overhead does not sharing this imply? (how much memory/how many threads sit behind these structures?)

if it's a lot, we could share this across fragment instances in the query state.


Line 90:   /// The next index in 'scan_tokens_' to be assigned. Protected by lock_.
there is no lock_ here


Line 93:   RuntimeProfile::Counter* kudu_round_trips_;
these are still in KuduScanNode


Line 102:   RuntimeProfile::Counter* kudu_round_trips() const { return kudu_round_trips_; }
why is this useful? since it's private, it's only accessible to this class anyway.


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-mt.cc
File be/src/exec/kudu-scan-node-mt.cc:

Line 60:   if (scanner_ == nullptr) {
why not do this in open?


Line 81:     scanner_.reset();
why reset here?


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-mt.h
File be/src/exec/kudu-scan-node-mt.h:

Line 45:   std::unique_ptr<KuduScanner> scanner_;
the separation of scannode and scanner exists today because of the multi-threading in the scannode. for the mt version, that's not needed anymore.

let's either merge the scanner into this class now, or leave a todo to do that when we get rid of the non-mt version.

the advantage is that you remove a level of indirection and get a chance to remove other overhead (example: right now, you create a copy of the conjunct contexts)


http://gerrit.cloudera.org:8080/#/c/6312/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 141:     // Determine backend scan node implementation to use. The optimized MT implementation
fix comment


http://gerrit.cloudera.org:8080/#/c/6312/1/tests/query_test/test_mt_dop.py
File tests/query_test/test_mt_dop.py:

Line 102: # Parquet filtering test rlies on a specific mt_dop value, so keep in its own test
spelling


Line 118: # Parquet filtering test rlies on a specific mt_dop value, so keep in its own test
spelling


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

Line 105:   KUDU_RETURN_IF_ERROR(client_->OpenTable(table_desc->table_name(), &table_),
does this need to be done query-wide? how much data/work sits behind this call?


http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

Line 75:   const TupleDescriptor* tuple_desc() const { return tuple_desc_; }
functions, even getters, at bottom


Line 79:   kudu::client::KuduClient *client_;
formatting


Line 80:   kudu::client::KuduClient* kudu_client() { return client_; }
also move down


Line 82:   /// Kudu table reference. Shared between scanner threads.
between scanner threads of KuduScanNode?


http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 119:       kudu::client::KuduClient **client);
who owns this?


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 2:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6312/1//COMMIT_MSG
Commit Message:

Line 11: same condition as HdfsScanNodeMt: mt_dop is greater than 1.
> >= 1?
Done


http://gerrit.cloudera.org:8080/#/c/6312/2//COMMIT_MSG
Commit Message:

Line 1: Parent:     def5355a (IMPALA-3403: [DOCS] Pare back irrelevant installation info)
> nit: can you wrap lines in git commits at 60 chars? formatting in git tools
Done


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

Line 57:     : ScanNode(pool, tnode, descs),
> is this standard indentation?
Yes, this is what is listed in the Google C++ style guide.


http://gerrit.cloudera.org:8080/#/c/6312/2/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

PS2, Line 122: Base
> remove Base for the debugging output
Done


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

Line 35: /// are used to retrieve the rows for this scan.
> You should make this comment more specific to KuduScanNodeBase, eg. "Base c
Improved the comment.


Line 79:   /// The Kudu client and table. Scanners share these instances.
> how much overhead does not sharing this imply? (how much memory/how many th
Moved the KuduClient to the QueryState.


Line 90:   /// The next index in 'scan_tokens_' to be assigned. Protected by lock_.
> there is no lock_ here
Done


Line 93:   RuntimeProfile::Counter* kudu_round_trips_;
> these are still in KuduScanNode
Done


PS1, Line 98: //
> ///
Done


Line 102:   RuntimeProfile::Counter* kudu_round_trips() const { return kudu_round_trips_; }
> why is this useful? since it's private, it's only accessible to this class 
KuduScanner is a friend class, so it accesses these elements. In this upload, I have KuduScanner avoid direct field accesses, but I don't have a strong preference. This could be removed (along with tuple_desc() and kudu_client()) and KuduScanner could access the fields directly.


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-mt.cc
File be/src/exec/kudu-scan-node-mt.cc:

Line 60:   if (scanner_ == nullptr) {
> why not do this in open?
Moved to open.


Line 81:     scanner_.reset();
> why reset here?
Now that I think about it, I don't think it is needed. If the caller gets an error, it will call Close.


http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-mt.h
File be/src/exec/kudu-scan-node-mt.h:

Line 45:   std::unique_ptr<KuduScanner> scanner_;
> the separation of scannode and scanner exists today because of the multi-th
Added a comment.


http://gerrit.cloudera.org:8080/#/c/6312/2/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

PS2, Line 79: 
            :   RuntimeProfile::Counter* kudu_round_trips_;
            :   RuntimeProfile::Counter* kudu_remote_tokens_;
            :   static const std::string KUDU_ROUND_TRIPS;
            :   static const std::string KUDU_REMOTE_TOKENS;
> remove; aren't these in the base class?
Done


http://gerrit.cloudera.org:8080/#/c/6312/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 141:     // Determine backend scan node implementation to use. The optimized MT implementation
> fix comment
Done


http://gerrit.cloudera.org:8080/#/c/6312/1/tests/query_test/test_mt_dop.py
File tests/query_test/test_mt_dop.py:

Line 102: # Parquet filtering test rlies on a specific mt_dop value, so keep in its own test
> spelling
This will go away when I rebase.


Line 118: # Parquet filtering test rlies on a specific mt_dop value, so keep in its own test
> spelling
Removed comment. It was wrong anyway.


http://gerrit.cloudera.org:8080/#/c/6312/2/tests/query_test/test_mt_dop.py
File tests/query_test/test_mt_dop.py:

PS2, Line 102: rlies
> typo
This will be gone when I rebase.


PS2, Line 118: # Parquet filtering test rlies on a specific mt_dop value, so keep in its own test
> update comment
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

Line 35: /// are used to retrieve the rows for this scan.
You should make this comment more specific to KuduScanNodeBase, eg. "Base class for single and multi-threaded versions of..." and "TODO: remove this once all scanners support multithreaded execution..."


PS1, Line 98: //
///


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6312/4/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 32: namespace kudu {
format into single line


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/377/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/376/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 6: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/377/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has uploaded a new patch set (#3).

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 547 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

Line 105:   KUDU_RETURN_IF_ERROR(client_->OpenTable(table_desc->table_name(), &table_),
> looks like this is a negligible amount of state, so no worries.
I had asked Matt, and he said it was lightweight.


http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

Line 75:   const TupleDescriptor* tuple_desc() const { return tuple_desc_; }
> functions, even getters, at bottom
Done


Line 79:   kudu::client::KuduClient *client_;
> formatting
Done


Line 80:   kudu::client::KuduClient* kudu_client() { return client_; }
> also move down
Done


Line 82:   /// Kudu table reference. Shared between scanner threads.
> between scanner threads of KuduScanNode?
Updated comment to reflect that this is for KuduScanNode.


http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 119:       kudu::client::KuduClient **client);
> who owns this?
Added info to the comment. All the KuduClients are owned by the QueryState.


http://gerrit.cloudera.org:8080/#/c/6312/4/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 32: 
> format into single line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has uploaded a new patch set (#5).

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 558 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 6:

Rebased and fixed kudu-scan-node.cc to handle the case when there are no scan tokens.

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 21: #include <kudu/client/client.h>
> it would be really nice if we could avoid including the kudu header here. I
I'm looking into this. The sp::shared_ptr type comes from the KuduClientBuilder::Build call in CreateKuduClient. I think it is not guaranteed to be safe for us to transfer something from a shared_ptr to unique_ptr. There is no way for us to tell if this is the only shared_ptr to that object.


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 3:

(1 comment)

haven't had a chance to look at most of the new code yet, but one quick comment before I run off to a few hours of meetings...

http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 21: #include <kudu/client/client.h>
it would be really nice if we could avoid including the kudu header here. I think you may be able to forward declare KuduClient if you store std::unique_ptr in the map. I think that'd be better than keeping a shared_ptr anyway, since we return the naked ptrs and should avoid shared_ptrs wherever possible (plus it'll avoid leaking Kudu's 'shaded' sp::shared_ptr type).

see http://howardhinnant.github.io/incomplete.html


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Reviewed-on: http://gerrit.cloudera.org:8080/6312
Reviewed-by: Marcel Kornacker <ma...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 563 insertions(+), 143 deletions(-)

Approvals:
  Marcel Kornacker: Looks good to me, approved
  Impala Public Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 3:

(1 comment)

This upload changes the query-state.h/cc to use an opaque type and avoid including the Kudu headers in query-state.h.

http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

Line 21: #include <kudu/client/client.h>
> I'm looking into this. The sp::shared_ptr type comes from the KuduClientBui
Changed this to use an opaque type.


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 7:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/389/

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has uploaded a new patch set (#4).

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 561 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6312/3/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

Line 105:   KUDU_RETURN_IF_ERROR(client_->OpenTable(table_desc->table_name(), &table_),
> does this need to be done query-wide? how much data/work sits behind this c
looks like this is a negligible amount of state, so no worries.


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Impala Public Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6312

to look at the new patch set (#7).

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................

IMPALA-4996: Single-threaded KuduScanNode

This introduces KuduScanNodeMt, the single-threaded version
of KuduScanNode that materializes the tuples in GetNext().
KuduScanNodeMt is enabled by the same condition as
HdfsScanNodeMt: mt_dop is greater than or equal to 1.

To share code between the two implementations, KuduScanNode
and KuduScanNodeMt are now subclasses of KuduScanNodeBase,
which implements the shared code. The KuduScanner is
minimally impacted, as it already had the required GetNext
interface.

Since the KuduClient is a heavy-weight object, it is now
shared at the QueryState level. We try to share the
KuduClient as much as possible, but there are times when
the KuduClient cannot be shared. Each Kudu table has
master addresses stored in the Hive Metastore. We only
share KuduClients for tables that have an identical value
for the master addresses. In the ideal case, every Kudu
table will have the same value, but there is no explicit
guarantee of this.

The testing for this is a modified version of
kudu-scan-node.test run with various mt_dop values.

Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
---
M be/src/exec/CMakeLists.txt
M be/src/exec/exec-node.cc
M be/src/exec/hdfs-scan-node-base.h
A be/src/exec/kudu-scan-node-base.cc
A be/src/exec/kudu-scan-node-base.h
A be/src/exec/kudu-scan-node-mt.cc
A be/src/exec/kudu-scan-node-mt.h
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-scan-node.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test
M tests/query_test/test_mt_dop.py
17 files changed, 563 insertions(+), 143 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 2:

(5 comments)

I started reviewing this and realized there are maybe some more changes you're still working on to share the KuduClient per-query, so I'll hold off on looking much more closely. Looks good so far though.

http://gerrit.cloudera.org:8080/#/c/6312/2//COMMIT_MSG
Commit Message:

Line 1: Parent:     def5355a (IMPALA-3403: [DOCS] Pare back irrelevant installation info)
nit: can you wrap lines in git commits at 60 chars? formatting in git tools prefer this

If you edit commit messages with git you can have it auto set this w/ this in your vimrc:

au FileType gitcommit set tw=60


http://gerrit.cloudera.org:8080/#/c/6312/2/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

PS2, Line 122: Base
remove Base for the debugging output


http://gerrit.cloudera.org:8080/#/c/6312/2/be/src/exec/kudu-scan-node.h
File be/src/exec/kudu-scan-node.h:

PS2, Line 79: 
            :   RuntimeProfile::Counter* kudu_round_trips_;
            :   RuntimeProfile::Counter* kudu_remote_tokens_;
            :   static const std::string KUDU_ROUND_TRIPS;
            :   static const std::string KUDU_REMOTE_TOKENS;
remove; aren't these in the base class?


http://gerrit.cloudera.org:8080/#/c/6312/2/tests/query_test/test_mt_dop.py
File tests/query_test/test_mt_dop.py:

PS2, Line 102: rlies
typo


PS2, Line 118: # Parquet filtering test rlies on a specific mt_dop value, so keep in its own test
update comment


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 2:

I had to rebase due to a different branch. This upload contains no new changes.

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No