You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org> on 2020/02/20 23:33:35 UTC

[Impala-ASF-CR] IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes

Bikramjeet Vig has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15259


Change subject: IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes
......................................................................

IMPALA-4080: Moved codegen code from scan exec nodes to their
plan nodes

This patch moves the code responsible for codegening code for
hdfs scanner to the plan node.

Testing:
Successfully ran exhaustive tests.
Also verified manually that codegen code is generated and
used for each hdfs scanner.

Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
14 files changed, 310 insertions(+), 237 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/15259/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5399/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 17:38:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Feb 2020 04:15:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@409
PS1, Line 409:   const HdfsScanPlanNode& const_plan_node = static_cast<const HdfsScanPlanNode&>(plan_node_);
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Feb 2020 23:34:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has removed a vote on this change.

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Removed Verified-1 by Impala Public Jenkins <im...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5294/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 00:22:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 2: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 10:21:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes
......................................................................


Patch Set 1:

(4 comments)

I did a quick scan, I've just a few nits.

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h@98
PS1, Line 98:   static Status Codegen(
Does dropping WARN_UNUSED_RESULT has a reason? The same happens in many other places.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h@211
PS1, Line 211:   static Status CodegenDecodeAvroData(const HdfsScanPlanNode* node, RuntimeState* state,
Same as line 98.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h@237
PS1, Line 237: *
Why change this to a pointer? As far as I could see from the implementation, it cannot be null.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/runtime/query-state.h@249
PS1, Line 249:   vector<const TPlanFragmentInstanceCtx*> GetTPlanFragmentInstanceCtxs(
> Nit: return const & since they shouldn't be null
I'm not sure you can use references in a container.



-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 14:43:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 17:38:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 3: Code-Review+2

seems like a transient failure here HBaseTestDataRegionAssignment.performAssignment(HBaseTestDataRegionAssignment.java:145), restarting GVO


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 23:47:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 2:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/15259/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15259/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of
> weird wrapping
Done


http://gerrit.cloudera.org:8080/#/c/15259/1//COMMIT_MSG@10
PS1, Line 10: This patch moves the code responsible for invoking codegen for
> Is this sentence right? It seems like the code hasn't moved, but the plan n
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h@98
PS1, Line 98:   static Status Codegen(
> Does dropping WARN_UNUSED_RESULT has a reason? The same happens in many oth
Tim pointed out earlier that we no longer need this since clang checks already catch dropped Status. So cleaning up code where i can


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h@211
PS1, Line 211:   static Status CodegenDecodeAvroData(const HdfsScanPlanNode* node, RuntimeState* state,
> Same as line 98.
see previous reply


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-avro-scanner.h@237
PS1, Line 237: &
> Why change this to a pointer? As far as I could see from the implementation
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@107
PS1, Line 107:   /// Returns index into materialized_slots with 'path'.  Returns SKIP_COLUMN if
> Consider removing this or making it a constant. I don't think the generalit
Removed it, here and in the scan exec node


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@133
PS1, Line 133: 
> Init to -1 just so any bugs are more obvious?
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@139
PS1, Line 139:   //
> FWIW I personally find this pattern of typedefing containers to make the co
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@165
PS1, Line 165: or all Hdfs s
> Maybe rename to something like 'scanned_file_formats_' to be more explicit.
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@a390
PS1, Line 390: 
> That was a weird place to put this...
moved this to HdfsScanNodeBase::IssueInitialScanRanges() instead


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@229
PS1, Line 229:       const THdfsFileSplit& split = scan_range_param.scan_range.hdfs_file_split;
> I guess this is not exactly equivalent to the previous logic if different i
Yup, all of this is in anticipation of doing this once per fragment, so it needs to make sure it includes all file types from ranges assigned to all instances.

I've switched it to only get the ranges from the current instance ctx and added a TODO for now.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@291
PS1, Line 291:   AddBytesReadCounters();
> Maybe remove this comment? Not sure it's useful.
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@409
PS1, Line 409: }
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@948
PS1, Line 948: eriali
> nit: no std:: needed
Done


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/runtime/query-state.h@249
PS1, Line 249:   static const int DEFAULT_BATCH_SIZE = 1024;
> Nit: return const & since they shouldn't be null
removed this part for now



-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Feb 2020 17:57:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5312/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Feb 2020 18:42:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080: Moved codegen code from scan exec nodes to their plan nodes
......................................................................


Patch Set 1:

(11 comments)

Thanks for making these patches so easy to review. I have minor comments only.

http://gerrit.cloudera.org:8080/#/c/15259/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15259/1//COMMIT_MSG@7
PS1, Line 7: IMPALA-4080: Moved codegen code from scan exec nodes to their
weird wrapping


http://gerrit.cloudera.org:8080/#/c/15259/1//COMMIT_MSG@10
PS1, Line 10: This patch moves the code responsible for codegening code for
Is this sentence right? It seems like the code hasn't moved, but the plan node is invoking codegen instead of the exec node.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@107
PS1, Line 107:   /// Returns the tuple idx into the row for this scan node to output to.
Consider removing this or making it a constant. I don't think the generality is likely to be ever needed.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@133
PS1, Line 133:   int tuple_id_;
Init to -1 just so any bugs are more obvious?


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@139
PS1, Line 139:   typedef boost::unordered_map<std::vector<int>, int> PathToSlotIdxMap;
FWIW I personally find this pattern of typedefing containers to make the code less readable in most cases because the actual type of the container is hidden. It was useful pre-c++11 to make iterator types less verbose.

You don't need to do anything but I wouldn't object if you removed the typedefs :)


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.h@165
PS1, Line 165: file_formats_
Maybe rename to something like 'scanned_file_formats_' to be more explicit.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@a390
PS1, Line 390: 
That was a weird place to put this...


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@229
PS1, Line 229:     auto ranges = ctx->per_node_scan_ranges.find(tnode.node_id);
I guess this is not exactly equivalent to the previous logic if different instances have different sets of file types. I don't think this matters at all once we have shared codegen, but just wanted to mention.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@291
PS1, Line 291: /// TODO: Break up this very long function.
Maybe remove this comment? Not sure it's useful.


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/exec/hdfs-scan-node-base.cc@948
PS1, Line 948: std::v
nit: no std:: needed


http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/15259/1/be/src/runtime/query-state.h@249
PS1, Line 249:   vector<const TPlanFragmentInstanceCtx*> GetTPlanFragmentInstanceCtxs(
Nit: return const & since they shouldn't be null



-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 21 Feb 2020 00:40:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 23:48:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5402/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 4
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 23:48:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15259

to look at the new patch set (#2).

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................

IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of
exec node

This patch moves the code responsible for invoking codegen for
hdfs scanner to the plan node.

Testing:
Successfully ran exhaustive tests.
Also verified manually that codegen code is generated and
used for each hdfs scanner.

Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
16 files changed, 294 insertions(+), 263 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/15259/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 16:13:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 2: Code-Review+1

I'll upgrade to +2 once we get Daniel's +1


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Feb 2020 19:50:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................

IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of
exec node

This patch moves the code responsible for invoking codegen for
hdfs scanner to the plan node.

Testing:
Successfully ran exhaustive tests.
Also verified manually that codegen code is generated and
used for each hdfs scanner.

Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Reviewed-on: http://gerrit.cloudera.org:8080/15259
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
16 files changed, 294 insertions(+), 263 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 5
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15259 )

Change subject: IMPALA-4080 [part 2]: Invoke codegen from scan's plan node instead of exec node
......................................................................


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5399/


-- 
To view, visit http://gerrit.cloudera.org:8080/15259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02c6183271b3731aa0d3cae2426be7269e462967
Gerrit-Change-Number: 15259
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Feb 2020 22:08:48 +0000
Gerrit-HasComments: No