You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/11/23 17:10:58 UTC
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/19803
[SPARK-22596][SQL] set ctx.currentVars in CodegenSupport.consume
## What changes were proposed in this pull request?
`ctx.currentVars` means the input variables for the current operator, which is already decided in `CodegenSupport`, we can set it there instead of `doConsume`.
## How was this patch tested?
existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark codegen
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19803.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19803
----
commit cfae7b5c5747645c42d67c80f61cef7cb3b9423d
Author: Wenchen Fan <we...@databricks.com>
Date: 2017-11-23T16:58:32Z
set ctx.currentVars in CodegenSupport.consume
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152883562
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -56,9 +56,7 @@ case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan)
}
override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
- val exprs = projectList.map(x =>
- ExpressionCanonicalizer.execute(BindReferences.bindReference(x, child.output)))
- ctx.currentVars = input
+ val exprs = projectList.map(x => BindReferences.bindReference[Expression](x, child.output))
--- End diff --
it doesn't matter, `Alias.genCode` is just a pass-through.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84146/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84146/testReport)** for PR 19803 at commit [`d674494`](https://github.com/apache/spark/commit/d67449428a107dc840bd801b820c0ec3d39f4241).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152849772
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
/**
* Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
+ *
+ * Note that `outputVars` and `row` can't both be null.
*/
final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
val inputVars =
- if (row != null) {
+ if (outputVars != null) {
+ assert(outputVars.length == output.length)
+ // outputVars will be used to generate the code for UnsafeRow, so we should copy them
+ outputVars.map(_.copy())
+ } else {
+ assert(row != null, "outputVars and row cannot both be null.")
ctx.currentVars = null
ctx.INPUT_ROW = row
output.zipWithIndex.map { case (attr, i) =>
BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
}
- } else {
--- End diff --
I switched the order because it's more intuitive to prefer `outputVars` over `row` when generating `inputVars`. This actually doesn't change any behavior because there is no place providing both `outputVars` and `row`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152938354
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
/**
* Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
+ *
+ * Note that `outputVars` and `row` can't both be null.
*/
final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
val inputVars =
- if (row != null) {
+ if (outputVars != null) {
+ assert(outputVars.length == output.length)
+ // outputVars will be used to generate the code for UnsafeRow, so we should copy them
+ outputVars.map(_.copy())
+ } else {
+ assert(row != null, "outputVars and row cannot both be null.")
ctx.currentVars = null
ctx.INPUT_ROW = row
output.zipWithIndex.map { case (attr, i) =>
BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
}
- } else {
--- End diff --
correction: `FileSourceScanExec` may provide both `outputVars` and `row`, but its `outputVars` is exactly what we generate here, so still no behavior change.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84136/testReport)** for PR 19803 at commit [`cfae7b5`](https://github.com/apache/spark/commit/cfae7b5c5747645c42d67c80f61cef7cb3b9423d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19803
Two comments, LGTM otherwise.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84137/testReport)** for PR 19803 at commit [`6758a34`](https://github.com/apache/spark/commit/6758a34aa6bd528a8fa8ab7d5db8964d1121f848).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84136/testReport)** for PR 19803 at commit [`cfae7b5`](https://github.com/apache/spark/commit/cfae7b5c5747645c42d67c80f61cef7cb3b9423d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19803
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r153009697
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -147,6 +149,11 @@ trait CodegenSupport extends SparkPlan {
}
}
+ // Set up the `currentVars` in the codegen context, as we generate the code of `inputVars`
+ // before calling `parent.doConsume`. We can't set up `INPUT_ROW`, because parent needs to
+ // generate code of `rowVar` manually.
+ ctx.currentVars = inputVars
+ ctx.INPUT_ROW = null
--- End diff --
Will this change impact the chances of [`splitExpressions`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L780-L782)?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84138/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84145/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152938438
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
/**
* Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
+ *
+ * Note that `outputVars` and `row` can't both be null.
*/
final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
val inputVars =
- if (row != null) {
+ if (outputVars != null) {
+ assert(outputVars.length == output.length)
--- End diff --
sorry I was wrong, see https://github.com/apache/spark/pull/19803/files#r152938354
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r153017609
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -147,6 +149,11 @@ trait CodegenSupport extends SparkPlan {
}
}
+ // Set up the `currentVars` in the codegen context, as we generate the code of `inputVars`
+ // before calling `parent.doConsume`. We can't set up `INPUT_ROW`, because parent needs to
+ // generate code of `rowVar` manually.
+ ctx.currentVars = inputVars
+ ctx.INPUT_ROW = null
--- End diff --
`splitExpressions` doesn't work for whole stage codegen. Here I just made it more clear.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84146/testReport)** for PR 19803 at commit [`d674494`](https://github.com/apache/spark/commit/d67449428a107dc840bd801b820c0ec3d39f4241).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84138/testReport)** for PR 19803 at commit [`9ecc84d`](https://github.com/apache/spark/commit/9ecc84d7884a363eaaba0c3213ca6756b9475869).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152883588
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -355,19 +355,12 @@ case class FileSourceScanExec(
// PhysicalRDD always just has one input
val input = ctx.freshName("input")
ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
- val exprRows = output.zipWithIndex.map{ case (a, i) =>
- BoundReference(i, a.dataType, a.nullable)
- }
val row = ctx.freshName("row")
- ctx.INPUT_ROW = row
- ctx.currentVars = null
- val columnsRowInput = exprRows.map(_.genCode(ctx))
- val inputRow = if (needsUnsafeRowConversion) null else row
--- End diff --
good catch!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152881786
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
--- End diff --
Also move this line to the else block.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84137/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19803
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84138/testReport)** for PR 19803 at commit [`9ecc84d`](https://github.com/apache/spark/commit/9ecc84d7884a363eaaba0c3213ca6756b9475869).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152882655
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -355,19 +355,12 @@ case class FileSourceScanExec(
// PhysicalRDD always just has one input
val input = ctx.freshName("input")
ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
- val exprRows = output.zipWithIndex.map{ case (a, i) =>
- BoundReference(i, a.dataType, a.nullable)
- }
val row = ctx.freshName("row")
- ctx.INPUT_ROW = row
- ctx.currentVars = null
- val columnsRowInput = exprRows.map(_.genCode(ctx))
- val inputRow = if (needsUnsafeRowConversion) null else row
--- End diff --
When `needsUnsafeRowConversion` is true, previously we pass in a null and `consume` will generate code for a unsafe row projection. Now we always pass in the `row` which can possibly not a unsafe row. This looks changing behavior?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84137/testReport)** for PR 19803 at commit [`6758a34`](https://github.com/apache/spark/commit/6758a34aa6bd528a8fa8ab7d5db8964d1121f848).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19803
cc @hvanhovell @kiszk @viirya @gatorsmile
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84145/testReport)** for PR 19803 at commit [`7306d5d`](https://github.com/apache/spark/commit/7306d5d6a6fe88d3660b4197eea43f59d4124332).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152899229
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
/**
* Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
+ *
+ * Note that `outputVars` and `row` can't both be null.
*/
final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
val inputVars =
- if (row != null) {
+ if (outputVars != null) {
+ assert(outputVars.length == output.length)
--- End diff --
Is it better to add `assert(row == null)` to ensure your comment below?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19803
Thanks! Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19803
**[Test build #84145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84145/testReport)** for PR 19803 at commit [`7306d5d`](https://github.com/apache/spark/commit/7306d5d6a6fe88d3660b4197eea43f59d4124332).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152881704
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
- val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
if (ctx.currentVars != null && ctx.currentVars(ordinal) != null) {
val oev = ctx.currentVars(ordinal)
ev.isNull = oev.isNull
ev.value = oev.value
- val code = oev.code
- oev.code = ""
- ev.copy(code = code)
--- End diff --
Great, thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152882876
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -56,9 +56,7 @@ case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan)
}
override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
- val exprs = projectList.map(x =>
- ExpressionCanonicalizer.execute(BindReferences.bindReference(x, child.output)))
- ctx.currentVars = input
+ val exprs = projectList.map(x => BindReferences.bindReference[Expression](x, child.output))
--- End diff --
This gets rid of `ExpressionCanonicalizer`. Isn't it possibly we have `Alias` in `projectList` here?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152881282
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
- val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
if (ctx.currentVars != null && ctx.currentVars(ordinal) != null) {
val oev = ctx.currentVars(ordinal)
ev.isNull = oev.isNull
ev.value = oev.value
- val code = oev.code
- oev.code = ""
- ev.copy(code = code)
- } else if (nullable) {
- ev.copy(code = s"""
- boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
- $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);""")
+ ev.copy(code = oev.code)
} else {
- ev.copy(code = s"""$javaType ${ev.value} = $value;""", isNull = "false")
+ assert(ctx.INPUT_ROW != null)
--- End diff --
Add an assert message.
`assert(ctx.INPUT_ROW != null, "INPUT_ROW and currentVars cannot both be null.")`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19803
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84136/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19803#discussion_r152848960
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
@@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
val javaType = ctx.javaType(dataType)
- val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
if (ctx.currentVars != null && ctx.currentVars(ordinal) != null) {
val oev = ctx.currentVars(ordinal)
ev.isNull = oev.isNull
ev.value = oev.value
- val code = oev.code
- oev.code = ""
- ev.copy(code = code)
--- End diff --
a small cleanup which is discussed in https://github.com/apache/spark/pull/11674#issuecomment-344955584
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org