You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/11/23 17:10:58 UTC

[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/19803

    [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport.consume

    ## What changes were proposed in this pull request?
    
    `ctx.currentVars` means the input variables for the current operator, which is already decided in `CodegenSupport`, we can set it there instead of `doConsume`.
    
    ## How was this patch tested?
    
    existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19803
    
----
commit cfae7b5c5747645c42d67c80f61cef7cb3b9423d
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-11-23T16:58:32Z

    set ctx.currentVars in CodegenSupport.consume

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152883562
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -56,9 +56,7 @@ case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan)
       }
     
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    -    val exprs = projectList.map(x =>
    -      ExpressionCanonicalizer.execute(BindReferences.bindReference(x, child.output)))
    -    ctx.currentVars = input
    +    val exprs = projectList.map(x => BindReferences.bindReference[Expression](x, child.output))
    --- End diff --
    
    it doesn't matter, `Alias.genCode` is just a pass-through.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84146/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84146/testReport)** for PR 19803 at commit [`d674494`](https://github.com/apache/spark/commit/d67449428a107dc840bd801b820c0ec3d39f4241).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152849772
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
     
       /**
        * Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
    +   *
    +   * Note that `outputVars` and `row` can't both be null.
        */
       final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
         val inputVars =
    -      if (row != null) {
    +      if (outputVars != null) {
    +        assert(outputVars.length == output.length)
    +        // outputVars will be used to generate the code for UnsafeRow, so we should copy them
    +        outputVars.map(_.copy())
    +      } else {
    +        assert(row != null, "outputVars and row cannot both be null.")
             ctx.currentVars = null
             ctx.INPUT_ROW = row
             output.zipWithIndex.map { case (attr, i) =>
               BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
             }
    -      } else {
    --- End diff --
    
    I switched the order because it's more intuitive to prefer `outputVars` over `row` when generating `inputVars`. This actually doesn't change any behavior because there is no place providing both `outputVars` and `row`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152938354
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
     
       /**
        * Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
    +   *
    +   * Note that `outputVars` and `row` can't both be null.
        */
       final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
         val inputVars =
    -      if (row != null) {
    +      if (outputVars != null) {
    +        assert(outputVars.length == output.length)
    +        // outputVars will be used to generate the code for UnsafeRow, so we should copy them
    +        outputVars.map(_.copy())
    +      } else {
    +        assert(row != null, "outputVars and row cannot both be null.")
             ctx.currentVars = null
             ctx.INPUT_ROW = row
             output.zipWithIndex.map { case (attr, i) =>
               BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
             }
    -      } else {
    --- End diff --
    
    correction: `FileSourceScanExec` may provide both `outputVars` and `row`, but its `outputVars` is exactly what we generate here, so still no behavior change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84136/testReport)** for PR 19803 at commit [`cfae7b5`](https://github.com/apache/spark/commit/cfae7b5c5747645c42d67c80f61cef7cb3b9423d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Two comments, LGTM otherwise.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84137/testReport)** for PR 19803 at commit [`6758a34`](https://github.com/apache/spark/commit/6758a34aa6bd528a8fa8ab7d5db8964d1121f848).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84136/testReport)** for PR 19803 at commit [`cfae7b5`](https://github.com/apache/spark/commit/cfae7b5c5747645c42d67c80f61cef7cb3b9423d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r153009697
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -147,6 +149,11 @@ trait CodegenSupport extends SparkPlan {
           }
         }
     
    +    // Set up the `currentVars` in the codegen context, as we generate the code of `inputVars`
    +    // before calling `parent.doConsume`. We can't set up `INPUT_ROW`, because parent needs to
    +    // generate code of `rowVar` manually.
    +    ctx.currentVars = inputVars
    +    ctx.INPUT_ROW = null
    --- End diff --
    
    Will this change impact the chances of [`splitExpressions`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L780-L782)?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84138/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84145/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152938438
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
     
       /**
        * Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
    +   *
    +   * Note that `outputVars` and `row` can't both be null.
        */
       final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
         val inputVars =
    -      if (row != null) {
    +      if (outputVars != null) {
    +        assert(outputVars.length == output.length)
    --- End diff --
    
    sorry I was wrong, see https://github.com/apache/spark/pull/19803/files#r152938354


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r153017609
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -147,6 +149,11 @@ trait CodegenSupport extends SparkPlan {
           }
         }
     
    +    // Set up the `currentVars` in the codegen context, as we generate the code of `inputVars`
    +    // before calling `parent.doConsume`. We can't set up `INPUT_ROW`, because parent needs to
    +    // generate code of `rowVar` manually.
    +    ctx.currentVars = inputVars
    +    ctx.INPUT_ROW = null
    --- End diff --
    
    `splitExpressions` doesn't work for whole stage codegen. Here I just made it more clear.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84146/testReport)** for PR 19803 at commit [`d674494`](https://github.com/apache/spark/commit/d67449428a107dc840bd801b820c0ec3d39f4241).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84138/testReport)** for PR 19803 at commit [`9ecc84d`](https://github.com/apache/spark/commit/9ecc84d7884a363eaaba0c3213ca6756b9475869).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152883588
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -355,19 +355,12 @@ case class FileSourceScanExec(
         // PhysicalRDD always just has one input
         val input = ctx.freshName("input")
         ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
    -    val exprRows = output.zipWithIndex.map{ case (a, i) =>
    -      BoundReference(i, a.dataType, a.nullable)
    -    }
         val row = ctx.freshName("row")
    -    ctx.INPUT_ROW = row
    -    ctx.currentVars = null
    -    val columnsRowInput = exprRows.map(_.genCode(ctx))
    -    val inputRow = if (needsUnsafeRowConversion) null else row
    --- End diff --
    
    good catch!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152881786
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val javaType = ctx.javaType(dataType)
    --- End diff --
    
    Also move this line to the else block. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84137/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19803


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84138 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84138/testReport)** for PR 19803 at commit [`9ecc84d`](https://github.com/apache/spark/commit/9ecc84d7884a363eaaba0c3213ca6756b9475869).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152882655
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
    @@ -355,19 +355,12 @@ case class FileSourceScanExec(
         // PhysicalRDD always just has one input
         val input = ctx.freshName("input")
         ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
    -    val exprRows = output.zipWithIndex.map{ case (a, i) =>
    -      BoundReference(i, a.dataType, a.nullable)
    -    }
         val row = ctx.freshName("row")
    -    ctx.INPUT_ROW = row
    -    ctx.currentVars = null
    -    val columnsRowInput = exprRows.map(_.genCode(ctx))
    -    val inputRow = if (needsUnsafeRowConversion) null else row
    --- End diff --
    
    When `needsUnsafeRowConversion` is true, previously we pass in a null and `consume` will generate code for a unsafe row projection. Now we always pass in the `row` which can possibly not a unsafe row. This looks changing behavior?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84137/testReport)** for PR 19803 at commit [`6758a34`](https://github.com/apache/spark/commit/6758a34aa6bd528a8fa8ab7d5db8964d1121f848).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    cc @hvanhovell @kiszk @viirya @gatorsmile  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84145/testReport)** for PR 19803 at commit [`7306d5d`](https://github.com/apache/spark/commit/7306d5d6a6fe88d3660b4197eea43f59d4124332).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152899229
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -108,20 +108,22 @@ trait CodegenSupport extends SparkPlan {
     
       /**
        * Consume the generated columns or row from current SparkPlan, call its parent's `doConsume()`.
    +   *
    +   * Note that `outputVars` and `row` can't both be null.
        */
       final def consume(ctx: CodegenContext, outputVars: Seq[ExprCode], row: String = null): String = {
         val inputVars =
    -      if (row != null) {
    +      if (outputVars != null) {
    +        assert(outputVars.length == output.length)
    --- End diff --
    
    Is it better to add `assert(row == null)` to ensure your comment below?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    **[Test build #84145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84145/testReport)** for PR 19803 at commit [`7306d5d`](https://github.com/apache/spark/commit/7306d5d6a6fe88d3660b4197eea43f59d4124332).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152881704
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val javaType = ctx.javaType(dataType)
    -    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
         if (ctx.currentVars != null && ctx.currentVars(ordinal) != null) {
           val oev = ctx.currentVars(ordinal)
           ev.isNull = oev.isNull
           ev.value = oev.value
    -      val code = oev.code
    -      oev.code = ""
    -      ev.copy(code = code)
    --- End diff --
    
    Great, thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152882876
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -56,9 +56,7 @@ case class ProjectExec(projectList: Seq[NamedExpression], child: SparkPlan)
       }
     
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    -    val exprs = projectList.map(x =>
    -      ExpressionCanonicalizer.execute(BindReferences.bindReference(x, child.output)))
    -    ctx.currentVars = input
    +    val exprs = projectList.map(x => BindReferences.bindReference[Expression](x, child.output))
    --- End diff --
    
    This gets rid of `ExpressionCanonicalizer`. Isn't it possibly we have `Alias` in `projectList` here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152881282
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val javaType = ctx.javaType(dataType)
    -    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
         if (ctx.currentVars != null && ctx.currentVars(ordinal) != null) {
           val oev = ctx.currentVars(ordinal)
           ev.isNull = oev.isNull
           ev.value = oev.value
    -      val code = oev.code
    -      oev.code = ""
    -      ev.copy(code = code)
    -    } else if (nullable) {
    -      ev.copy(code = s"""
    -        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    -        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);""")
    +      ev.copy(code = oev.code)
         } else {
    -      ev.copy(code = s"""$javaType ${ev.value} = $value;""", isNull = "false")
    +      assert(ctx.INPUT_ROW != null)
    --- End diff --
    
    Add an assert message. 
    `assert(ctx.INPUT_ROW != null, "INPUT_ROW and currentVars cannot both be null.")`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19803: [SPARK-22596][SQL] set ctx.currentVars in CodegenSupport...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19803
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84136/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19803: [SPARK-22596][SQL] set ctx.currentVars in Codegen...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19803#discussion_r152848960
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -60,20 +60,23 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
     
       override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
         val javaType = ctx.javaType(dataType)
    -    val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
         if (ctx.currentVars != null && ctx.currentVars(ordinal) != null) {
           val oev = ctx.currentVars(ordinal)
           ev.isNull = oev.isNull
           ev.value = oev.value
    -      val code = oev.code
    -      oev.code = ""
    -      ev.copy(code = code)
    --- End diff --
    
    a small cleanup which is discussed in https://github.com/apache/spark/pull/11674#issuecomment-344955584


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org