You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by peter-toth <gi...@git.apache.org> on 2018/10/21 21:17:22 UTC

[GitHub] spark pull request #22789: [SPARK-25767][SQL] fix inputVars preparation if o...

GitHub user peter-toth opened a pull request:

    https://github.com/apache/spark/pull/22789

    [SPARK-25767][SQL] fix inputVars preparation if outputVars is a lazy stream

    ## What changes were proposed in this pull request?
    
    Code generation is incorrect if `outputVars` parameter of `consume` method in `CodegenSupport` contains a lazy evaluated stream of expressions.
    This PR fixes the issue by forcing the evaluation of `inputVars` before generating the code for UnsafeRow.
    
    ## How was this patch tested?
    
    Tested with the sample program provided in https://issues.apache.org/jira/browse/SPARK-25767 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/peter-toth/spark SPARK-25767

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22789.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22789
    
----
commit 2ca1795da0af56021e7002bcae8ed8545eb72b6d
Author: Peter Toth <pe...@...>
Date:   2018-10-21T20:53:08Z

    [SPARK-25767][SQL] fix inputVars preparation if outputVars is a lazy stream
    
    Change-Id: Iac584a018f9892367841357c667ccaec1c15047b

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22789


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22789#discussion_r227008114
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -146,7 +146,7 @@ trait CodegenSupport extends SparkPlan {
           if (outputVars != null) {
             assert(outputVars.length == output.length)
             // outputVars will be used to generate the code for UnsafeRow, so we should copy them
    -        outputVars.map(_.copy())
    +        outputVars.map(_.copy()).toBuffer
    --- End diff --
    
    can we do something like:
    ```
    val res = outputVars.map(_.copy())
    res  match {
     case x: Stream => x.force
     case o => o
    }
    ```
    
    in order to avoid doing the `toBuffer` operation when not needed (please use more meaningful names than in my example in case it is feasible :) )?
    
    cc @hvanhovell who I remember fixed a similar issue with `Stream`s.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    I don't have other comments @peter-toth. Maybe a committer can trigger a test for this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    **[Test build #98172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98172/testReport)** for PR 22789 at commit [`3608f74`](https://github.com/apache/spark/commit/3608f745f28b29e9f49977f6b3de5b7388c5321a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22789#discussion_r228753682
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
           assert(df.limit(1).collect() === Array(Row("bat", 8.0)))
         }
       }
    +
    +  test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") {
    --- End diff --
    
    Strange, I've just rerun it on master and got:
    ```
    08:34:45.754 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 81, Column 17: Expression "bhj_isNull_6" is not an rvalue
    org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 81, Column 17: Expression "bhj_isNull_6" is not an rvalue
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22789#discussion_r228760802
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
           assert(df.limit(1).collect() === Array(Row("bat", 8.0)))
         }
       }
    +
    +  test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") {
    --- End diff --
    
    oh wait, I executed those commands in sparkShell and that falls back to interpreted mode when compilation fails. That is what happened here. Sorry about the fuss.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    **[Test build #98172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98172/testReport)** for PR 22789 at commit [`3608f74`](https://github.com/apache/spark/commit/3608f745f28b29e9f49977f6b3de5b7388c5321a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Thank you for submitting a PR. Would it be possible to add a test case, too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22789#discussion_r228753985
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -146,7 +146,10 @@ trait CodegenSupport extends SparkPlan {
           if (outputVars != null) {
             assert(outputVars.length == output.length)
             // outputVars will be used to generate the code for UnsafeRow, so we should copy them
    -        outputVars.map(_.copy())
    +        outputVars.map(_.copy()) match {
    --- End diff --
    
    What I found is that in `prepareRowVar` the `evaluateVariables` call clears the code of `outputVars`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22789#discussion_r228748979
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
    @@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
           assert(df.limit(1).collect() === Array(Row("bat", 8.0)))
         }
       }
    +
    +  test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") {
    --- End diff --
    
    Even without this patch this test would pass (I just tried it on master). A stream always evaluates its first element, so you probably need to add another key here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Thanks @mgaido91 @hvanhovell for the review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Merged to master/2.4


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    @kiszk , @mgaido91, @hvanhovell anything I can add to this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22789#discussion_r228749168
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -146,7 +146,10 @@ trait CodegenSupport extends SparkPlan {
           if (outputVars != null) {
             assert(outputVars.length == output.length)
             // outputVars will be used to generate the code for UnsafeRow, so we should copy them
    -        outputVars.map(_.copy())
    +        outputVars.map(_.copy()) match {
    --- End diff --
    
    What in `consume` is relying on a side effect of traversing the `outputVars`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98172/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org