You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by peter-toth <gi...@git.apache.org> on 2018/10/21 21:17:22 UTC
[GitHub] spark pull request #22789: [SPARK-25767][SQL] fix inputVars preparation if o...
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22789
[SPARK-25767][SQL] fix inputVars preparation if outputVars is a lazy stream
## What changes were proposed in this pull request?
Code generation is incorrect if `outputVars` parameter of `consume` method in `CodegenSupport` contains a lazy evaluated stream of expressions.
This PR fixes the issue by forcing the evaluation of `inputVars` before generating the code for UnsafeRow.
## How was this patch tested?
Tested with the sample program provided in https://issues.apache.org/jira/browse/SPARK-25767
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/peter-toth/spark SPARK-25767
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22789.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22789
----
commit 2ca1795da0af56021e7002bcae8ed8545eb72b6d
Author: Peter Toth <pe...@...>
Date: 2018-10-21T20:53:08Z
[SPARK-25767][SQL] fix inputVars preparation if outputVars is a lazy stream
Change-Id: Iac584a018f9892367841357c667ccaec1c15047b
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22789
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r227008114
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -146,7 +146,7 @@ trait CodegenSupport extends SparkPlan {
if (outputVars != null) {
assert(outputVars.length == output.length)
// outputVars will be used to generate the code for UnsafeRow, so we should copy them
- outputVars.map(_.copy())
+ outputVars.map(_.copy()).toBuffer
--- End diff --
can we do something like:
```
val res = outputVars.map(_.copy())
res match {
case x: Stream => x.force
case o => o
}
```
in order to avoid doing the `toBuffer` operation when not needed (please use more meaningful names than in my example in case it is feasible :) )?
cc @hvanhovell who I remember fixed a similar issue with `Stream`s.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22789
I don't have other comments @peter-toth. Maybe a committer can trigger a test for this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22789
**[Test build #98172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98172/testReport)** for PR 22789 at commit [`3608f74`](https://github.com/apache/spark/commit/3608f745f28b29e9f49977f6b3de5b7388c5321a).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228753682
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
assert(df.limit(1).collect() === Array(Row("bat", 8.0)))
}
}
+
+ test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") {
--- End diff --
Strange, I've just rerun it on master and got:
```
08:34:45.754 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 81, Column 17: Expression "bhj_isNull_6" is not an rvalue
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 81, Column 17: Expression "bhj_isNull_6" is not an rvalue
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228760802
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
assert(df.limit(1).collect() === Array(Row("bat", 8.0)))
}
}
+
+ test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") {
--- End diff --
oh wait, I executed those commands in sparkShell and that falls back to interpreted mode when compilation fails. That is what happened here. Sorry about the fuss.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22789
**[Test build #98172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98172/testReport)** for PR 22789 at commit [`3608f74`](https://github.com/apache/spark/commit/3608f745f28b29e9f49977f6b3de5b7388c5321a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/22789
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22789
Thank you for submitting a PR. Would it be possible to add a test case, too?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228753985
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -146,7 +146,10 @@ trait CodegenSupport extends SparkPlan {
if (outputVars != null) {
assert(outputVars.length == output.length)
// outputVars will be used to generate the code for UnsafeRow, so we should copy them
- outputVars.map(_.copy())
+ outputVars.map(_.copy()) match {
--- End diff --
What I found is that in `prepareRowVar` the `evaluateVariables` call clears the code of `outputVars`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228748979
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -319,4 +319,15 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
assert(df.limit(1).collect() === Array(Row("bat", 8.0)))
}
}
+
+ test("SPARK-25767: Lazy evaluated stream of expressions handled correctly") {
--- End diff --
Even without this patch this test would pass (I just tried it on master). A stream always evaluates its first element, so you probably need to add another key here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22789
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/22789
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22789
Thanks @mgaido91 @hvanhovell for the review.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/22789
Merged to master/2.4
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22789
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22789
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by peter-toth <gi...@git.apache.org>.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22789
@kiszk , @mgaido91, @hvanhovell anything I can add to this PR?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228749168
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -146,7 +146,10 @@ trait CodegenSupport extends SparkPlan {
if (outputVars != null) {
assert(outputVars.length == output.length)
// outputVars will be used to generate the code for UnsafeRow, so we should copy them
- outputVars.map(_.copy())
+ outputVars.map(_.copy()) match {
--- End diff --
What in `consume` is relying on a side effect of traversing the `outputVars`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] Fix lazily evaluated stream of expres...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22789
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98172/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22789: [SPARK-25767][SQL] fix inputVars preparation if outputVa...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22789
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org