You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2017/10/12 07:41:33 UTC
[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/19478
[SPARK-22263][SQL]Refactor deterministic as lazy value
## What changes were proposed in this pull request?
The method `deterministic` is frequently called in optimizer.
Refactor `deterministic` as lazy value, in order to avoid redundant computations.
## How was this patch tested?
Unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark deterministicAsLazyVal
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19478.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19478
----
commit bdeea55e607ad16d292efeb4059249918a6961c1
Author: Wang Gengliang <lt...@gmail.com>
Date: 2017-10-12T06:00:09Z
make deterministic lazy value to avoid redundant computation
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19478
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19478
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19478
**[Test build #82664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82664/testReport)** for PR 19478 at commit [`bdeea55`](https://github.com/apache/spark/commit/bdeea55e607ad16d292efeb4059249918a6961c1).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19478
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82664/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/19478#discussion_r144230724
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -79,7 +79,9 @@ abstract class Expression extends TreeNode[Expression] {
* An example would be `SparkPartitionID` that relies on the partition id returned by TaskContext.
* By default leaf expressions are deterministic as Nil.forall(_.deterministic) returns true.
*/
- def deterministic: Boolean = children.forall(_.deterministic)
+ lazy val deterministic: Boolean = isDeterministic
--- End diff --
+1
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19478
**[Test build #82698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82698/testReport)** for PR 19478 at commit [`03e88b0`](https://github.com/apache/spark/commit/03e88b0ce8fc6dbf87ea970ba58abc79ff9c2a7f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19478
**[Test build #82667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82667/testReport)** for PR 19478 at commit [`8143ee1`](https://github.com/apache/spark/commit/8143ee1a81c01de777c9063c07ca30e497f1e3d6).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:
https://github.com/apache/spark/pull/19478
@gengliangwang do you have any benchmark that shows that this is a performance bottleneck?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19478
Thanks! Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/19478
@viirya @hvanhovell @gatorsmile
Thanks, I have attached the performance result in the description in this PR.
Overall I don't see any downside of the code change.
Also it is possible to take much more time to get the `deterministic` of UDF, while making it lazy value can avoid that.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19478
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19478
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82698/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19478
**[Test build #82698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82698/testReport)** for PR 19478 at commit [`03e88b0`](https://github.com/apache/spark/commit/03e88b0ce8fc6dbf87ea970ba58abc79ff9c2a7f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19478
Ok. The performance result looks good. LGTM.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19478
**[Test build #82667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82667/testReport)** for PR 19478 at commit [`8143ee1`](https://github.com/apache/spark/commit/8143ee1a81c01de777c9063c07ca30e497f1e3d6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19478
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19478#discussion_r144221518
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -79,7 +79,9 @@ abstract class Expression extends TreeNode[Expression] {
* An example would be `SparkPartitionID` that relies on the partition id returned by TaskContext.
* By default leaf expressions are deterministic as Nil.forall(_.deterministic) returns true.
*/
- def deterministic: Boolean = children.forall(_.deterministic)
+ lazy val deterministic: Boolean = isDeterministic
--- End diff --
How much time this can save? But why won't just:
```scala
lazy val deterministic: Boolean = children.forall(_.deterministic)
```
I think it is equal to this change.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19478
Compare the total optimization time for the TPC-DS queries?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19478
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82667/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19478
**[Test build #82664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82664/testReport)** for PR 19478 at commit [`bdeea55`](https://github.com/apache/spark/commit/bdeea55e607ad16d292efeb4059249918a6961c1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org