You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2017/10/12 07:41:33 UTC

[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/19478

    [SPARK-22263][SQL]Refactor deterministic as lazy value

    ## What changes were proposed in this pull request?
    
    The method `deterministic` is frequently called in optimizer.
    Refactor `deterministic` as lazy value, in order to avoid redundant computations.
    
    ## How was this patch tested?
    
    Unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark deterministicAsLazyVal

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19478.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19478
    
----
commit bdeea55e607ad16d292efeb4059249918a6961c1
Author: Wang Gengliang <lt...@gmail.com>
Date:   2017-10-12T06:00:09Z

    make deterministic lazy value to avoid redundant computation

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19478


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    **[Test build #82664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82664/testReport)** for PR 19478 at commit [`bdeea55`](https://github.com/apache/spark/commit/bdeea55e607ad16d292efeb4059249918a6961c1).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82664/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19478#discussion_r144230724
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -79,7 +79,9 @@ abstract class Expression extends TreeNode[Expression] {
        * An example would be `SparkPartitionID` that relies on the partition id returned by TaskContext.
        * By default leaf expressions are deterministic as Nil.forall(_.deterministic) returns true.
        */
    -  def deterministic: Boolean = children.forall(_.deterministic)
    +  lazy val deterministic: Boolean = isDeterministic
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    **[Test build #82698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82698/testReport)** for PR 19478 at commit [`03e88b0`](https://github.com/apache/spark/commit/03e88b0ce8fc6dbf87ea970ba58abc79ff9c2a7f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    **[Test build #82667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82667/testReport)** for PR 19478 at commit [`8143ee1`](https://github.com/apache/spark/commit/8143ee1a81c01de777c9063c07ca30e497f1e3d6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    @gengliangwang do you have any benchmark that shows that this is a performance bottleneck?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    @viirya @hvanhovell @gatorsmile 
    
    Thanks, I have attached the performance result in the description in this PR.
    Overall I don't see any downside of the code change.
    Also it is possible to take much more time to get the `deterministic` of UDF, while making it lazy value can avoid that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82698/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    **[Test build #82698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82698/testReport)** for PR 19478 at commit [`03e88b0`](https://github.com/apache/spark/commit/03e88b0ce8fc6dbf87ea970ba58abc79ff9c2a7f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Ok. The performance result looks good. LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    **[Test build #82667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82667/testReport)** for PR 19478 at commit [`8143ee1`](https://github.com/apache/spark/commit/8143ee1a81c01de777c9063c07ca30e497f1e3d6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19478: [SPARK-22263][SQL]Refactor deterministic as lazy ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19478#discussion_r144221518
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -79,7 +79,9 @@ abstract class Expression extends TreeNode[Expression] {
        * An example would be `SparkPartitionID` that relies on the partition id returned by TaskContext.
        * By default leaf expressions are deterministic as Nil.forall(_.deterministic) returns true.
        */
    -  def deterministic: Boolean = children.forall(_.deterministic)
    +  lazy val deterministic: Boolean = isDeterministic
    --- End diff --
    
    How much time this can save? But why won't just:
    
    ```scala
    lazy val deterministic: Boolean = children.forall(_.deterministic)
    ```
    
    I think it is equal to this change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Compare the total optimization time for the TPC-DS queries?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82667/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19478: [SPARK-22263][SQL]Refactor deterministic as lazy value

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19478
  
    **[Test build #82664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82664/testReport)** for PR 19478 at commit [`bdeea55`](https://github.com/apache/spark/commit/bdeea55e607ad16d292efeb4059249918a6961c1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org