You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2017/10/11 22:32:46 UTC

[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/19475

    [SPARK-22257][SQL]Reserve all non-deterministic expressions in ExpressionSet

    ## What changes were proposed in this pull request?
    
    For non-deterministic expressions, they should be considered as not contained in the [[ExpressionSet]].
    This is consistent with how we define `semanticEquals` between two expressions.
    Otherwise, combining expressions will remove non-deterministic expressions which should be reserved.
    E.g
    Combine filters of 
    ```scala
    testRelation.where(Rand(0) > 0.1).where(Rand(0) > 0.1)
    ```
    should result in
    ```scala
    testRelation.where(Rand(0) > 0.1 && Rand(0) > 0.1)
    ```
    
    ## How was this patch tested?
    
    Unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark non-deterministic-expressionSet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19475
    
----
commit 262a0647da01f3e2edae6cb7ab9b66954a899067
Author: Wang Gengliang <lt...@gmail.com>
Date:   2017-10-11T21:01:54Z

    Reserve all non-deterministic expressions in ExpressionSet.

commit f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa
Author: Wang Gengliang <lt...@gmail.com>
Date:   2017-10-11T22:32:15Z

    revise comments

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    LGTM except one comment


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144184155
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    Sorry, for `Set` I mean the whole `ExpressionSet`.
    Please check the logic in `CombineFilters` and my new test case, you will understand me immediately.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144175468
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -46,14 +47,20 @@ object ExpressionSet {
      *   set.contains(1 + a) => true
      *   set.contains(a + 2) => false
      * }}}
    + *
    + * For non-deterministic expressions, they are always considered as not contained in the [[Set]].
    + * On adding a non-deterministic expression, simply append it to the original expressions.
    + * This is consistent with how we define `semanticEquals` between two expressions.
      */
     class ExpressionSet protected(
         protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
         protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
       extends Set[Expression] {
     
       protected def add(e: Expression): Unit = {
    -    if (!baseSet.contains(e.canonicalized)) {
    +    if (!e.deterministic) {
    +      originals += e
    +    } else if (!baseSet.contains(e.canonicalized) ) {
    --- End diff --
    
    Is it ok to just to replace `baseSet.contains(e.canonicalized)` ?:
    ```
        if (!baseSet.exists(_.semanticEquals(e))) {
          baseSet.add(e.canonicalized)
          originals += e
        }
    ```
    I feel better to put equality checks in one place.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144186680
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    I am trying to be consistent with the behavior of original implementation.
    I think I had better override `--` for efficiency


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82655/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82655/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82648/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144182958
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -46,14 +47,20 @@ object ExpressionSet {
      *   set.contains(1 + a) => true
      *   set.contains(a + 2) => false
      * }}}
    + *
    + * For non-deterministic expressions, they are always considered as not contained in the [[Set]].
    + * On adding a non-deterministic expression, simply append it to the original expressions.
    + * This is consistent with how we define `semanticEquals` between two expressions.
      */
     class ExpressionSet protected(
         protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
         protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
       extends Set[Expression] {
     
       protected def add(e: Expression): Unit = {
    -    if (!baseSet.contains(e.canonicalized)) {
    +    if (!e.deterministic) {
    +      originals += e
    +    } else if (!baseSet.contains(e.canonicalized) ) {
    --- End diff --
    
    Yeah I have thought about this.
    But the time complexity is O(n) for adding an expression.
    It is a trade off, I prefer to my current implementation, the time complexity is O(1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82663/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82659/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82714/testReport)** for PR 19475 at commit [`60a0360`](https://github.com/apache/spark/commit/60a0360b946e9b9b3ee851d8ea5e68b498251d52).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82714/testReport)** for PR 19475 at commit [`60a0360`](https://github.com/apache/spark/commit/60a0360b946e9b9b3ee851d8ea5e68b498251d52).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82663/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82659/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144185928
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    If so, why you clone here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    @jiangxb1987 @gatorsmile @cloud-fan @maropu 
    Thanks for the comments. I have remove the irrelevant override.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144445349
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,8 +81,24 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    +    }
    +  }
    +
    +  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
    --- End diff --
    
    me too, it seems only a minor optimization, to avoid cloning the `baseSet` and `originals` many times when removing multiple non-deterministic expressions. I think it's a premature optimization and let's remove it for now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144293579
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,8 +81,24 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    +    }
    +  }
    +
    +  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
    --- End diff --
    
    Could you explain why we should override `--`? I still don't quite get that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Thanks! Merged to master


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82648/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144182675
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    No, it is not dropping it. Any non-deterministic `elem` is not considered as contained in the Set. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Jenkins, retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82655/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    The idea sounds good to me. Please add unit test cases for all the ExpressionSet APIs. Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19475


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144450414
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,8 +81,24 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    +    }
    +  }
    +
    +  override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
    --- End diff --
    
    No need to add this in this PR. Please remove it. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144172769
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    we need to drop non-deterministic `elem` from `originals`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82714/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144183735
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    `Set`? It seems `originals` is `ArrayBuffer`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82659/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144184258
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -74,9 +81,13 @@ class ExpressionSet protected(
       }
     
       override def -(elem: Expression): ExpressionSet = {
    -    val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    -    val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    -    new ExpressionSet(newBaseSet, newOriginals)
    +    if (elem.deterministic) {
    +      val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
    +      val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
    +      new ExpressionSet(newBaseSet, newOriginals)
    +    } else {
    +      new ExpressionSet(baseSet.clone(), originals.clone())
    --- End diff --
    
    There is no need to drop, since the non-deterministic `elem` is not in `originals`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82663/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19475#discussion_r144183653
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
    @@ -46,14 +47,20 @@ object ExpressionSet {
      *   set.contains(1 + a) => true
      *   set.contains(a + 2) => false
      * }}}
    + *
    + * For non-deterministic expressions, they are always considered as not contained in the [[Set]].
    + * On adding a non-deterministic expression, simply append it to the original expressions.
    + * This is consistent with how we define `semanticEquals` between two expressions.
      */
     class ExpressionSet protected(
         protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
         protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
       extends Set[Expression] {
     
       protected def add(e: Expression): Unit = {
    -    if (!baseSet.contains(e.canonicalized)) {
    +    if (!e.deterministic) {
    +      originals += e
    +    } else if (!baseSet.contains(e.canonicalized) ) {
    --- End diff --
    
    SGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19475
  
    **[Test build #82648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82648/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org