You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2017/10/11 22:32:46 UTC
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/19475
[SPARK-22257][SQL]Reserve all non-deterministic expressions in ExpressionSet
## What changes were proposed in this pull request?
For non-deterministic expressions, they should be considered as not contained in the [[ExpressionSet]].
This is consistent with how we define `semanticEquals` between two expressions.
Otherwise, combining expressions will remove non-deterministic expressions which should be reserved.
E.g
Combine filters of
```scala
testRelation.where(Rand(0) > 0.1).where(Rand(0) > 0.1)
```
should result in
```scala
testRelation.where(Rand(0) > 0.1 && Rand(0) > 0.1)
```
## How was this patch tested?
Unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark non-deterministic-expressionSet
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19475.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19475
----
commit 262a0647da01f3e2edae6cb7ab9b66954a899067
Author: Wang Gengliang <lt...@gmail.com>
Date: 2017-10-11T21:01:54Z
Reserve all non-deterministic expressions in ExpressionSet.
commit f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa
Author: Wang Gengliang <lt...@gmail.com>
Date: 2017-10-11T22:32:15Z
revise comments
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19475
LGTM except one comment
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144184155
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
Sorry, for `Set` I mean the whole `ExpressionSet`.
Please check the logic in `CombineFilters` and my new test case, you will understand me immediately.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144175468
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -46,14 +47,20 @@ object ExpressionSet {
* set.contains(1 + a) => true
* set.contains(a + 2) => false
* }}}
+ *
+ * For non-deterministic expressions, they are always considered as not contained in the [[Set]].
+ * On adding a non-deterministic expression, simply append it to the original expressions.
+ * This is consistent with how we define `semanticEquals` between two expressions.
*/
class ExpressionSet protected(
protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
extends Set[Expression] {
protected def add(e: Expression): Unit = {
- if (!baseSet.contains(e.canonicalized)) {
+ if (!e.deterministic) {
+ originals += e
+ } else if (!baseSet.contains(e.canonicalized) ) {
--- End diff --
Is it ok to just to replace `baseSet.contains(e.canonicalized)` ?:
```
if (!baseSet.exists(_.semanticEquals(e))) {
baseSet.add(e.canonicalized)
originals += e
}
```
I feel better to put equality checks in one place.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144186680
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
I am trying to be consistent with the behavior of original implementation.
I think I had better override `--` for efficiency
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82655/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82655/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82648/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144182958
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -46,14 +47,20 @@ object ExpressionSet {
* set.contains(1 + a) => true
* set.contains(a + 2) => false
* }}}
+ *
+ * For non-deterministic expressions, they are always considered as not contained in the [[Set]].
+ * On adding a non-deterministic expression, simply append it to the original expressions.
+ * This is consistent with how we define `semanticEquals` between two expressions.
*/
class ExpressionSet protected(
protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
extends Set[Expression] {
protected def add(e: Expression): Unit = {
- if (!baseSet.contains(e.canonicalized)) {
+ if (!e.deterministic) {
+ originals += e
+ } else if (!baseSet.contains(e.canonicalized) ) {
--- End diff --
Yeah I have thought about this.
But the time complexity is O(n) for adding an expression.
It is a trade off, I prefer to my current implementation, the time complexity is O(1).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82663/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82659/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82714/testReport)** for PR 19475 at commit [`60a0360`](https://github.com/apache/spark/commit/60a0360b946e9b9b3ee851d8ea5e68b498251d52).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82714/testReport)** for PR 19475 at commit [`60a0360`](https://github.com/apache/spark/commit/60a0360b946e9b9b3ee851d8ea5e68b498251d52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82663/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82659/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144185928
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
If so, why you clone here?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/19475
retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/19475
@jiangxb1987 @gatorsmile @cloud-fan @maropu
Thanks for the comments. I have remove the irrelevant override.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144445349
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,8 +81,24 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
+ }
+ }
+
+ override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
--- End diff --
me too, it seems only a minor optimization, to avoid cloning the `baseSet` and `originals` many times when removing multiple non-deterministic expressions. I think it's a premature optimization and let's remove it for now.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144293579
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,8 +81,24 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
+ }
+ }
+
+ override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
--- End diff --
Could you explain why we should override `--`? I still don't quite get that.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19475
Thanks! Merged to master
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82648/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144182675
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
No, it is not dropping it. Any non-deterministic `elem` is not considered as contained in the Set.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/19475
Jenkins, retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82655/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19475
The idea sounds good to me. Please add unit test cases for all the ExpressionSet APIs. Thanks!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19475
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144450414
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,8 +81,24 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
+ }
+ }
+
+ override def --(elems: GenTraversableOnce[Expression]): ExpressionSet = {
--- End diff --
No need to add this in this PR. Please remove it.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144172769
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
we need to drop non-deterministic `elem` from `originals`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82714/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144183735
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
`Set`? It seems `originals` is `ArrayBuffer`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82659/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144184258
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -74,9 +81,13 @@ class ExpressionSet protected(
}
override def -(elem: Expression): ExpressionSet = {
- val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
- val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
- new ExpressionSet(newBaseSet, newOriginals)
+ if (elem.deterministic) {
+ val newBaseSet = baseSet.clone().filterNot(_ == elem.canonicalized)
+ val newOriginals = originals.clone().filterNot(_.canonicalized == elem.canonicalized)
+ new ExpressionSet(newBaseSet, newOriginals)
+ } else {
+ new ExpressionSet(baseSet.clone(), originals.clone())
--- End diff --
There is no need to drop, since the non-deterministic `elem` is not in `originals`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19475
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82663/testReport)** for PR 19475 at commit [`ce55412`](https://github.com/apache/spark/commit/ce5541260dac65ce09df374d240910f12099b3af).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19475: [SPARK-22257][SQL]Reserve all non-deterministic e...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19475#discussion_r144183653
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala ---
@@ -46,14 +47,20 @@ object ExpressionSet {
* set.contains(1 + a) => true
* set.contains(a + 2) => false
* }}}
+ *
+ * For non-deterministic expressions, they are always considered as not contained in the [[Set]].
+ * On adding a non-deterministic expression, simply append it to the original expressions.
+ * This is consistent with how we define `semanticEquals` between two expressions.
*/
class ExpressionSet protected(
protected val baseSet: mutable.Set[Expression] = new mutable.HashSet,
protected val originals: mutable.Buffer[Expression] = new ArrayBuffer)
extends Set[Expression] {
protected def add(e: Expression): Unit = {
- if (!baseSet.contains(e.canonicalized)) {
+ if (!e.deterministic) {
+ originals += e
+ } else if (!baseSet.contains(e.canonicalized) ) {
--- End diff --
SGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19475: [SPARK-22257][SQL]Reserve all non-deterministic expressi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19475
**[Test build #82648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82648/testReport)** for PR 19475 at commit [`f97fb98`](https://github.com/apache/spark/commit/f97fb9808fdeb2a9d46cd70105c7d05b876ad3fa).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org