Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/10/18 00:23:38 UTC
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/19523
[SPARK-22301][SQL] Add rule to Optimizer for In with empty list of va…
## What changes were proposed in this pull request?
For performance reasons, we should resolve an `In` operation on an empty list to false in the optimization phase, as discussed in #19522.
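A minimal sketch of the idea, using a hypothetical toy expression tree rather than Spark's Catalyst classes. The names `EmptyInSketch`, `Col`, `BoolLit`, `IntLit` and `optimize` are illustrative assumptions, and NULL handling is deliberately left out:

```scala
// Hypothetical toy model; none of these types are Spark's Catalyst classes.
object EmptyInSketch {
  sealed trait Expr
  final case class BoolLit(value: Boolean) extends Expr
  final case class IntLit(value: Int) extends Expr
  final case class Col(name: String) extends Expr
  final case class In(value: Expr, list: Seq[Expr]) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Recursive rewrite: an IN over an empty list becomes the literal false,
  // so nothing has to be evaluated per row at execution time.
  // NULL handling is intentionally ignored in this sketch.
  def optimize(e: Expr): Expr = e match {
    case In(_, list) if list.isEmpty => BoolLit(false)
    case And(l, r)                   => And(optimize(l), optimize(r))
    case other                       => other
  }
}
```

With these definitions, `optimize(In(Col("a"), Nil))` yields `BoolLit(false)`, while an `In` with a non-empty list is returned unchanged.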
## How was this patch tested?
Added UT
cc @gatorsmile
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-22301
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19523.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19523
----
commit 9aa74dd65d76725415e4eaaf5452a90f62802a8d
Author: Marco Gaido <ma...@gmail.com>
Date: 2017-10-17T21:41:00Z
[SPARK-22301][SQL] Add rule to Optimizer for In with empty list of values
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19523
**[Test build #82928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82928/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19523
LGTM
---
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19523#discussion_r145892306
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -204,6 +204,7 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
override def children: Seq[Expression] = value +: list
lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
+ lazy val isListEmpty = list.isEmpty
--- End diff --
Calling list.isEmpty is, by comparison, fast and constant time. It doesn't save much of anything to cache it, and the overhead of a lazy val
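The trade-off can be sketched with a hypothetical stand-in class (`InLike` and its `scans` counter are illustrative, not part of Spark). A `lazy val` pays off when it caches a full traversal such as `forall`, whereas an emptiness check is cheap enough to expose as a plain `def`:

```scala
// Hypothetical stand-in for the In expression; not Spark code.
object LazyValSketch {
  final case class InLike(list: Seq[Int]) {
    // Counts full traversals of the list, for illustration only.
    var scans: Int = 0

    // Worth caching: forall walks the whole list, so it is computed once.
    lazy val allEven: Boolean = { scans += 1; list.forall(_ % 2 == 0) }

    // Not worth caching: the check is cheap, so recompute it on each call.
    def isListEmpty: Boolean = list.isEmpty
  }
}
```

Accessing `allEven` twice on the same instance runs the traversal only once, while `isListEmpty` adds no field and no initialization check to the class.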
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19523
Thanks! Merged to master.
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Can one of the admins verify this patch?
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19523
retest this please
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19523
**[Test build #82928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82928/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19523
**[Test build #82923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82923/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19523
**[Test build #83011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83011/testReport)** for PR 19523 at commit [`99df613`](https://github.com/apache/spark/commit/99df613b344868190de11499af50405b198706fa).
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19523
**[Test build #83011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83011/testReport)** for PR 19523 at commit [`99df613`](https://github.com/apache/spark/commit/99df613b344868190de11499af50405b198706fa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19523#discussion_r145307826
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,7 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
+ // We rely on the optimizations in org.apache.spark.sql.catalyst.optimizer.OptimizeIn
+ // to be sure that the list cannot be empty
--- End diff --
IMHO this comment is not accurate, since in the optimizer we only deal with the case where the attribute is not nullable.
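To illustrate the concern, here is a hypothetical toy model (not Spark's `OptimizeIn` rule; `NullableGuardSketch`, `Attr`, `BoolLit` and `rewrite` are assumed names). If the rewrite fires only for non-nullable attributes, an `In` with an empty list over a nullable attribute still reaches later stages:

```scala
// Hypothetical toy model, not Spark's Catalyst API.
object NullableGuardSketch {
  sealed trait Expr
  final case class BoolLit(value: Boolean) extends Expr
  final case class Attr(name: String, nullable: Boolean) extends Expr
  final case class In(value: Attr, list: Seq[Int]) extends Expr

  // Fold to false only when the tested attribute can never be NULL;
  // otherwise leave the expression untouched for runtime evaluation.
  def rewrite(e: Expr): Expr = e match {
    case In(a, list) if list.isEmpty && !a.nullable => BoolLit(false)
    case other                                      => other
  }
}
```

In this sketch, `rewrite(In(Attr("b", nullable = true), Nil))` returns the expression unchanged, so downstream code cannot assume the list is non-empty.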
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19523
@mgaido91 Could you update the PR title?
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82923/
Test FAILed.
---
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19523#discussion_r145340100
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -204,6 +204,7 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
override def children: Seq[Expression] = value +: list
lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
+ lazy val isListEmpty = list.isEmpty
--- End diff --
I am using it to be consistent with the current implementation (see the line above)
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83011/
Test PASSed.
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Merged build finished. Test FAILed.
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19523
ok to test
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19523#discussion_r145306880
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -204,6 +204,7 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
override def children: Seq[Expression] = value +: list
lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
+ lazy val isListEmpty = list.isEmpty
--- End diff --
Do we really need this val?
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19523
**[Test build #82923 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82923/testReport)** for PR 19523 at commit [`50c7af3`](https://github.com/apache/spark/commit/50c7af3d4fb9a23ccf460a1842d7e57a26ca582c).
---
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19523#discussion_r145318964
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,7 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
+ // We rely on the optimizations in org.apache.spark.sql.catalyst.optimizer.OptimizeIn
+ // to be sure that the list cannot be empty
--- End diff --
We can remove this line after we merge this PR https://github.com/apache/spark/pull/19522
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with not...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19523: [SPARK-22301][SQL] Add rule to Optimizer for In with emp...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19523
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82928/
Test PASSed.
---
[GitHub] spark pull request #19523: [SPARK-22301][SQL] Add rule to Optimizer for In w...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19523
---