You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2017/10/17 21:47:45 UTC
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/19522
[SPARK-22249][FOLLOWUP][SQL] Check if list of value for IN is empty in the optimizer
## What changes were proposed in this pull request?
This PR addresses the comments by @gatorsmile on [the previous PR](https://github.com/apache/spark/pull/19494).
## How was this patch tested?
Previous UT and added UT.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-22249_FOLLOWUP
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19522.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19522
----
commit 2327ad14b0a1f9b1a54098dc0d4ab105686ec8fa
Author: Marco Gaido <ma...@gmail.com>
Date: 2017-10-17T21:41:00Z
[SPARK-22249][FOLLOWUP][SQL] Check if list of value for IN is empty in the optimizer
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145269515
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
sorry, but I can't understand your suggestion: `Coalesce` returns the first non-null value. Here we should return `Null` when the value is null, `false` otherwise. I can't think of a function doing this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19522
Can one of the admins verify this patch?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19522
ok to test
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145268754
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
Use `Coalesce`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19522
**[Test build #82873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82873/testReport)** for PR 19522 at commit [`e95bc7b`](https://github.com/apache/spark/commit/e95bc7b395e027aa3d1e719d987b4f5a4461c34b).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145269643
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,8 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
- case In(a: AttributeReference, list: Seq[Expression]) if list.forall(_.isInstanceOf[Literal]) =>
+ case In(a: AttributeReference, list: Seq[Expression])
--- End diff --
yes, it is used in the body of the `case`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19522
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82873/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145271936
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
BTW, we should submit a separate PR for this optimizer change.
We need to backport the fix to 2.2
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19522
**[Test build #3952 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3952/testReport)** for PR 19522 at commit [`e95bc7b`](https://github.com/apache/spark/commit/e95bc7b395e027aa3d1e719d987b4f5a4461c34b).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145268935
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,8 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
- case In(a: AttributeReference, list: Seq[Expression]) if list.forall(_.isInstanceOf[Literal]) =>
+ case In(a: AttributeReference, list: Seq[Expression])
--- End diff --
Do we still need `a `?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145279067
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
Based on the SQL standard, the original fix is wrong. More importantly, the fix does not bring any noticeable perf improvement, because `buildFilter` is only used for partition pruning. In the future, we might enhance it for more advanced statistic-based filter inference. For example, foldable expressions can be evaluated earlier and this code change could cause a regression.
Yes. Please open a new JIRA for optimizer enhancement.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145271767
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
True. The current conversion does not help the perf. We just need to convert it to `false`, if we know the left side is not nullable.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19522
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19522
@gatorsmile , thanks, I updated the PR according to your comments. Now it should be ok. I am creating a new JIRA with for the changes to the optimizer. Thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19522
Thanks! Merged to master/2.2
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145273783
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
But if we don't change the plan here, then maybe it's worth to keep the initial change in the `buildFilters` to return false there without actually evaluating the filter itself, which is not needed in that case. What do you think?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19522
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19522
**[Test build #3952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3952/testReport)** for PR 19522 at commit [`e95bc7b`](https://github.com/apache/spark/commit/e95bc7b395e027aa3d1e719d987b4f5a4461c34b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145278393
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,8 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
- case In(a: AttributeReference, list: Seq[Expression]) if list.forall(_.isInstanceOf[Literal]) =>
+ case In(a: AttributeReference, list: Seq[Expression])
+ if list.forall(_.isInstanceOf[Literal]) && list.nonEmpty =>
--- End diff --
I will, as soon as we decide which is the right behavior, thanks.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145274601
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
Should I also create a JIRA for the optimizer change then?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145270124
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,8 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
- case In(a: AttributeReference, list: Seq[Expression]) if list.forall(_.isInstanceOf[Literal]) =>
+ case In(a: AttributeReference, list: Seq[Expression])
+ if list.forall(_.isInstanceOf[Literal]) && list.nonEmpty =>
--- End diff --
Could we add a unit test case for `buildFilter`? You might need a new test suite here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145269604
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
+ If(IsNull(v), Literal.create(null, BooleanType), FalseLiteral)
--- End diff --
If `v` is not nullable, we should return false.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145267637
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -102,7 +102,8 @@ case class InMemoryTableScanExec(
case IsNull(a: Attribute) => statsFor(a).nullCount > 0
case IsNotNull(a: Attribute) => statsFor(a).count - statsFor(a).nullCount > 0
- case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
+ // We rely on the optimizations in org.apache.spark.sql.catalyst.optimizer.OptimizeIn
--- End diff --
We should not rely on Optimizer for fixing the bugs.
We need to fix the line 107 anyway.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of val...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19522#discussion_r145267736
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -176,6 +176,8 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
object OptimizeIn extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
+ case expr @ In(v, _) if expr.isListEmpty =>
--- End diff --
Update the comment in the rule.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19522: [SPARK-22249][FOLLOWUP][SQL] Check if list of value for ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19522
**[Test build #82873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82873/testReport)** for PR 19522 at commit [`e95bc7b`](https://github.com/apache/spark/commit/e95bc7b395e027aa3d1e719d987b4f5a4461c34b).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org