You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/08/14 09:35:00 UTC
[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/22102
[SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier
## What changes were proposed in this pull request?
The introduction of `AnalysisBarrier` prevented `FixNullability` to go through all the nodes. This introduced a bug, which can lead to wrong results, as the nullability of the output attributes of an outer join can be wrong.
The PR makes `FixNullability` going through the `AnalysisBarrier`s.
## How was this patch tested?
added UT
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-25051
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22102.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22102
----
commit 043a8ad82678f2b2f12397377890232d7884963d
Author: Marco Gaido <ma...@...>
Date: 2018-08-14T09:31:12Z
[SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22102
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22102
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22102
**[Test build #94739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94739/testReport)** for PR 22102 at commit [`043a8ad`](https://github.com/apache/spark/commit/043a8ad82678f2b2f12397377890232d7884963d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22102#discussion_r210040093
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2300,4 +2300,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
checkAnswer(aggPlusFilter1, aggPlusFilter2.collect())
}
}
+
+ test("SPARK-25051: fix nullabilities of outer join attributes doesn't stop on AnalysisBarrier") {
+ val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
+ val df2 = spark.range(3).selectExpr("id")
+ assert(df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).collect().length == 1)
--- End diff --
Thanks for the suggestion, will follow it in next PRs.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...
Posted by databricks-david-lewis <gi...@git.apache.org>.
Github user databricks-david-lewis commented on a diff in the pull request:
https://github.com/apache/spark/pull/22102#discussion_r210051880
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1704,6 +1704,7 @@ class Analyzer(
def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
case p if !p.resolved => p // Skip unresolved nodes.
+ case ab: AnalysisBarrier => apply(ab.child)
--- End diff --
Is it okay to lose the AnalysisBarrier here?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22102
@mgaido91 The PR is merged. Could you close it?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22102#discussion_r210036342
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2300,4 +2300,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
checkAnswer(aggPlusFilter1, aggPlusFilter2.collect())
}
}
+
+ test("SPARK-25051: fix nullabilities of outer join attributes doesn't stop on AnalysisBarrier") {
+ val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
+ val df2 = spark.range(3).selectExpr("id")
+ assert(df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).collect().length == 1)
--- End diff --
Just general suggestion. We should compare the results
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22102
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94739/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22102#discussion_r210053264
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1704,6 +1704,7 @@ class Analyzer(
def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
case p if !p.resolved => p // Skip unresolved nodes.
+ case ab: AnalysisBarrier => apply(ab.child)
--- End diff --
AnalysisBarrier is a leaf node. We still need to apply this rule. A better fix is to fix the rule. However, for backporting the fix, the risk of this fix is low
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 closed the pull request at:
https://github.com/apache/spark/pull/22102
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22102
This might not be the last one. Let us backport the fix of @maryannxue ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22102
LGTM as a surgical fix for 2.3, thanks!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22102
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2174/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/22102
cc @cloud-fan @gatorsmile @rxin @viirya
cc @jerryshao for 2.3 RC
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22102
**[Test build #94739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94739/testReport)** for PR 22102 at commit [`043a8ad`](https://github.com/apache/spark/commit/043a8ad82678f2b2f12397377890232d7884963d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22102
Let me confirm it and then will merge it to 2.3
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org