You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/08/14 09:35:00 UTC

[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22102

    [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier

    ## What changes were proposed in this pull request?
    
    The introduction of `AnalysisBarrier` prevented `FixNullability` to go through all the nodes. This introduced a bug, which can lead to wrong results, as the nullability of the output attributes of an outer join can be wrong.
    
    The PR makes `FixNullability` going through the `AnalysisBarrier`s.
    
    ## How was this patch tested?
    
    added UT


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25051

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22102
    
----
commit 043a8ad82678f2b2f12397377890232d7884963d
Author: Marco Gaido <ma...@...>
Date:   2018-08-14T09:31:12Z

    [SPARK-25051][SQL] FixNullability should not stop on AnalysisBarrier

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    **[Test build #94739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94739/testReport)** for PR 22102 at commit [`043a8ad`](https://github.com/apache/spark/commit/043a8ad82678f2b2f12397377890232d7884963d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22102#discussion_r210040093
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2300,4 +2300,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           checkAnswer(aggPlusFilter1, aggPlusFilter2.collect())
         }
       }
    +
    +  test("SPARK-25051: fix nullabilities of outer join attributes doesn't stop on AnalysisBarrier") {
    +    val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
    +    val df2 = spark.range(3).selectExpr("id")
    +    assert(df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).collect().length == 1)
    --- End diff --
    
    Thanks for the suggestion, will follow it in next PRs.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...

Posted by databricks-david-lewis <gi...@git.apache.org>.
Github user databricks-david-lewis commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22102#discussion_r210051880
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1704,6 +1704,7 @@ class Analyzer(
     
         def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
           case p if !p.resolved => p // Skip unresolved nodes.
    +      case ab: AnalysisBarrier => apply(ab.child)
    --- End diff --
    
    Is it okay to lose the AnalysisBarrier here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    @mgaido91 The PR is merged. Could you close it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22102#discussion_r210036342
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2300,4 +2300,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           checkAnswer(aggPlusFilter1, aggPlusFilter2.collect())
         }
       }
    +
    +  test("SPARK-25051: fix nullabilities of outer join attributes doesn't stop on AnalysisBarrier") {
    +    val df1 = spark.range(4).selectExpr("id", "cast(id as string) as name")
    +    val df2 = spark.range(3).selectExpr("id")
    +    assert(df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull).collect().length == 1)
    --- End diff --
    
    Just general suggestion. We should compare the results


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94739/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22102#discussion_r210053264
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1704,6 +1704,7 @@ class Analyzer(
     
         def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
           case p if !p.resolved => p // Skip unresolved nodes.
    +      case ab: AnalysisBarrier => apply(ab.child)
    --- End diff --
    
    AnalysisBarrier is a leaf node. We still need to apply this rule. A better fix is to fix the rule. However, for backporting the fix, the risk of this fix is low


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22102: [SPARK-25051][SQL] FixNullability should not stop...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 closed the pull request at:

    https://github.com/apache/spark/pull/22102


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    This might not be the last one. Let us backport the fix of @maryannxue ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    LGTM as a surgical fix for 2.3, thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2174/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    cc @cloud-fan @gatorsmile @rxin @viirya 
    
    cc @jerryshao for 2.3 RC


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    **[Test build #94739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94739/testReport)** for PR 22102 at commit [`043a8ad`](https://github.com/apache/spark/commit/043a8ad82678f2b2f12397377890232d7884963d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22102: [SPARK-25051][SQL] FixNullability should not stop on Ana...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22102
  
    Let me confirm it and then will merge it to 2.3 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org