You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by yhuai <gi...@git.apache.org> on 2016/05/25 06:19:23 UTC

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/13290

    [SQL] Prevent illegal NULL propagation when filtering outer-join results

    ## What changes were proposed in this pull request?
    This is another approach for addressing SPARK-13484 (the first approach is https://github.com/apache/spark/pull/11371). 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark SPARK-13484

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13290
    
----
commit 2dcecc8a934d6224d7680f5d25b7a98c53779d79
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-02-25T15:56:29Z

    Avoid illegal NULL propagation

commit 1ebfa80e8d52d959ffc98d191019b9e4242dd4a6
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-02-26T07:48:11Z

    Add comments

commit ab5d4f1174839abbd7f33cf587f75ae75cc76b68
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-02-27T15:16:54Z

    Add a new rule to solve illegal references

commit 8850cb3046a302afd012e4508bf8bfbd9f59051d
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-03-02T06:19:59Z

    Use foreach not map

commit b636b34bbe09f4152567b15b1fae30f76f7ffe55
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-04-15T02:17:01Z

    Solve illegal references in Projects

commit dfd952870d0b157dc8571ddee053a9fe91d372ed
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-04-15T04:28:38Z

    Add tests in DataFrameJoinSuite

commit 78ed4ef7a5d6c4565efd468cc48d6bfd1321f08b
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-04-15T08:44:34Z

    Fix test codes in ResolveNaturalJoinSuite

commit 3cec4cce8126faae615990c56f829368f306b55e
Author: Yin Huai <yh...@databricks.com>
Date:   2016-05-25T06:16:30Z

    Try to have a rule to fix nullability

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221777607
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59334/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221499915
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59261/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    **[Test build #59780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59780/consoleFull)** for PR 13290 at commit [`071b670`](https://github.com/apache/spark/commit/071b670303489238fda5944c1b10d0cc11277ca9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65278045
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    --- End diff --
    
    Then should we just put an `assert`/`require` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65282504
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    --- End diff --
    
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r64687394
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,37 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case q: LogicalPlan if q.resolved =>
    +        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExpirId, we need to resolve
    +            // the conflict of nullable field.
    +            val nullable = attributes.map(_.nullable).reduce(_ || _)
    --- End diff --
    
    I feel it is not very possible. Let me think about it more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r64532662
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,37 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case q: LogicalPlan if q.resolved =>
    +        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExpirId, we need to resolve
    --- End diff --
    
    Nit: ExpirId -> ExprId


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221777604
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/13290
  
    LGTM except for a few minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221493396
  
    **[Test build #59259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59259/consoleFull)** for PR 13290 at commit [`3cec4cc`](https://github.com/apache/spark/commit/3cec4cce8126faae615990c56f829368f306b55e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/13290
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290
  
    **[Test build #59658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59658/consoleFull)** for PR 13290 at commit [`127024d`](https://github.com/apache/spark/commit/127024da7e1058cd39b71e85c6dcd08b5e3e2b53).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221499914
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221483773
  
    **[Test build #59259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59259/consoleFull)** for PR 13290 at commit [`3cec4cc`](https://github.com/apache/spark/commit/3cec4cce8126faae615990c56f829368f306b55e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65278189
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    +            // the conflict of nullable field.
    +            val nullable = attributes.map(_.nullable).reduce(_ || _)
    --- End diff --
    
    Maybe `attributes.exist(_.nullable)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r64686215
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,37 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case q: LogicalPlan if q.resolved =>
    +        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    --- End diff --
    
    Yea. This version tries to fix the nullability for the entire query plan tree because I think it is possible to hit this problem in other cases when using Dataset API.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    @yhuai LGTM. Could you add 'Close #113711' in a commit log if this pr merged into master?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221493543
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59259/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    **[Test build #59780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59780/consoleFull)** for PR 13290 at commit [`071b670`](https://github.com/apache/spark/commit/071b670303489238fda5944c1b10d0cc11277ca9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagat...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13290


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r64688437
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,37 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case q: LogicalPlan if q.resolved =>
    +        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    --- End diff --
    
    yes, I got your point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221493542
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65263136
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    --- End diff --
    
    I think with our current implementation, it will not happen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r64533746
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,37 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case q: LogicalPlan if q.resolved =>
    +        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    --- End diff --
    
    I think the bug occurs in case of filters and projects just after outer-joins.
    Seems this fix could add more overheads since the rule overwrites nullability in every operator nodes.
    Or, can we ignore the overheads?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221499689
  
    **[Test build #59261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59261/consoleFull)** for PR 13290 at commit [`fff3382`](https://github.com/apache/spark/commit/fff33829192fd5c35339aeda80d392ae2bfd486c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59780/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290
  
    **[Test build #59658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59658/consoleFull)** for PR 13290 at commit [`127024d`](https://github.com/apache/spark/commit/127024da7e1058cd39b71e85c6dcd08b5e3e2b53).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221777507
  
    **[Test build #59334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59334/consoleFull)** for PR 13290 at commit [`127024d`](https://github.com/apache/spark/commit/127024da7e1058cd39b71e85c6dcd08b5e3e2b53).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by maropu <gi...@git.apache.org>.

Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r64532619
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,37 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case q: LogicalPlan if q.resolved =>
    +        val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExpirId, we need to resolve
    +            // the conflict of nullable field.
    +            val nullable = attributes.map(_.nullable).reduce(_ || _)
    --- End diff --
    
    Is this case is a possible state, output attributes with the same exprId and different nullability in children?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/13290
  
    Merging to master and branch-2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221484525
  
    **[Test build #59261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59261/consoleFull)** for PR 13290 at commit [`fff3382`](https://github.com/apache/spark/commit/fff33829192fd5c35339aeda80d392ae2bfd486c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13290: [SPARK-13484] [SQL] Prevent illegal NULL propagat...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65450769
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    --- End diff --
    
    I am not sure we should add assert. Even when we hit that case, it is still fine to pass at here, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59658/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65260408
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    --- End diff --
    
    When will this happen?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when filtering ou...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13290
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13290#issuecomment-221768479
  
    **[Test build #59334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59334/consoleFull)** for PR 13290 at commit [`127024d`](https://github.com/apache/spark/commit/127024da7e1058cd39b71e85c6dcd08b5e3e2b53).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13484] [SQL] Prevent illegal NULL propagation whe...

Posted by liancheng <gi...@git.apache.org>.

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13290#discussion_r65283090
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1448,6 +1450,38 @@ class Analyzer(
       }
     
       /**
    +   * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of
    +   * corresponding Attributes of its children output Attributes. This step is needed because
    +   * users can use a resolved AttributeReference in the Dataset API and outer joins
    +   * can change the nullability of an AttribtueReference. Without the fix, a nullable column's
    +   * nullable field can be actually set as non-nullable, which cause illegal optimization
    +   * (e.g., NULL propagation) and wrong answers.
    +   * See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
    +   */
    +  object FixNullability extends Rule[LogicalPlan] {
    +
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case p if !p.resolved => p // Skip unresolved nodes.
    +      case p: LogicalPlan if p.resolved =>
    +        val childrenOutput = p.children.flatMap(c => c.output).groupBy(_.exprId).flatMap {
    +          case (exprId, attributes) =>
    +            // If there are multiple Attributes having the same ExprId, we need to resolve
    +            // the conflict of nullable field.
    +            val nullable = attributes.map(_.nullable).reduce(_ || _)
    +            attributes.map(attr => attr.withNullability(nullable))
    +        }.toSeq
    +        val attributeMap = AttributeMap[Attribute](childrenOutput.map(attr => attr -> attr))
    +        // For an Attribute used by the current LogicalPlan, if it is from its children,
    +        // we fix the nullable field by using the nullability setting of the corresponding
    +        // output Attribute from the children.
    --- End diff --
    
    Maybe also explain that `AttributeMap` only compare attributes by expression ID. It's not quite intuitive why and how the attribute map helps here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org