You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2015/12/14 18:49:16 UTC

[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/10296

    [SPARK-12323][SQL] Makes BoundReference respect nullability

    This PR partially fixes SPARK-12323 by making `BoundReference` respect nullability. Now if a "top-level" column is defined as non-nullable, nulls appear in input data will cause a runtime exception with nice error message.
    
    What this PR doesn't fix is cases of non-nullable nested fields, which are covered in the 3 newly added ignored test cases. These test cases fail because `NewInstance`, `MakeObjects`, and potentially some other expressions along the way are always nullable. To fix these test cases, these expressions must also respect nullability.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-12323.non-nullable-ds-fields

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10296.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10296
    
----
commit de8b44292300e86f5f152dd7138555b94160d3c6
Author: Cheng Lian <li...@databricks.com>
Date:   2015-12-14T17:43:08Z

    Makes BoundReference respect nullability

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164608670
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47632111
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -499,10 +499,16 @@ class DataFrame private[sql](
         // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
         // by creating a new instance for one of the branch.
         val joined = sqlContext.executePlan(
    -      Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]
    +      Join(logicalPlan, right.logicalPlan, joinType = JoinType(joinType), None)
    +    ).analyzed.asInstanceOf[Join]
     
         // Project only one of the join columns.
    -    val joinedCols = usingColumns.map(col => withPlan(joined.right).resolve(col))
    +    //
    +    // SPARK-12336: For outer joins, attributes of at least one child plan output will be forced to
    +    // be nullable. An `AttributeSet` is necessary so that we are not affected by different
    +    // nullability values.
    +    val joinedCols = AttributeSet(usingColumns.map(col => withPlan(joined.right).resolve(col)))
    --- End diff --
    
    should we rename this to `joinedColsFromRight`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165062006
  
    **[Test build #47808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47808/consoleFull)** for PR 10296 at commit [`7e35e37`](https://github.com/apache/spark/commit/7e35e377303d038c8fd26c72fd062e7498ea49ea).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164758864
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47734/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164757890
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164754704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47730/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164612943
  
    All `ExtractValue` expressions should be fixed in similar way. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164975006
  
    **[Test build #47770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47770/consoleFull)** for PR 10296 at commit [`05c36e5`](https://github.com/apache/spark/commit/05c36e5cd1ec2c82a5d9fdf2c5750bc5a591b2d7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `case class WrapOption(child: Expression, optType: DataType)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164752248
  
    Hm, seems that this change reveals a lot of other existing nullability bugs...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47586012
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -69,10 +69,29 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
       override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
         val javaType = ctx.javaType(dataType)
         val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
    -    s"""
    -      boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    -      $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
    -    """
    +
    +    if (nullable) {
    +      s"""
    +        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    +        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
    +       """
    +    } else {
    +      s"""
    +        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    +        $javaType ${ev.value};
    --- End diff --
    
    This should work under standard Java. Will double check how Janino behaves though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164758862
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164736614
  
    **[Test build #47730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47730/consoleFull)** for PR 10296 at commit [`43e5743`](https://github.com/apache/spark/commit/43e5743f24683a08cdc083f1df35c892cba4ad8c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165042410
  
    **[Test build #47807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47807/consoleFull)** for PR 10296 at commit [`d540b86`](https://github.com/apache/spark/commit/d540b86c307666ff9bb73e8ab436fbd366243183).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47628941
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -498,28 +498,32 @@ class DataFrame private[sql](
       def join(right: DataFrame, usingColumns: Seq[String], joinType: String): DataFrame = {
         // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
         // by creating a new instance for one of the branch.
    -    val joined = sqlContext.executePlan(
    +    val innerJoined = sqlContext.executePlan(
           Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]
    --- End diff --
    
    can we set the corrected join type here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164754664
  
    **[Test build #47730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47730/consoleFull)** for PR 10296 at commit [`43e5743`](https://github.com/apache/spark/commit/43e5743f24683a08cdc083f1df35c892cba4ad8c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164751358
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47729/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165062118
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164614661
  
    **[Test build #47696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47696/consoleFull)** for PR 10296 at commit [`de8b442`](https://github.com/apache/spark/commit/de8b44292300e86f5f152dd7138555b94160d3c6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47631362
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -498,28 +498,32 @@ class DataFrame private[sql](
       def join(right: DataFrame, usingColumns: Seq[String], joinType: String): DataFrame = {
         // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
         // by creating a new instance for one of the branch.
    -    val joined = sqlContext.executePlan(
    +    val innerJoined = sqlContext.executePlan(
           Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]
    --- End diff --
    
    Good catch, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164958118
  
    **[Test build #47770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47770/consoleFull)** for PR 10296 at commit [`05c36e5`](https://github.com/apache/spark/commit/05c36e5cd1ec2c82a5d9fdf2c5750bc5a591b2d7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165067497
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47807/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164754702
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47753542
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -514,12 +520,7 @@ class DataFrame private[sql](
         withPlan {
           Project(
             joined.output.filterNot(joinedCols.contains(_)),
    -        Join(
    -          joined.left,
    -          joined.right,
    -          joinType = JoinType(joinType),
    -          condition)
    -      )
    +        joined.copy(condition = condition))
    --- End diff --
    
    Changes in this file fixes SPARK-12336.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47728708
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -69,10 +69,29 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
       override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
         val javaType = ctx.javaType(dataType)
         val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
    -    s"""
    -      boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    -      $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
    -    """
    +
    +    if (nullable) {
    +      s"""
    +        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    +        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
    +       """
    +    } else {
    +      s"""
    +        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    --- End diff --
    
    I think ideally we would not do this check at all when `nullable = false`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng closed the pull request at:

    https://github.com/apache/spark/pull/10296


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165093099
  
    retest this please
    
    The last build failure seems to be irrelevant.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164731425
  
    **[Test build #47727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47727/consoleFull)** for PR 10296 at commit [`fe11ed1`](https://github.com/apache/spark/commit/fe11ed1955706e29f0fb6128db5b19701b388432).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47623946
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -498,28 +498,28 @@ class DataFrame private[sql](
       def join(right: DataFrame, usingColumns: Seq[String], joinType: String): DataFrame = {
         // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
         // by creating a new instance for one of the branch.
    -    val joined = sqlContext.executePlan(
    +    val innerJoined = sqlContext.executePlan(
           Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]
     
         // Project only one of the join columns.
    -    val joinedCols = usingColumns.map(col => withPlan(joined.right).resolve(col))
    +    val joinedCols = AttributeSet(usingColumns.map(col => withPlan(innerJoined.right).resolve(col)))
    +
         val condition = usingColumns.map { col =>
           catalyst.expressions.EqualTo(
    -        withPlan(joined.left).resolve(col),
    -        withPlan(joined.right).resolve(col))
    +        withPlan(innerJoined.left).resolve(col),
    +        withPlan(innerJoined.right).resolve(col))
         }.reduceLeftOption[catalyst.expressions.BinaryExpression] { (cond, eqTo) =>
           catalyst.expressions.And(cond, eqTo)
         }
     
         withPlan {
    -      Project(
    -        joined.output.filterNot(joinedCols.contains(_)),
    -        Join(
    -          joined.left,
    -          joined.right,
    -          joinType = JoinType(joinType),
    -          condition)
    -      )
    +      val joined = Join(
    +        innerJoined.left,
    +        innerJoined.right,
    +        joinType = JoinType(joinType),
    +        condition)
    +
    +      Project(joined.output.filterNot(joinedCols.contains(_)), joined)
    --- End diff --
    
    This fixes SPARK-12336.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164960361
  
    @cloud-fan Yeah, that's the plan. Thanks for the review!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164799755
  
    LGTM, can we fix the nested fields in a follow-up PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by 3ourroom <gi...@git.apache.org>.
Github user 3ourroom commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164731645
  
    
    NAVER - http://www.naver.com/
    --------------------------------------------
    
    3ourroom@naver.com 님께 보내신 메일 <Re: [spark] [SPARK-12323][SPARK-12335][SPARK-12336][SQL] Makes BoundReference respect nullability (#10296)> 이 다음과 같은 이유로 전송 실패했습니다.
    
    --------------------------------------------
    
    받는 사람이 회원님의 메일을 수신차단 하였습니다. 
    
    
    --------------------------------------------



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165139153
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164736542
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47728/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164731457
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47727/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by 3ourroom <gi...@git.apache.org>.
Github user 3ourroom commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164778338
  
    
    NAVER - http://www.naver.com/
    --------------------------------------------
    
    3ourroom@naver.com 님께 보내신 메일 <Re: [spark] [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-12341][SPARK-12342][SQL] Makes BoundReference respect nullability (#10296)> 이 다음과 같은 이유로 전송 실패했습니다.
    
    --------------------------------------------
    
    받는 사람이 회원님의 메일을 수신차단 하였습니다. 
    
    
    --------------------------------------------



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47753570
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala ---
    @@ -42,7 +42,7 @@ case class DescribeCommand(
           new MetadataBuilder().putString("comment", "name of the column").build())(),
         AttributeReference("data_type", StringType, nullable = false,
           new MetadataBuilder().putString("comment", "data type of the column").build())(),
    -    AttributeReference("comment", StringType, nullable = false,
    +    AttributeReference("comment", StringType, nullable = true,
    --- End diff --
    
    This fixes SPARK-12341.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164757891
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47733/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164513147
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164975093
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47770/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164978701
  
    @liancheng I currently working on nullability of expressions, could you hold this PR a little bit?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165052853
  
    **[Test build #47808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47808/consoleFull)** for PR 10296 at commit [`7e35e37`](https://github.com/apache/spark/commit/7e35e377303d038c8fd26c72fd062e7498ea49ea).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164787626
  
    **[Test build #47737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47737/consoleFull)** for PR 10296 at commit [`cfcc6df`](https://github.com/apache/spark/commit/cfcc6df2d9fbda7f044cbe8c9e2d6e29ee86d6a8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47632465
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -499,10 +499,16 @@ class DataFrame private[sql](
         // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
         // by creating a new instance for one of the branch.
         val joined = sqlContext.executePlan(
    -      Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]
    +      Join(logicalPlan, right.logicalPlan, joinType = JoinType(joinType), None)
    +    ).analyzed.asInstanceOf[Join]
     
         // Project only one of the join columns.
    -    val joinedCols = usingColumns.map(col => withPlan(joined.right).resolve(col))
    +    //
    +    // SPARK-12336: For outer joins, attributes of at least one child plan output will be forced to
    +    // be nullable. An `AttributeSet` is necessary so that we are not affected by different
    +    // nullability values.
    +    val joinedCols = AttributeSet(usingColumns.map(col => withPlan(joined.right).resolve(col)))
    --- End diff --
    
    I think it's OK. The first line of the comment above already explained the purpose of this variable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165176419
  
    @marmbrus Reshaped this PR to only fix those nullability bugs.
    
    After some more investigation, now I don't think we can resolve SPARK-12323 by fixing `NewInstance`. The reasons are:
    
    1.  `ExpressionEncoder`s are always created using reflection based schema inference, which implies that the only non-nullable fields within a `fromRowExpression` are those of unboxed primitive types.
    1.  Unboxed primitive fields are always retrieved using code generated in `BoundReference` rathar than `NewInstance`, since `NewInstance` is only used to build objects.
    
    Since we would like to avoid per row runtime null checking and branching cost (what @davies and @nongli are working on), we'll have to assume the nullability of input data always match the schema of the `ExpressionEncoder` being used.  Another not quite appealing choice is to add an option to generate code with null checking, so that users can use it for debugging purposes.
    
    On the other hand, we can and should ensure nullability of the underlying logical plan is consistent with the Dataset while constructing a Dataset. For example, currently the following case works:
    
    ```scala
    val rowRDD = sqlContext.sparkContext.parallelize(Seq(Row("hello"), Row(null)))
    val schema = StructType(Seq(StructField("_1", StringType, nullable = false)))
    val df = sqlContext.createDataFrame(rowRDD, schema)
    df.as[Tuple1[String]].collect().foreach(println)
    
    // Output:
    //
    //   (hello)
    //   (null)
    ```
    
    This analysis time checking can be done in `ExpressionEncoder.resolve` by comparing schemata of the logical plan and the encoder. Opened PR #10331 for this check.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164787994
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47737/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165067495
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165067375
  
    **[Test build #47807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47807/consoleFull)** for PR 10296 at commit [`d540b86`](https://github.com/apache/spark/commit/d540b86c307666ff9bb73e8ab436fbd366243183).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `case class WrapOption(child: Expression, optType: DataType)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164614794
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47583946
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala ---
    @@ -69,10 +69,29 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
       override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
         val javaType = ctx.javaType(dataType)
         val value = ctx.getValue(ctx.INPUT_ROW, dataType, ordinal.toString)
    -    s"""
    -      boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    -      $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
    -    """
    +
    +    if (nullable) {
    +      s"""
    +        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    +        $javaType ${ev.value} = ${ev.isNull} ? ${ctx.defaultValue(dataType)} : ($value);
    +       """
    +    } else {
    +      s"""
    +        boolean ${ev.isNull} = ${ctx.INPUT_ROW}.isNullAt($ordinal);
    +        $javaType ${ev.value};
    --- End diff --
    
    Should we assign a default value for it? Or I'm afraid it can't compile...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164963594
  
    Hey guys, I think all this nullability clean up is great, but I afraid that the changes to `BoundReference` are in conflict with something @nongli and @davies are trying to accomplish.  Basically, for performance reasons we should be able to skip null checks for columns that are not nullable since bitset operations and branches are pretty expensive.  If we are trying to solve SPARK-12323 then I think the right place to do it is probably in `NewInstance`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164732417
  
    Didn't add separate test cases for SPARK-12335 and SPARK-12336 since they were caught by existing test cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164614803
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47696/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164751354
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164734928
  
    **[Test build #47729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47729/consoleFull)** for PR 10296 at commit [`43e5743`](https://github.com/apache/spark/commit/43e5743f24683a08cdc083f1df35c892cba4ad8c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47630916
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -498,28 +498,32 @@ class DataFrame private[sql](
       def join(right: DataFrame, usingColumns: Seq[String], joinType: String): DataFrame = {
         // Analyze the self join. The assumption is that the analyzer will disambiguate left vs right
         // by creating a new instance for one of the branch.
    -    val joined = sqlContext.executePlan(
    +    val innerJoined = sqlContext.executePlan(
           Join(logicalPlan, right.logicalPlan, joinType = Inner, None)).analyzed.asInstanceOf[Join]
    --- End diff --
    
    nvm, seems no need.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47623928
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala ---
    @@ -53,7 +53,7 @@ abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate w
     
       override def children: Seq[Expression] = Seq(child)
     
    -  override def nullable: Boolean = false
    +  override def nullable: Boolean = true
    --- End diff --
    
    This fixes SPARK-12335.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SQL] Makes BoundRef...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164726083
  
    **[Test build #47727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47727/consoleFull)** for PR 10296 at commit [`fe11ed1`](https://github.com/apache/spark/commit/fe11ed1955706e29f0fb6128db5b19701b388432).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164609840
  
    For nested fields, how about improving the `GetStructField`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164757586
  
    **[Test build #47736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47736/consoleFull)** for PR 10296 at commit [`fa98ae6`](https://github.com/apache/spark/commit/fa98ae6f7bbbeea30fc5f4748bcef938d619b8c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-166355843
  
    @marmbrus Thanks a lot for the detailed explanation! According to your suggestion, I've added `AssertNotNull` in PR #10331.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164734297
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165720816
  
    Closing this one since PR #10333 already covers all the nullability bugs fixed in this one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165062120
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47808/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164513148
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47656/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164762685
  
    **[Test build #47737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47737/consoleFull)** for PR 10296 at commit [`cfcc6df`](https://github.com/apache/spark/commit/cfcc6df2d9fbda7f044cbe8c9e2d6e29ee86d6a8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164992983
  
    @davies As commented above, I'll reshape this PR to only fix those wrong nullability issues without touching `BoundReference`, so that it won't conflict with you changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164775719
  
    **[Test build #47736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47736/consoleFull)** for PR 10296 at commit [`fa98ae6`](https://github.com/apache/spark/commit/fa98ae6f7bbbeea30fc5f4748bcef938d619b8c6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164775777
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164731456
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164751303
  
    **[Test build #47729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47729/consoleFull)** for PR 10296 at commit [`43e5743`](https://github.com/apache/spark/commit/43e5743f24683a08cdc083f1df35c892cba4ad8c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164611242
  
    **[Test build #47696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47696/consoleFull)** for PR 10296 at commit [`de8b442`](https://github.com/apache/spark/commit/de8b44292300e86f5f152dd7138555b94160d3c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47753602
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Corr.scala ---
    @@ -42,7 +41,7 @@ case class Corr(
     
       override def children: Seq[Expression] = Seq(left, right)
     
    -  override def nullable: Boolean = false
    +  override def nullable: Boolean = true
    --- End diff --
    
    This fixes SPARK-12342.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164967670
  
    @marmbrus I also found that `NewInstance` and `MapObjects` need to be updated for complete runtime nullability checking, especially for nested fields. As stated in the PR description, planning to fix this part in a follow-up PR.
    
    So how about updating this PR to only fix all the nullability mismatches without touching `BoundReference`, and then continue fixing `NewInstance` etc. in another one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10296#discussion_r47631480
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -518,8 +524,7 @@ class DataFrame private[sql](
               joined.left,
               joined.right,
               joinType = JoinType(joinType),
    -          condition)
    --- End diff --
    
    `joined.copy(condition = condition)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164775778
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47736/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164967881
  
    That sounds good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SQL] M...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164736539
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12335][SPARK-12336][SPARK-12341][SPARK-...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-165211474
  
    > After some more investigation, now I don't think we can resolve SPARK-12323 by fixing NewInstance.
    
    You could augment `NewInstance` to understand which arguments are primitive types or you could infer this using reflection on the constructor.  Though I think the resulting expression tree would be clearer if you just created the following new expression and inserted it into the tree for primitive types that are the children of a `NewInstance`.
    
    ```scala
    case class AssertNotNull(path: String, child: Expression) ...
    ```
    
    I think there may be some confusion about the schema guarantees for encoders and their expressions.  When there is a primitive type, the corresponding `toRowExpressions` will be non-nullable, since the data is coming from an object that can't possible store a null value.  In contrast, the `fromRowExpression` are reading from arbitrary input data, and thus their nullablity has nothing to do with the structure of the target object.  Instead this information is coming from the schema that the encoder is resolved/bound to.  Therefore, using the `nullable` bit here is incorrect.  `nullable` should always be thought of as a promise about the input data that we can use for optimization, not a constraint enforcement mechanism.
    
    > Another not quite appealing choice is to add an option to generate code with null checking, so that users can use it for debugging purposes.
    
    I think it would actually be very good to have assertions that null data does not appear where it is not expected.  When we actually start using this information we are almost certainly going to find more places we are not propagating the information correctly.  However, we need to ensure that these are elided in production to avoid invalidating the optimization this information is supposed to enable.
    
    > This analysis time checking can be done in ExpressionEncoder.resolve by comparing schemata of the logical plan and the encoder.
    
    I don't agree that you can verify the problem with this code statically.  Creating a schema that says that `_1` is not nullable is valid, even though a String can be null in the JVM.  Again, this is a promise about the data itself, so you can't assert that there is a problem until that promise is violated and you see a null value in this column.
    
    That said, trusting the user to get this right has [led to confusion](https://issues.apache.org/jira/browse/SPARK-11319) in the past.  So I would propose that we do add validations at the `sqlContext.createDataX` boundaries.  Internally, though, we should trust this bit so that we can avoid unnecessary/expensive null checks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164787990
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SQL] Makes BoundReference respec...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164508262
  
    cc @cloud-fan


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12323][SPARK-12335][SPARK-12336][SPARK-...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10296#issuecomment-164975090
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org