You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/12/16 19:31:06 UTC

[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/10333

    [SPARK-12054] [SQL] Consider nullability of expression in codegen

    This could simplify the generated code for expressions that is not nullable.
    
    This PR fix lots of bugs about nullability.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark skip_nullable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10333.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10333
    
----
commit 8956204211d90e4c47d4f85d7ddc49d0be553d42
Author: Davies Liu <da...@databricks.com>
Date:   2015-11-30T22:44:20Z

    consider nullable in codegen

commit ddca7e23cb98db72e6a89e02feb449622e68b77e
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-15T22:20:01Z

    remove GenerateProjection

commit 830425111b12414284fc139808c66f2ffd8ded87
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-15T22:57:34Z

    fix build

commit 7e1b5e02606c81c2f5fd043dbdc373fffc045c77
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-15T23:51:47Z

    Merge branch 'remove_generate_projection' into skip_nullable

commit e3cccda936a27bb6bfae1c28d2c0184ae75b7f6f
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-15T23:52:44Z

    fix bug

commit 2cca3a19245b84d8b3bb442c27ac51b9072214cc
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-15T23:53:31Z

    fix test

commit 7e300b74dca9c1e3ac1770990aa62aeb7faf6282
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-16T00:02:14Z

    Merge branch 'remove_generate_projection' into skip_nullable

commit 8aa8bb59c0ad0293f04f936902d7f04b4132492f
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-16T01:47:15Z

    fix nullable

commit f16006942e1d3b6e33a3befabc486ea5890a5f0f
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-16T18:21:11Z

    fix nullability

commit fd4c9458fd57bcebe5b8fd09e101c7a937131454
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-16T18:23:22Z

    Merge branch 'master' of github.com:apache/spark into skip_nullable

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165890908
  
    @rxin Just ran a simple query as
    ```
    sqlContext.range(1<<30).groupBy().sum().collect()
    ```
    
    After this commit, the runtime went to 46.8 from 49.2s, about 5% improvement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165632200
  
    **[Test build #2227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2227/consoleFull)** for PR 10333 at commit [`9adad17`](https://github.com/apache/spark/commit/9adad172f287d9b54187f6699d59adae4078c56a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165687876
  
    **[Test build #47986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47986/consoleFull)** for PR 10333 at commit [`9f7d763`](https://github.com/apache/spark/commit/9f7d7634b03f457d14d06363f855e3de0ae55980).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165703897
  
    **[Test build #47987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47987/consoleFull)** for PR 10333 at commit [`3b1e42f`](https://github.com/apache/spark/commit/3b1e42ffb3808c7d830cd998ba0132404641e65c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165204839
  
    cc @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165224558
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47843/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r48007038
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -473,8 +472,8 @@ class DataFrame private[sql](
             usingColumns.map(col => withPlan(joined.right).resolve(col))
           case FullOuter =>
             usingColumns.map { col =>
    -          val leftCol = withPlan(joined.left).resolve(col)
    -          val rightCol = withPlan(joined.right).resolve(col)
    +          val leftCol = withPlan(joined.left).resolve(col).toAttribute.withNullability(true)
    +          val rightCol = withPlan(joined.right).resolve(col).toAttribute.withNullability(true)
               Alias(Coalesce(Seq(leftCol, rightCol)), col)()
    --- End diff --
    
    @cloud-fan They must be both nullable since null can appear in both columns. It's just that they can't be both null within a single row.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165703926
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47987/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165502832
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47895148
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -87,18 +87,21 @@ object Cast {
       private def resolvableNullability(from: Boolean, to: Boolean) = !from || to
     
       private def forceNullable(from: DataType, to: DataType) = (from, to) match {
    -    case (StringType, _: NumericType) => true
    -    case (StringType, TimestampType) => true
    -    case (DoubleType, TimestampType) => true
    -    case (FloatType, TimestampType) => true
    -    case (StringType, DateType) => true
    -    case (_: NumericType, DateType) => true
    -    case (BooleanType, DateType) => true
    -    case (DateType, _: NumericType) => true
    -    case (DateType, BooleanType) => true
    -    case (DoubleType, _: DecimalType) => true
    -    case (FloatType, _: DecimalType) => true
    -    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
    +    case (NullType, _) => true
    +    case (_, _) if from == to => false
    +
    +    case (StringType, BinaryType) => false
    +    case (StringType, _) => true
    +
    +    case (FloatType | DoubleType, TimestampType) => true
    +    case (TimestampType, DateType) => false
    +    case (_, DateType) => true
    +    case (DateType, TimestampType) => false
    +    case (DateType, _) => true
    --- End diff --
    
    Should `DateType` to `StringType` be `false`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165230905
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47879128
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -48,11 +48,13 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "left"),
    -      Row(1, 2, "1", null) :: Row(2, 3, "2", null) :: Row(3, 4, "3", null) :: Nil)
    +      Row(1, 2, "1", null, null, null) :: Row(2, 3, "2", null, null, null) ::
    +        Row(3, 4, "3", null, null, null) :: Nil)
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "right"),
    -      Row(null, null, null, 2) :: Row(null, null, null, 3) :: Row(null, null, null, 4) :: Nil)
    +      Row(null, null, null, 1, 2, "2") :: Row(null, null, null, 2, 3, "3") ::
    +        Row(null, null, null, 3, 4, "4") :: Nil)
    --- End diff --
    
    This bug will be fixed by https://github.com/apache/spark/pull/10353


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165224554
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10333


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r48442383
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -87,18 +87,21 @@ object Cast {
       private def resolvableNullability(from: Boolean, to: Boolean) = !from || to
     
       private def forceNullable(from: DataType, to: DataType) = (from, to) match {
    -    case (StringType, _: NumericType) => true
    -    case (StringType, TimestampType) => true
    -    case (DoubleType, TimestampType) => true
    -    case (FloatType, TimestampType) => true
    -    case (StringType, DateType) => true
    -    case (_: NumericType, DateType) => true
    -    case (BooleanType, DateType) => true
    -    case (DateType, _: NumericType) => true
    -    case (DateType, BooleanType) => true
    -    case (DoubleType, _: DecimalType) => true
    -    case (FloatType, _: DecimalType) => true
    -    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
    +    case (NullType, _) => true
    +    case (_, _) if from == to => false
    +
    +    case (StringType, BinaryType) => false
    +    case (StringType, _) => true
    +
    +    case (FloatType | DoubleType, TimestampType) => true
    +    case (TimestampType, DateType) => false
    +    case (_, DateType) => true
    +    case (DateType, TimestampType) => false
    +    case (DateType, _) => true
    --- End diff --
    
    In the current ccde, there is no case for (DateType, StringType).
    I can send a PR if this is to be added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165502829
  
    **[Test build #47932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47932/consoleFull)** for PR 10333 at commit [`765f735`](https://github.com/apache/spark/commit/765f73579b3a64a906ce246ffa3408313ce58b95).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47838978
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    --- End diff --
    
    I think it's not necessary (in terms of performance). Compiler can do all these, but not sure how far Janino had achieved on constant folding.
    
    We don't need to do this for every expression, but since UnaryExpression/BinaryExpression/TernaryExpression are used by many, this changes may worse it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165693237
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47986/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47992468
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    +      s"""
    +        ${eval1.code}
    +        ${eval2.code}
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        $resultCode
    --- End diff --
    
    maybe we can add an assert: `assert(nullable || (children.forall(!_.nullable)))`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165686016
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47985/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165233027
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47835/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165859310
  
    I'm merging this into master. There could be still some bugs about nullability, we could fix them later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165686014
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165699603
  
    **[Test build #47987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47987/consoleFull)** for PR 10333 at commit [`3b1e42f`](https://github.com/apache/spark/commit/3b1e42ffb3808c7d830cd998ba0132404641e65c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47943608
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    --- End diff --
    
    In addition to Janino the JIT might also do more constant folding etc, which makes it hard to tell unfortunately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r48047005
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -548,20 +566,31 @@ abstract class TernaryExpression extends Expression {
         f: (String, String, String) => String): String = {
         val evals = children.map(_.gen(ctx))
         val resultCode = f(evals(0).value, evals(1).value, evals(2).value)
    -    s"""
    -      ${evals(0).code}
    -      boolean ${ev.isNull} = true;
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${evals(0).isNull}) {
    -        ${evals(1).code}
    -        if (!${evals(1).isNull}) {
    -          ${evals(2).code}
    -          if (!${evals(2).isNull}) {
    -            ${ev.isNull} = false;  // resultCode could change nullability
    -            $resultCode
    +    if (nullable) {
    --- End diff --
    
    if a `TernaryExpression` is nullable, currently we will always generate 3 nested `if` branches. But we still have chance to remove some `if` branches if some children are non-nullable, how about doing this optimization based on children's nullability?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47994679
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -473,8 +472,8 @@ class DataFrame private[sql](
             usingColumns.map(col => withPlan(joined.right).resolve(col))
           case FullOuter =>
             usingColumns.map { col =>
    -          val leftCol = withPlan(joined.left).resolve(col)
    -          val rightCol = withPlan(joined.right).resolve(col)
    +          val leftCol = withPlan(joined.left).resolve(col).toAttribute.withNullability(true)
    +          val rightCol = withPlan(joined.right).resolve(col).toAttribute.withNullability(true)
               Alias(Coalesce(Seq(leftCol, rightCol)), col)()
    --- End diff --
    
    the `leftCol` and `rightCol` are all nullable, but can't be both null right?
    so I think the result should be non-nullable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165230907
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47851/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165742356
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165628708
  
    **[Test build #2227 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2227/consoleFull)** for PR 10333 at commit [`9adad17`](https://github.com/apache/spark/commit/9adad172f287d9b54187f6699d59adae4078c56a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47833170
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    --- End diff --
    
    Is this branch necessary? (not suggesting you change it) but does the nullable path collapse correctly if left and right are non nullable? What I mean is:
    
    if eval1.isNull and eval2.isNull is always just false, do we get the same behavior as this special casing from the compiler optimizations?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165883130
  
    Do we have any performance numbers on this?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165259648
  
    **[Test build #2222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2222/consoleFull)** for PR 10333 at commit [`e418358`](https://github.com/apache/spark/commit/e41835804b818724f8c28c12e9606b4d4052fe37).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165693233
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165232552
  
    **[Test build #2222 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2222/consoleFull)** for PR 10333 at commit [`e418358`](https://github.com/apache/spark/commit/e41835804b818724f8c28c12e9606b4d4052fe37).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165224493
  
    **[Test build #47843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47843/consoleFull)** for PR 10333 at commit [`88b2107`](https://github.com/apache/spark/commit/88b21072b7e645075a31546a06edd8d5ea4d5176).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165502344
  
    **[Test build #47932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47932/consoleFull)** for PR 10333 at commit [`765f735`](https://github.com/apache/spark/commit/765f73579b3a64a906ce246ffa3408313ce58b95).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47995307
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    +      s"""
    +        ${eval1.code}
    +        ${eval2.code}
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        $resultCode
    --- End diff --
    
    Even `left` or `right` is nullable, the new code is still correct, if the old code is correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47875761
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -48,11 +48,13 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "left"),
    -      Row(1, 2, "1", null) :: Row(2, 3, "2", null) :: Row(3, 4, "3", null) :: Nil)
    +      Row(1, 2, "1", null, null, null) :: Row(2, 3, "2", null, null, null) ::
    +        Row(3, 4, "3", null, null, null) :: Nil)
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "right"),
    -      Row(null, null, null, 2) :: Row(null, null, null, 3) :: Row(null, null, null, 4) :: Nil)
    +      Row(null, null, null, 1, 2, "2") :: Row(null, null, null, 2, 3, "3") ::
    +        Row(null, null, null, 3, 4, "4") :: Nil)
    --- End diff --
    
    Will create seperate JIRA for it (and fix it)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165233024
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r48052069
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -548,20 +566,31 @@ abstract class TernaryExpression extends Expression {
         f: (String, String, String) => String): String = {
         val evals = children.map(_.gen(ctx))
         val resultCode = f(evals(0).value, evals(1).value, evals(2).value)
    -    s"""
    -      ${evals(0).code}
    -      boolean ${ev.isNull} = true;
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${evals(0).isNull}) {
    -        ${evals(1).code}
    -        if (!${evals(1).isNull}) {
    -          ${evals(2).code}
    -          if (!${evals(2).isNull}) {
    -            ${ev.isNull} = false;  // resultCode could change nullability
    -            $resultCode
    +    if (nullable) {
    --- End diff --
    
    We could, but it have too much combinations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165703924
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165615449
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47963/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165232881
  
    **[Test build #47835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47835/consoleFull)** for PR 10333 at commit [`fd4c945`](https://github.com/apache/spark/commit/fd4c9458fd57bcebe5b8fd09e101c7a937131454).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165205489
  
    **[Test build #47835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47835/consoleFull)** for PR 10333 at commit [`fd4c945`](https://github.com/apache/spark/commit/fd4c9458fd57bcebe5b8fd09e101c7a937131454).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165552597
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47994932
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -473,8 +472,8 @@ class DataFrame private[sql](
             usingColumns.map(col => withPlan(joined.right).resolve(col))
           case FullOuter =>
             usingColumns.map { col =>
    -          val leftCol = withPlan(joined.left).resolve(col)
    -          val rightCol = withPlan(joined.right).resolve(col)
    +          val leftCol = withPlan(joined.left).resolve(col).toAttribute.withNullability(true)
    +          val rightCol = withPlan(joined.right).resolve(col).toAttribute.withNullability(true)
               Alias(Coalesce(Seq(leftCol, rightCol)), col)()
    --- End diff --
    
    It's fine that the column is nullable but there is no `null` in it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165742357
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47993/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165742229
  
    **[Test build #47993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47993/consoleFull)** for PR 10333 at commit [`3cc4cdf`](https://github.com/apache/spark/commit/3cc4cdf3f32e84e9c3c78377c37b5e6b7eeadba1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47871420
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -48,11 +48,13 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "left"),
    -      Row(1, 2, "1", null) :: Row(2, 3, "2", null) :: Row(3, 4, "3", null) :: Nil)
    +      Row(1, 2, "1", null, null, null) :: Row(2, 3, "2", null, null, null) ::
    +        Row(3, 4, "3", null, null, null) :: Nil)
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "right"),
    -      Row(null, null, null, 2) :: Row(null, null, null, 3) :: Row(null, null, null, 4) :: Nil)
    +      Row(null, null, null, 1, 2, "2") :: Row(null, null, null, 2, 3, "3") ::
    +        Row(null, null, null, 3, 4, "4") :: Nil)
    --- End diff --
    
    We probably shouldn't show join keys multiple times in the result set. For `LEFT/RIGHT JOIN USING` queries, both PostgreSQL and MySQL show join keys only once. ScalaDoc of this overloaded `DataFrame.join` method also has similar description:
    
    ```scala
    
      /**
       * Equi-join with another [[DataFrame]] using the given columns.
       *
       * Different from other join functions, the join columns will only appear once in the output,
       * i.e. similar to SQL's `JOIN USING` syntax.
       ...
       */
    ```
    
    The following example comes from [PostgreSQL docs][1] (section 7.2.1.1):
    
    ```sql
    CREATE TABLE t1 (num INT, name TEXT);
    INSERT INTO t1 VALUES (1, 'a');
    INSERT INTO t1 VALUES (2, 'b');
    INSERT INTO t1 VALUES (3, 'c');
    
    CREATE TABLE t2 (num INT, value TEXT);
    INSERT INTO T2 VALUES (1, 'xxx');
    INSERT INTO t2 VALUES (3, 'yyy');
    INSERT INTO t2 VALUES (5, 'zzz');
    
    SELECT * FROM t1 LEFT JOIN t2 USING (num);
    ```
    
    PostgreSQL results in:
    
    ```
     num | name | value
    -----+------+-------
       1 | a    | xxx
       2 | b    |
       3 | c    | yyy
    (3 rows)
    ```
    
    and MySQL results in:
    
    ```
    +------+------+-------+
    | num  | name | value |
    +------+------+-------+
    |    1 | a    | xxx   |
    |    2 | b    | NULL  |
    |    3 | c    | yyy   |
    +------+------+-------+
    3 rows in set (0.01 sec)
    ```
    
    [1]: http://www.postgresql.org/docs/9.4/static/queries-table-expressions.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165717428
  
    **[Test build #47993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47993/consoleFull)** for PR 10333 at commit [`3cc4cdf`](https://github.com/apache/spark/commit/3cc4cdf3f32e84e9c3c78377c37b5e6b7eeadba1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47991747
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    +      s"""
    +        ${eval1.code}
    +        ${eval2.code}
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        $resultCode
    --- End diff --
    
    are we assuming if a `BinaryExpression` is not nullable, its children are also not nullable?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47875650
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
    @@ -48,11 +48,13 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "left"),
    -      Row(1, 2, "1", null) :: Row(2, 3, "2", null) :: Row(3, 4, "3", null) :: Nil)
    +      Row(1, 2, "1", null, null, null) :: Row(2, 3, "2", null, null, null) ::
    +        Row(3, 4, "3", null, null, null) :: Nil)
     
         checkAnswer(
           df.join(df2, Seq("int", "str"), "right"),
    -      Row(null, null, null, 2) :: Row(null, null, null, 3) :: Row(null, null, null, 4) :: Nil)
    +      Row(null, null, null, 1, 2, "2") :: Row(null, null, null, 2, 3, "3") ::
    +        Row(null, null, null, 3, 4, "4") :: Nil)
    --- End diff --
    
    But there does exist bugs other than the nullability issue in the original `DataFrame.join` method. For example it doesn't handle full outer join correctly. Using the same example tables mentioned above, the following `spark-shell` snippet
    
    ```scala
    import sqlContext._
    val t1 = table("t1")
    val t2 = table("t2")
    t1.join(t2, Seq("num"), "fullouter").show()
    ```
    
    produces wrong query result:
    
    ```
    +----+----+-----+
    | num|name|value|
    +----+----+-----+
    |   1|   a|  xxx|
    |   2|   b| null|
    |   3|   c|  yyy|
    |null|null|  zzz|
    +----+----+-----+
    ```
    
    Here's the result from PostgreSQL:
    
    ```
    postgres=# SELECT * FROM t1 FULL JOIN t2 USING (num);
    
     num | name | value
    -----+------+-------
       1 | a    | xxx
       2 | b    |
       3 | c    | yyy
       5 |      | zzz
    (4 rows)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47983645
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -87,18 +87,21 @@ object Cast {
       private def resolvableNullability(from: Boolean, to: Boolean) = !from || to
     
       private def forceNullable(from: DataType, to: DataType) = (from, to) match {
    -    case (StringType, _: NumericType) => true
    -    case (StringType, TimestampType) => true
    -    case (DoubleType, TimestampType) => true
    -    case (FloatType, TimestampType) => true
    -    case (StringType, DateType) => true
    -    case (_: NumericType, DateType) => true
    -    case (BooleanType, DateType) => true
    -    case (DateType, _: NumericType) => true
    -    case (DateType, BooleanType) => true
    -    case (DoubleType, _: DecimalType) => true
    -    case (FloatType, _: DecimalType) => true
    -    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
    +    case (NullType, _) => true
    +    case (_, _) if from == to => false
    +
    +    case (StringType, BinaryType) => false
    +    case (StringType, _) => true
    +
    +    case (FloatType | DoubleType, TimestampType) => true
    +    case (TimestampType, DateType) => false
    +    case (_, DateType) => true
    +    case (DateType, TimestampType) => false
    +    case (DateType, _) => true
    --- End diff --
    
    Good catch, will fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165615445
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165693100
  
    **[Test build #47986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47986/consoleFull)** for PR 10333 at commit [`9f7d763`](https://github.com/apache/spark/commit/9f7d7634b03f457d14d06363f855e3de0ae55980).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `abstract class ImperativeAggregate extends AggregateFunction with CodegenFallback `\n  * `case class UnresolvedWindowExpression(`\n  * `case class WindowExpression(`\n  * `case class Lead(input: Expression, offset: Expression, default: Expression)`\n  * `case class Lag(input: Expression, offset: Expression, default: Expression)`\n  * `abstract class AggregateWindowFunction extends DeclarativeAggregate with WindowFunction `\n  * `abstract class RowNumberLike extends AggregateWindowFunction `\n  * `trait SizeBasedWindowFunction extends AggregateWindowFunction `\n  * `case class RowNumber() extends RowNumberLike `\n  * `case class CumeDist() extends RowNumberLike with SizeBasedWindowFunction `\n  * `case class NTile(buckets: Expression) extends RowNumberLike with SizeBasedWindowFunction `\n  * `abstract class RankLike extends AggregateWindowFunction `\n  * `case class Rank(children: Seq[Expression]) extends Rank
 Like `\n  * `case class DenseRank(children: Seq[Expression]) extends RankLike `\n  * `case class PercentRank(children: Seq[Expression]) extends RankLike with SizeBasedWindowFunction `\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r47992394
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -424,19 +431,30 @@ abstract class BinaryExpression extends Expression {
         val eval1 = left.gen(ctx)
         val eval2 = right.gen(ctx)
         val resultCode = f(eval1.value, eval2.value)
    -    s"""
    -      ${eval1.code}
    -      boolean ${ev.isNull} = ${eval1.isNull};
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      if (!${ev.isNull}) {
    -        ${eval2.code}
    -        if (!${eval2.isNull}) {
    -          $resultCode
    -        } else {
    -          ${ev.isNull} = true;
    +    if (nullable) {
    +      s"""
    +        ${eval1.code}
    +        boolean ${ev.isNull} = ${eval1.isNull};
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        if (!${ev.isNull}) {
    +          ${eval2.code}
    +          if (!${eval2.isNull}) {
    +            $resultCode
    +          } else {
    +            ${ev.isNull} = true;
    +          }
             }
    -      }
    -    """
    +      """
    +
    +    } else {
    +      ev.isNull = "false"
    +      s"""
    +        ${eval1.code}
    +        ${eval2.code}
    +        ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +        $resultCode
    --- End diff --
    
    I think we should forbid non-nullable `BinaryExpression` to call `nullSafeCodeGen` as it doesn't make sense(passing a `f` that supposed to only apply to not-null children, but actually it isn't.), and they should take care of null children themselves, i.e. override `genCode` directly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10333#discussion_r48464673
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -87,18 +87,21 @@ object Cast {
       private def resolvableNullability(from: Boolean, to: Boolean) = !from || to
     
       private def forceNullable(from: DataType, to: DataType) = (from, to) match {
    -    case (StringType, _: NumericType) => true
    -    case (StringType, TimestampType) => true
    -    case (DoubleType, TimestampType) => true
    -    case (FloatType, TimestampType) => true
    -    case (StringType, DateType) => true
    -    case (_: NumericType, DateType) => true
    -    case (BooleanType, DateType) => true
    -    case (DateType, _: NumericType) => true
    -    case (DateType, BooleanType) => true
    -    case (DoubleType, _: DecimalType) => true
    -    case (FloatType, _: DecimalType) => true
    -    case (_, DecimalType.Fixed(_, _)) => true // TODO: not all upcasts here can really give null
    +    case (NullType, _) => true
    +    case (_, _) if from == to => false
    +
    +    case (StringType, BinaryType) => false
    +    case (StringType, _) => true
    +
    +    case (FloatType | DoubleType, TimestampType) => true
    +    case (TimestampType, DateType) => false
    +    case (_, DateType) => true
    +    case (DateType, TimestampType) => false
    +    case (DateType, _) => true
    --- End diff --
    
    It's covered by (_, StringType)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165216436
  
    **[Test build #47843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47843/consoleFull)** for PR 10333 at commit [`88b2107`](https://github.com/apache/spark/commit/88b21072b7e645075a31546a06edd8d5ea4d5176).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12054] [SQL] Consider nullability of ex...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10333#issuecomment-165502837
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47932/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org