You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2014/04/22 10:54:52 UTC

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/482

    [WIP][Spark-SQL] Optimize the Constant Folding for Expression

    Currently, expression does not support the "constant null" well in constant folding.
    e.g. Sum(a, null) actually always produces Literal(null, NumericType) in runtime.
    
    I changed the nullable interface from 
    def nullable: Boolean => def nullable: Nullability
    the Nullability has 3 concrete objects (neverNull, alwaysNull and possibleNull), which are very helpful hints in the constant folding in Optimizer.
    
    For example:
    ```
    explain select isnull(key+null)  from src;
    == Logical Plan ==
    Project [HiveGenericUdf#isnull((key#30 + CAST(null, IntegerType))) AS c_0#28]
     MetastoreRelation default, src, None
    
    == Optimized Logical Plan ==
    Project [true AS c_0#28]
     MetastoreRelation default, src, None
    
    == Physical Plan ==
    Project [true AS c_0#28]
     HiveTableScan [], (MetastoreRelation default, src, None), None
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark optimize_constant_folding

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/482.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #482
    
----
commit 787509bdcc6e7f23425b53a624fc92d9b22e8f24
Author: Cheng Hao <ha...@intel.com>
Date:   2014-04-22T06:01:40Z

    unify the nullable interface

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42377372
  
    Thanks for doing this :)
    
    @pwendell, can you please merge?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41203402
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41526003
  
    @marmbrus , can you please review this again? I've rewritten the code as rule-based.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12292598
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
    @@ -40,23 +41,27 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression {
       override def toString = s"$child[$ordinal]"
     
       override def eval(input: Row): Any = {
    -    if (child.dataType.isInstanceOf[ArrayType]) {
    -      val baseValue = child.eval(input).asInstanceOf[Seq[_]]
    -      val o = ordinal.eval(input).asInstanceOf[Int]
    -      if (baseValue == null) {
    -        null
    -      } else if (o >= baseValue.size || o < 0) {
    -        null
    -      } else {
    -        baseValue(o)
    -      }
    +    val value = child.eval(input)
    +    if(value == null) {
    --- End diff --
    
    Space after `if`, below too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41746809
  
    I made some more comments about the correctness here.  I'm pretty worried about the fact that our tests did not catch any of these regressions.  I think we need to add a new set of tests before merging the PR.  I've sketched them out [here](https://github.com/chenghao-intel/spark/pull/1) and I made a PR against your branch.  In addition to this change, I think you should also create evaluation tests for each new optimization case.
    
    Testing is a little harder for aggregate expressions, so how about instead we add a HiveQueryTest for each of these.  That way we make sure we are obeying the correct semantics (i.e. `COUNT(null) => 0`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12036292
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
           // Skip redundant folding of literals.
           case l: Literal => l
    +      // if it's foldable
           case e if e.foldable => Literal(e.eval(null), e.dataType)
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
    --- End diff --
    
    Some of these already fold correctly.
    
    ```scala
    scala> sql("SELECT null IS NULL")
    res4: org.apache.spark.sql.SchemaRDD = 
    SchemaRDD[0] at RDD at SchemaRDD.scala:96
    == Query Plan ==
    Project [true AS c0#0]
    ```
    
    Maybe we should write tests for each case, before adding the rule, to make sure it is broken.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AndreSchumacher <gi...@git.apache.org>.

Github user AndreSchumacher commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41203370
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41630184
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41203553
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41017640
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41864406
  
    Can you re-test this, it passed the unittest in my local, but the CI reports it timeouts or something.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42257278
  
    Fixed. Can you retest this please?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/482#issuecomment-41753498

I think the problem with the previous implementation is the change is much larger and therefore harder to understand. (It took me a while to look at it and find bugs, but there were quite a few and I'm sure I didn't find them all.)

A lot of concepts _could_ be added to the expression hierarchy, but the goal of Catalyst is to instead allow self contained rules to perform optimizations. By keeping these optimization rules self contained it is much easier to look in one place and reason about their correctness. Making the rule simple while pushing all of the complex logic to many different places in the code is not the goal here.

The only reason nullability is modeled in the expression hierarchy at all is because it is a fairly fundamental concept that is part of the schema of the data.

Regarding HiveGenericUdf, I'm not sure how the three state solution allows us to do any extra folding. The UDF is still a black box and we have no idea how its going to respond to null inputs.

Regarding your last point about code generation, we don't need three states. We already know if we need to track a nullable bit when doing code generation with the current two state implementation. If it is possible statically determine a value to be null, that will have already been done by optimization rules and we will just have a null literal. All we need is a special rule that generates the AST for `Literal(null, _)`. So, actually I think the code generation logic is simpler with this implementation.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41630074
  
    test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12125319
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    --- End diff --
    
    I was thinking if put the literal matching in the beginning, maybe helpful avoid the further pattern matching of the rest rules. Just a tiny performance optimization for Literal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12265606
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    --- End diff --
    
    The same as the rule ConstantFolding, NullPropagation won't do any transformation for Literal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41752725
  
    @marmbrus I am wondering the previously implementation(by change the nullability states from 2 to 3) perhaps more convenient in some way.
    * Constant Folding rule is quite simple as "case e if e.nullable == alwaysNull => Literal(null, e.dataType)"
    * We may not able to enumerate the extended expression (HiveGenericUdf for example) in catalyst.
    * Nullability is a very helpful hint in the expression codegen:
    
    Nullability   | eval code | expression null indicator variable | expression variable
    ------------- | ------------- | --------------- | ---------------
    Possibly be null  | Yes | Yes | Yes
    Always be null  | No | Yes | No
    Never be null  | Yes | No | Yes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12263035
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -114,37 +114,37 @@ package object dsl {
           def attr = analysis.UnresolvedAttribute(s)
     
           /** Creates a new AttributeReference of type boolean */
    -      def boolean = AttributeReference(s, BooleanType, nullable = false)()
    --- End diff --
    
    What is the rationale for changing all of these?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42153242
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41864569
  
    test this please.
    
    btw, you can ignore travis as it is broken now.  jenkins is the source of truth.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42155234
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12122588
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    --- End diff --
    
    There is no need to skip literals since none of the conditions below can ever match a raw literal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42257303
  
    test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12122966
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ IsNull(c @ Rand) => Literal(false, BooleanType)
    --- End diff --
    
    Is this more generally stated as `case IsNull(e) if e.nullable == false => false`?  Similarly below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12036349
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
           // Skip redundant folding of literals.
           case l: Literal => l
    +      // if it's foldable
           case e if e.foldable => Literal(e.eval(null), e.dataType)
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
    +      case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType)
    +      case e @ IsNull(c @ Rand) => Literal(false, BooleanType)
    +      case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType)
    +      case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType)
    +      case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType)
    +      case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ Coalesce(children) => {
    +        val newChildren = children.filter(c => c match {
    +          case Literal(null, _) => false
    +          case _ => true
    +        })
    +        if(newChildren.length == null) {
    +          Literal(null, e.dataType)
    +        } else if(newChildren.length == children.length){
    +          e
    +        } else {
    +          Coalesce(newChildren)
    +        }
    +      }
    +      case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue
    +      case e @ In(Literal(v, _), list) if(list.exists(c => c match {
    +          case Literal(candidate, _) if(candidate == v) => true
    +          case _ => false
    +        })) => Literal(true, BooleanType)
    +
    +      case e @ SortOrder(_, _) => e
    +      // put exceptional cases(Unary & Binary Expression) before here.
    +      case e: UnaryExpression => e.child match {
    --- End diff --
    
    I think its reasonable to enforce this nullability semantic on unary and binary nodes, but we should add something to their scaladoc.  Maybe also just make `SortOrder` not a UnaryNode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42120157
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42260314
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14692/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42257367
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41590618
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12038876
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
    --- End diff --
    
    `expression.foldable` is ok by traveling from top to bottom, while null propagation is opposite. I've put them into different rule objects (ConstantFolding & NullPropagation).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/482#discussion_r12265745

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {

/**
* Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
+ * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with
+ * Null value propagation from bottom to top of the expression tree.
+ */
+object NullPropagation extends Rule[LogicalPlan] {
+ def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+ case q: LogicalPlan => q transformExpressionsUp {
+ // Skip redundant folding of literals.
+ case l: Literal => l
--- End diff --

Yes, but in the case of `ConstantFolding` the subsequent pattern _will_ match `Literal`, since a `Literal` is technically `foldable`. Matching the next pattern causes the rule to invoke the expression evaluator and create an identical, wasted object.

In `NullPropogation`, a `Literal` will not match any of the later rules. So in essence you are second guessing the code generated by the pattern matcher. While there may be extreme cases where that is required for performance, I don't think this is one of them.

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41528129
  
    Thanks for working on this!  I think the rule based version is a lot clearer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12263240
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvaluationSuite.scala ---
    @@ -246,17 +250,19 @@ class ExpressionEvaluationSuite extends FunSuite {
     
         intercept[Exception] {evaluate(Literal(1) cast BinaryType, null)}
     
    -    assert(("abcdef" cast StringType).nullable === false)
    -    assert(("abcdef" cast BinaryType).nullable === false)
    -    assert(("abcdef" cast BooleanType).nullable === false)
    -    assert(("abcdef" cast TimestampType).nullable === true)
    -    assert(("abcdef" cast LongType).nullable === true)
    -    assert(("abcdef" cast IntegerType).nullable === true)
    -    assert(("abcdef" cast ShortType).nullable === true)
    -    assert(("abcdef" cast ByteType).nullable === true)
    -    assert(("abcdef" cast DecimalType).nullable === true)
    -    assert(("abcdef" cast DoubleType).nullable === true)
    -    assert(("abcdef" cast FloatType).nullable === true)
    +    checkEvaluation(("abcdef" cast StringType).nullable, false)
    --- End diff --
    
    I'm not sure you want to change these.  `nullable` returns a `Boolean`, not an `Expression`.  This code is only compiling because the `Boolean` values are getting implicitly converted to a `Literal`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12123835
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -239,6 +243,11 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]
       self: Product =>
     }
     
    +/**
    + * Root class for rewritten single operand UDF expression. By default, we assume it produces Null 
    + * if its operand is null. Exceptional case requires to update the optimization rule 
    + * at [[optimizer.ConstantFolding ConstantFolding]]
    + */
     abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
    --- End diff --
    
    Same concerns as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42120168
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42120149
  
    Seems the test didn't restart.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41630181
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41203555
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14385/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42152975
  
    @marmbrus I have update the code as your comment, can you review/retest it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42120166
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41590107
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41235600
  
    I have 2 options for the changes:
    Option 1) is add an new method in expression, which tells the optimize rule, what it produces if part of its arguments are "always be null" / "possibly be null" / "never be null". the rule may test every expressions.
    Option 2) the rule itself enumerates all of the expressions, and understand what if part of the arguments is "always be null" etc., like the BooleanSimplification does.
    
    But still seems the "constraint propagation" is the best. Not sure if you have better idea on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41227747
  
    I'm not sure if we want to overload the nullability interface in this way.  This nullability bit currently is mostly intended for use when determining storage layout of tuples (i.e. do we need to keep a null bit around or can we operate on primitives).  What is going on here is something like constraint propagation + null domination, which we might be able to accomplish more generally.
    
    Also, I'm not sure if its a good idea to provide a default implementation of nullability as there are quite a few operators whose nullability isn't defined by their children (many aggregate operators for example).
    
    Perhaps there is a simpler way to do this using an improved rule in the optimizer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12263088
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,75 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    +      case e @ Count(Literal(null, _)) => Literal(0, e.dataType)
    +      case e @ Sum(Literal(c, _)) if(c == 0) => Literal(0, e.dataType)
    +      case e @ Average(Literal(c, _)) if(c == 0) => Literal(0.0, e.dataType)
    +      case e @ IsNull(c) if c.nullable == false => Literal(false, BooleanType)
    +      case e @ IsNotNull(c) if c.nullable == false => Literal(true, BooleanType)
    +      case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ Coalesce(children) => {
    +        val newChildren = children.filter(c => c match {
    +          case Literal(null, _) => false
    +          case _ => true
    +        })
    +        if(newChildren.length == 0) {
    +          Literal(null, e.dataType)
    +        } else if(newChildren.length == 1) {
    +          newChildren(0)
    +        } else {
    +          Coalesce(newChildren)
    +        }
    +      }
    +      case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue
    +      case e @ In(Literal(v, _), list) if(list.exists(c => c match {
    +          case Literal(candidate, _) if(candidate == v) => true
    +          case _ => false
    +        })) => Literal(true, BooleanType)
    +      case e: UnaryMinus => e.child match {
    +        case Literal(null, _) => Literal(null, e.dataType)
    +        case _ => e
    +      }
    +      case e: Cast => e.child match {
    +        case Literal(null, _) => Literal(null, e.dataType)
    +        case _ => e
    +      }
    +      case e: Not => e.child match {
    +        case Literal(null, _) => Literal(null, e.dataType)
    +        case _ => e
    +      }
    +      case e: And => e // leave it for BooleanSimplification
    +      case e: Or => e  // leave it for BooleanSimplification
    +      // Put exceptional cases above
    +      case e: BinaryArithmetic => e.children match {
    +        case Literal(null, _) :: right :: Nil => Literal(null, e.dataType)
    +        case left :: Literal(null, _) :: Nil => Literal(null, e.dataType)
    +        case _ => e
    +      }
    +      case e: BinaryPredicate => e.children match {
    --- End diff --
    
    If you match on `BinaryComparison` instead of `BinaryPredicate` you won't need to skip `And` and `Or` above, which seems a little clearer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41232857
  
    @marmbrus  I will review the nullability for aggregate operators etc. and it also may could be folded by statically analyze the nullability. (for example "sum(a)" but "a" could be an expression "Always Be Null").
    
    I was thinking not to extend the original nullability of expression, however only the "Never Be Null" and "Always Be Null" will be helpful in constant folding rule, but the boolean value of nullability is only represented as "Never Be Null" or "Possibly Be Null".
    
    I think we need to find some other way to show "Always Be Null" state for expression if we don't want to extend the nullability.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41754201
  
    Understood. thank you @marmbrus , I will add more unit tests based on current implementation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12038808
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
           // Skip redundant folding of literals.
           case l: Literal => l
    +      // if it's foldable
           case e if e.foldable => Literal(e.eval(null), e.dataType)
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
    --- End diff --
    
    yes, correctly, if all of the operands are literal, and it's covered by the rule `expression.foldable` in most of cases. I removed the overlapped cases from the rule NullPropagation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42155236
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14652/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by mateiz <gi...@git.apache.org>.

Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r11927603
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -244,3 +253,19 @@ abstract class UnaryExpression extends Expression with trees.UnaryNode[Expressio
     
       override def references = child.references
     }
    +
    +/**
    + * Expression value will fall into 3 categories:
    + * 1) possible be null
    + * 2) always be null
    + * 3) never be null
    + * This will be great helpful in constant folding for the expression evaluation
    + */
    +case class Nullability(value: Int)
    +
    +// possible be null (in runtime)
    +object possibleNull extends Nullability(0)
    +// always be null
    +object alwaysNull extends Nullability(-1)
    +// never be null
    +object neverNull extends Nullability(1)
    --- End diff --
    
    As a naming issue, objects like this should be capitalized (PossibleNull, AlwaysNull, NeverNull). It might also be good to rename PossibleNull to PossiblyNull.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41634040
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12265599
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -114,37 +114,37 @@ package object dsl {
           def attr = analysis.UnresolvedAttribute(s)
     
           /** Creates a new AttributeReference of type boolean */
    -      def boolean = AttributeReference(s, BooleanType, nullable = false)()
    --- End diff --
    
    Good point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12122492
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -223,6 +222,11 @@ abstract class Expression extends TreeNode[Expression] {
       }
     }
     
    +/**
    + * Root class for rewritten 2 operands UDF expression. By default, we assume it produces Null if 
    + * either one of its operands is null. Exceptional case requires to update the optimization rule 
    + * at [[optimizer.ConstantFolding ConstantFolding]]
    + */
     abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression] {
    --- End diff --
    
    I know I said this was maybe an okay idea, but looking at this closer I'm worried that the semantics here are too subtle, even with the added documentation.  For example, I'm pretty sure we are introducing a bunch of bugs by making this existing class the marker for "null if children are null".  This is because `BinaryPredicate` inherits from `BinaryExpression` but predicates are not always null if one of their children is null.
    
    Instead maybe in the Rule should more explicitly name classes that can be null simplified.  For example, I think it is safe to do it on `BinaryArithmetic`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42397763
  
    I've merged this. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41526071
  
    @marmbrus , can you please review this again? I've rewritten the code as rule-based.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41203414
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12036270
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
    --- End diff --
    
    Do we want `Up`? I believe this means we are going to call evaluate on each foldable node working up instead of just calling it once at the top.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41590632
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42222148
  
    Other than a very minor style issue, this LGTM.
    
    @pwendell or @rxin please merge when ready.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42260313
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/482


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41590933
  
    Can you please fix the style issues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12125494
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -223,6 +222,11 @@ abstract class Expression extends TreeNode[Expression] {
       }
     }
     
    +/**
    + * Root class for rewritten 2 operands UDF expression. By default, we assume it produces Null if 
    + * either one of its operands is null. Exceptional case requires to update the optimization rule 
    + * at [[optimizer.ConstantFolding ConstantFolding]]
    + */
     abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression] {
    --- End diff --
    
    That's true, adding document restrictions for BinaryExpression / UnaryExpression is quite confusing and error-prone. Besides the BinaryArithmetic, also the rewrite UDFs RLike / Cast / BinaryComparison etc.will be considered, too.
    
    Probably enumerate all of the expression types in this rule makes more sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12263079
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    --- End diff --
    
    By that logic it would be an optimization to skip any class that won't match the cases below.  Why is Literal a special case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12122557
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,56 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    --- End diff --
    
    I think `COUNT(null)` actually evaluates to 0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12036250
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
           // Skip redundant folding of literals.
           case l: Literal => l
    +      // if it's foldable
    --- End diff --
    
    I think we can omit this comment since the line beneath reads `if e.foldable`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42153187
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42121296
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14637/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41534729
  
    Thank you @marmbrus , I've updated the code a little bit.
    
    I've created another rule "NullPropagation", and it transforms the tree from bottom to top, and keep unchanged for the rule constant folding.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41634043
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14546/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41236071
  
    I like the option of doing it similar to BooleanSimplification.  That seems nice and explicit.  I also think this rule will play nicely with more general constraint propagation when we add that in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41630045
  
    Done, can you retest it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42153247
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12131155
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -87,6 +88,75 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
     /**
      * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
    + * equivalent [[catalyst.expressions.Literal Literal]] values. This rule is more specific with 
    + * Null value propagation from bottom to top of the expression tree.
    + */
    +object NullPropagation extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case q: LogicalPlan => q transformExpressionsUp {
    +      // Skip redundant folding of literals.
    +      case l: Literal => l
    +      case e @ Count(Literal(null, _)) => Literal(0, e.dataType)
    +      case e @ Sum(Literal(c, _)) if(c == 0) => Literal(0, e.dataType)
    --- End diff --
    
    Not like the previous implementation, I didn't add the null value as Child expression for both Sum and Average in the rule, cause Hive will throw semantic exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12265587
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -114,37 +114,37 @@ package object dsl {
           def attr = analysis.UnresolvedAttribute(s)
     
           /** Creates a new AttributeReference of type boolean */
    -      def boolean = AttributeReference(s, BooleanType, nullable = false)()
    --- End diff --
    
    I think "def boolean" / "def string" should return an Attribute with nullability = true, that does not harm for the correctness, otherwise, it may bring a wrong hint for the optimization of rule NullPropagation.
    
    For example:
    val row = new GenericRow(Array[Any]("a", null))
    val c1 = 'a.string.at(0) 
    val c2 = 'b.string.at(1) // nullable should be true, otherwise does't reflect the real situation.
    assert(evaluate(IsNull(c2), row) == true)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42121295
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by chenghao-intel <gi...@git.apache.org>.

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12038713
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
           // Skip redundant folding of literals.
           case l: Literal => l
    +      // if it's foldable
           case e if e.foldable => Literal(e.eval(null), e.dataType)
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
    +      case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType)
    +      case e @ IsNull(c @ Rand) => Literal(false, BooleanType)
    +      case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType)
    +      case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType)
    +      case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType)
    +      case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ Coalesce(children) => {
    +        val newChildren = children.filter(c => c match {
    +          case Literal(null, _) => false
    +          case _ => true
    +        })
    +        if(newChildren.length == null) {
    +          Literal(null, e.dataType)
    +        } else if(newChildren.length == children.length){
    +          e
    +        } else {
    +          Coalesce(newChildren)
    +        }
    +      }
    +      case e @ If(Literal(v, _), trueValue, falseValue) => if(v == true) trueValue else falseValue
    +      case e @ In(Literal(v, _), list) if(list.exists(c => c match {
    +          case Literal(candidate, _) if(candidate == v) => true
    +          case _ => false
    +        })) => Literal(true, BooleanType)
    +
    +      case e @ SortOrder(_, _) => e
    +      // put exceptional cases(Unary & Binary Expression) before here.
    +      case e: UnaryExpression => e.child match {
    --- End diff --
    
    yes, I've update the SortOrder which inherits from UnaryNode instead of UnaryExpression.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41590798
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14534/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-41590797
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/482#issuecomment-42257378
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][Spark-SQL] Optimize the Constant Folding...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/482#discussion_r12036482
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -91,10 +91,53 @@ object ColumnPruning extends Rule[LogicalPlan] {
      */
     object ConstantFolding extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -    case q: LogicalPlan => q transformExpressionsDown {
    +    case q: LogicalPlan => q transformExpressionsUp {
           // Skip redundant folding of literals.
           case l: Literal => l
    +      // if it's foldable
           case e if e.foldable => Literal(e.eval(null), e.dataType)
    +      case e @ Count(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Sum(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ Average(Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ IsNull(Literal(null, _)) => Literal(true, BooleanType)
    +      case e @ IsNull(Literal(_, _)) => Literal(false, BooleanType)
    +      case e @ IsNull(c @ Rand) => Literal(false, BooleanType)
    +      case e @ IsNotNull(Literal(null, _)) => Literal(false, BooleanType)
    +      case e @ IsNotNull(Literal(_, _)) => Literal(true, BooleanType)
    +      case e @ IsNotNull(c @ Rand) => Literal(true, BooleanType)
    +      case e @ GetItem(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ GetItem(_, Literal(null, _)) => Literal(null, e.dataType)
    +      case e @ GetField(Literal(null, _), _) => Literal(null, e.dataType)
    +      case e @ Coalesce(children) => {
    +        val newChildren = children.filter(c => c match {
    +          case Literal(null, _) => false
    +          case _ => true
    +        })
    +        if(newChildren.length == null) {
    --- End diff --
    
    Can `.length == null`?  Also, `transform` already does equality checking to minimize GC churn, so I think its okay to just return a filtered version of the children.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---