You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/01/20 12:42:03 UTC

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16659

    [SPARK-19309][SQL] disable common subexpression elimination for conditional expressions

    ## What changes were proposed in this pull request?
    
    As I pointed out in https://github.com/apache/spark/pull/15807#issuecomment-259143655 , the current subexpression elimination framework has a problem, it always evaluates all common subexpressions at the beginning, even they are inside conditional expressions and may not be accessed.
    
    Ideally we should implement it like scala lazy val, so we only evaluate it when it gets accessed at lease once. https://github.com/apache/spark/issues/15837 tries this approach, but it seems too complicated and may introduce performance regression.
    
    This PR simply stops common subexpression elimination for conditional expressions, with some cleanup.
    
    ## How was this patch tested?
    
    regression test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16659
    
----
commit 45608b12452632ece51051ef4e90132f73aebaf8
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-20T11:55:32Z

    disable common subexpression elimination for conditional expressions

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97212331
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    --- End diff --
    
    I just found that not all the children of `AtLeastNNonNulls` get accessed during evaluation too. Do we need to add it here too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97074378
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -181,19 +185,17 @@ case class SimpleTypedAggregateExpression(
           outputExternalType,
           bufferDeserializer :: Nil)
     
    +    val serializeExprs = outputSerializer.map(_.transform {
    --- End diff --
    
    nit: outputSerializeExprs


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97203236
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    +    //          false expressions in `If` may not get accessed, according to the predicate
    +    //          expression. We should only recurse into the predicate expression.
    +    //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
    +    //                condition. We should only recurse into the first condition expression as it
    +    //                will always get accessed.
    --- End diff --
    
    nvm, `CaseWhen` implements `CodegenFallback`. Thus, the previous impl skips it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97203267
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    +    //          false expressions in `If` may not get accessed, according to the predicate
    +    //          expression. We should only recurse into the predicate expression.
    +    //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
    +    //                condition. We should only recurse into the first condition expression as it
    +    //                will always get accessed.
    +    //   4. Coalesce: it's also a conditional expression, we should only recurse into the first
    +    //                children, because others may not get accessed.
    +    def childrenToRecurse: Seq[Expression] = expr match {
    +      case _: CodegenFallback => Nil
    +      case i: If => i.predicate :: Nil
    +      case c: CaseWhenBase => c.children.head :: Nil
    --- End diff --
    
    This case is not reachable, could we leave a comment above this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97202843
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    +    //          false expressions in `If` may not get accessed, according to the predicate
    +    //          expression. We should only recurse into the predicate expression.
    +    //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
    +    //                condition. We should only recurse into the first condition expression as it
    +    //                will always get accessed.
    --- End diff --
    
    `CaseWhen` could be very deep.
    > CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END 
    > When `expr1` = true, returns `expr2`; when `expr3` = true, return `expr4`; else return `expr5`.
    
    Compared with the previous impl, will we miss some expression elimination chances?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97202057
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -780,9 +780,6 @@ class CodegenContext {
           // The cost of doing subexpression elimination is:
           //   1. Extra function call, although this is probably *good* as the JIT can decide to
    --- End diff --
    
    maybe we should still keep it, to make the indent consistent between the "cost" part and the "benefit" part. It also makes it more obvious that we only have one "cost".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97191894
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    --- End diff --
    
    this's cool.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97087659
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -154,8 +160,7 @@ case class SimpleTypedAggregateExpression(
           "reduce",
           bufferExternalType,
           bufferDeserializer :: inputDeserializer.get :: Nil)
    -
    -    bufferSerializer.map(ReferenceToExpressions(_, reduced :: Nil))
    +    deserializeToBuffer(reduced)
    --- End diff --
    
    ditto


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97188544
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -143,9 +143,15 @@ case class SimpleTypedAggregateExpression(
       override lazy val aggBufferAttributes: Seq[AttributeReference] =
         bufferSerializer.map(_.toAttribute.asInstanceOf[AttributeReference])
     
    +  private def deserializeToBuffer(expr: Expression): Seq[Expression] = {
    +    bufferDeserializer.map(_.transform {
    +      case _: BoundReference => expr
    +    })
    +  }
    +
       override lazy val initialValues: Seq[Expression] = {
         val zero = Literal.fromObject(aggregator.zero, bufferExternalType)
    -    bufferSerializer.map(ReferenceToExpressions(_, zero :: Nil))
    --- End diff --
    
    sorry, typo...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    LGTM pending test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16659


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    LGTM except a few comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71763/testReport)** for PR 16659 at commit [`e7d928c`](https://github.com/apache/spark/commit/e7d928c3ac781ac8863bdb531d6163ef9b3d6e1c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    This looks good to me. Just few comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97203450
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    +    //          false expressions in `If` may not get accessed, according to the predicate
    +    //          expression. We should only recurse into the predicate expression.
    +    //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
    +    //                condition. We should only recurse into the first condition expression as it
    +    //                will always get accessed.
    +    //   4. Coalesce: it's also a conditional expression, we should only recurse into the first
    +    //                children, because others may not get accessed.
    --- End diff --
    
    Although `Coalesce` might  miss some expression elimination chances, I think it is very rare when users use the same expressions in  `Coalesce`. 
    
    Could you update the comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71752/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71720/testReport)** for PR 16659 at commit [`45608b1`](https://github.com/apache/spark/commit/45608b12452632ece51051ef4e90132f73aebaf8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71763/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97076732
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala ---
    @@ -1,92 +0,0 @@
    -/*
    - * Licensed to the Apache Software Foundation (ASF) under one or more
    - * contributor license agreements.  See the NOTICE file distributed with
    - * this work for additional information regarding copyright ownership.
    - * The ASF licenses this file to You under the Apache License, Version 2.0
    - * (the "License"); you may not use this file except in compliance with
    - * the License.  You may obtain a copy of the License at
    - *
    - *    http://www.apache.org/licenses/LICENSE-2.0
    - *
    - * Unless required by applicable law or agreed to in writing, software
    - * distributed under the License is distributed on an "AS IS" BASIS,
    - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    - * See the License for the specific language governing permissions and
    - * limitations under the License.
    - */
    -
    -package org.apache.spark.sql.catalyst.expressions
    -
    -import org.apache.spark.sql.catalyst.InternalRow
    -import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
    -import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
    -import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
    -import org.apache.spark.sql.catalyst.expressions.objects.LambdaVariable
    -import org.apache.spark.sql.types.DataType
    -
    -/**
    - * A special expression that evaluates [[BoundReference]]s by given expressions instead of the
    - * input row.
    - *
    - * @param result The expression that contains [[BoundReference]] and produces the final output.
    - * @param children The expressions that used as input values for [[BoundReference]].
    - */
    -case class ReferenceToExpressions(result: Expression, children: Seq[Expression])
    --- End diff --
    
    nice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71720/testReport)** for PR 16659 at commit [`45608b1`](https://github.com/apache/spark/commit/45608b12452632ece51051ef4e90132f73aebaf8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97246099
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,33 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    -    // There are some special expressions that we should not recurse into children.
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
    +    // There are some special expressions that we should not recurse into all of its children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    -      case _: CodegenFallback => false
    -      case _: ReferenceToExpressions if skipReferenceToExpressions => false
    -      case _ => true
    +    //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
    +    //          false expressions in `If` may not get accessed, according to the predicate
    +    //          expression. We should only recurse into the predicate expression.
    +    //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
    +    //                condition. We should only recurse into the first condition expression as it
    +    //                will always get accessed.
    +    //   4. Coalesce: it's also a conditional expression, we should only recurse into the first
    +    //                children, because others may not get accessed.
    --- End diff --
    
    `Coalesce` may be just a small part of the whole expression tree, and the children of `Coalesce` may be same with other expressions inside the expression tree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71759/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97202652
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala ---
    @@ -115,41 +115,17 @@ class SubexpressionEliminationSuite extends SparkFunSuite {
         val mul2 = Multiply(mul, mul)
         val sqrt = Sqrt(mul2)
         val sum = Add(mul2, sqrt)
    -    equivalence.addExprTree(mul, true)
    -    equivalence.addExprTree(mul2, true)
    -    equivalence.addExprTree(sqrt, true)
    -    equivalence.addExprTree(sum, true)
    +    equivalence.addExprTree(mul)
    +    equivalence.addExprTree(mul2)
    +    equivalence.addExprTree(sqrt)
    +    equivalence.addExprTree(sum)
     
         // (one * two), (one * two) * (one * two) and sqrt( (one * two) * (one * two) ) should be found
         assert(equivalence.getAllEquivalentExprs.count(_.size > 1) == 3)
         assert(equivalence.getEquivalentExprs(mul).size == 3)
         assert(equivalence.getEquivalentExprs(mul2).size == 3)
         assert(equivalence.getEquivalentExprs(sqrt).size == 2)
         assert(equivalence.getEquivalentExprs(sum).size == 1)
    -
    -    // Some expressions inspired by TPCH-Q1
    -    // sum(l_quantity) as sum_qty,
    -    // sum(l_extendedprice) as sum_base_price,
    -    // sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    -    // sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    -    // avg(l_extendedprice) as avg_price,
    -    // avg(l_discount) as avg_disc
    -    equivalence = new EquivalentExpressions
    -    val quantity = Literal(1)
    -    val price = Literal(1.1)
    -    val discount = Literal(.24)
    -    val tax = Literal(0.1)
    -    equivalence.addExprTree(quantity, false)
    -    equivalence.addExprTree(price, false)
    -    equivalence.addExprTree(Multiply(price, Subtract(Literal(1), discount)), false)
    -    equivalence.addExprTree(
    -      Multiply(
    -        Multiply(price, Subtract(Literal(1), discount)),
    -        Add(Literal(1), tax)), false)
    -    equivalence.addExprTree(price, false)
    -    equivalence.addExprTree(discount, false)
    -    // quantity, price, discount and (price * (1 - discount))
    -    assert(equivalence.getAllEquivalentExprs.count(_.size > 1) == 4)
    --- End diff --
    
    To other reviewers: the new `addExprTree` always ignores the leaf nodes. Thus, these test cases are not needed. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    thanks, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97087694
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -170,8 +175,7 @@ case class SimpleTypedAggregateExpression(
           "merge",
           bufferExternalType,
           leftBuffer :: rightBuffer :: Nil)
    -
    -    bufferSerializer.map(ReferenceToExpressions(_, merged :: Nil))
    +    deserializeToBuffer(merged)
    --- End diff --
    
    ditto


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97201707
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -780,9 +780,6 @@ class CodegenContext {
           // The cost of doing subexpression elimination is:
           //   1. Extra function call, although this is probably *good* as the JIT can decide to
    --- End diff --
    
    but we do have an extra function call to evaluate common subexpression elimination at the beginning.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    The child expression in `Sum` is wrapped in `Coalesce`. So making `org.apache.spark.sql.SQLQuerySuite.Common subexpression elimination` test failed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71818/testReport)** for PR 16659 at commit [`0753ee6`](https://github.com/apache/spark/commit/0753ee6da4d5698d3a30d89e60ec45aca9e18f35).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71818/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    cc @viirya @kiszk @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97201750
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -780,9 +780,6 @@ class CodegenContext {
           // The cost of doing subexpression elimination is:
           //   1. Extra function call, although this is probably *good* as the JIT can decide to
    --- End diff --
    
    : ) Just removed `1.`. Not the whole sentence


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71720/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97202872
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -780,9 +780,6 @@ class CodegenContext {
           // The cost of doing subexpression elimination is:
           //   1. Extra function call, although this is probably *good* as the JIT can decide to
    --- End diff --
    
    I am fine to keep it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97201646
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -780,9 +780,6 @@ class CodegenContext {
           // The cost of doing subexpression elimination is:
           //   1. Extra function call, although this is probably *good* as the JIT can decide to
    --- End diff --
    
    Nit: we removed `2.` and `3.`. We do not need `1.`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97188584
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -181,19 +185,17 @@ case class SimpleTypedAggregateExpression(
           outputExternalType,
           bufferDeserializer :: Nil)
     
    +    val serializeExprs = outputSerializer.map(_.transform {
    --- End diff --
    
    it's always used, no need to make it lazy val.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71763/testReport)** for PR 16659 at commit [`e7d928c`](https://github.com/apache/spark/commit/e7d928c3ac781ac8863bdb531d6163ef9b3d6e1c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97077388
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,28 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    --- End diff --
    
    From the code change, I don't see any place other than tests using ignoreLeaf = false. Curious why we have it.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71759/testReport)** for PR 16659 at commit [`9d50048`](https://github.com/apache/spark/commit/9d50048a47b5052a85faa16535535eb86c146aa3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71818/testReport)** for PR 16659 at commit [`0753ee6`](https://github.com/apache/spark/commit/0753ee6da4d5698d3a30d89e60ec45aca9e18f35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71759/testReport)** for PR 16659 at commit [`9d50048`](https://github.com/apache/spark/commit/9d50048a47b5052a85faa16535535eb86c146aa3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71752/testReport)** for PR 16659 at commit [`cda9723`](https://github.com/apache/spark/commit/cda9723e8adc07142521cd5d17568f6e5ff3b709).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    I reran the `DatasetBenchmark`, there is no performance regression.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    **[Test build #71752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71752/testReport)** for PR 16659 at commit [`cda9723`](https://github.com/apache/spark/commit/cda9723e8adc07142521cd5d17568f6e5ff3b709).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97201984
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
    @@ -780,9 +780,6 @@ class CodegenContext {
           // The cost of doing subexpression elimination is:
           //   1. Extra function call, although this is probably *good* as the JIT can decide to
    --- End diff --
    
    oh i see :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97076944
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
    @@ -67,28 +67,28 @@ class EquivalentExpressions {
       /**
        * Adds the expression to this data structure recursively. Stops if a matching expression
        * is found. That is, if `expr` has already been added, its children are not added.
    -   * If ignoreLeaf is true, leaf nodes are ignored.
        */
    -  def addExprTree(
    -      root: Expression,
    -      ignoreLeaf: Boolean = true,
    -      skipReferenceToExpressions: Boolean = true): Unit = {
    -    val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
    +  def addExprTree(expr: Expression): Unit = {
    +    val skip = expr.isInstanceOf[LeafExpression] ||
           // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
           // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
    -      root.find(_.isInstanceOf[LambdaVariable]).isDefined
    +      expr.find(_.isInstanceOf[LambdaVariable]).isDefined
    +
         // There are some special expressions that we should not recurse into children.
         //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
    -    //   2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
    -    val shouldRecurse = root match {
    -      // TODO: some expressions implements `CodegenFallback` but can still do codegen,
    -      // e.g. `CaseWhen`, we should support them.
    +    //   2. conditional expressions: common subexpressions will always be evaluated at the
    +    //                               beginning, so we should not recurse into condition expressions,
    +    //                               whole children may not get evaluated.
    --- End diff --
    
    maybe rephrase it? `whole children may not get evaluated` looks not easy to understand. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97074590
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -181,19 +185,17 @@ case class SimpleTypedAggregateExpression(
           outputExternalType,
           bufferDeserializer :: Nil)
     
    +    val serializeExprs = outputSerializer.map(_.transform {
    --- End diff --
    
    lazy val?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16659#discussion_r97087587
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala ---
    @@ -143,9 +143,15 @@ case class SimpleTypedAggregateExpression(
       override lazy val aggBufferAttributes: Seq[AttributeReference] =
         bufferSerializer.map(_.toAttribute.asInstanceOf[AttributeReference])
     
    +  private def deserializeToBuffer(expr: Expression): Seq[Expression] = {
    +    bufferDeserializer.map(_.transform {
    +      case _: BoundReference => expr
    +    })
    +  }
    +
       override lazy val initialValues: Seq[Expression] = {
         val zero = Literal.fromObject(aggregator.zero, bufferExternalType)
    -    bufferSerializer.map(ReferenceToExpressions(_, zero :: Nil))
    --- End diff --
    
    Why `bufferSerializer` now replaced with `bufferDeserializer`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16659: [SPARK-19309][SQL] disable common subexpression eliminat...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16659
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org