You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/01/03 08:47:08 UTC

[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20139

    [SPARK-22944][SQL] improve FoldablePropagation

    ## What changes were proposed in this pull request?
    
    `FoldablePropagation` is a little tricky as it needs to handle attributes that are miss-derived from children, e.g. outer join outputs. It does a kind of stop-able tree transform, to skip to apply this rule when hit a node which may have miss-derived attributes.
    
    Logically we should be able to apply this rule above the unsupported nodes, by just treating the unsupported nodes as leaf nodes. This PR improves this rule to not stop the tree transformation, but reduce the foldable expressions that we want to propagate.
    
    ## How was this patch tested?
    
    existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark foldable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20139
    
----
commit b4787f70c0339f5a7a3b70c7a53ea03ea9043c7a
Author: Wenchen Fan <we...@...>
Date:   2018-01-03T08:39:33Z

    improve FoldablePropagation

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Thank you for pinging me. LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159510009
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -557,22 +557,16 @@ object FoldablePropagation extends Rule[LogicalPlan] {
               }.toSeq)
               newJoin
     
    -        // Similar to Join, Expand also miss-derives output attributes from child attributes, we
    -        // should exclude them when propagating.
    -        // TODO(hvanhovell): Expand should use new attributes as the output attributes.
    -        case expand: Expand =>
    -          val newExpand = expand.copy(projections = expand.projections.map { projection =>
    +        // We can not replace the attributes in `Expand.output`. If there are other non-leaf
    +        // operators that have the `output` field, we should put them here too.
    +        case expand: Expand if foldableMap.nonEmpty =>
    +          expand.copy(projections = expand.projections.map { projection =>
                 projection.map(_.transform(replaceFoldable))
               })
    -          val missDerivedAttrsSet = expand.child.outputSet
    -          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    -            case (attr, _) => missDerivedAttrsSet.contains(attr)
    -          }.toSeq)
    --- End diff --
    
    We need this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159451696
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    +        // Join derives the output attributes from its child while they are actually not the
    +        // same attributes. For example, the output of outer join is not always picked from its
    +        // children, but can also be null. We should exclude these miss-derived attributes when
    +        // propagating the foldable expressions.
             // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
             // of outer join.
    -        case j @ Join(_, _, Inner, _) if !stop =>
    -          j.transformExpressions(replaceFoldable)
    +        case j @ Join(left, right, joinType, _) =>
    +          val newJoin = j.transformExpressions(replaceFoldable)
    +          val missDerivedAttrsSet: AttributeSet = AttributeSet(joinType match {
    --- End diff --
    
    > The table expressions on either side of a left or right outer join are referred to as preserved and null-supplying. In a left outer join, the left-hand table expression is preserved and the right-hand table expression is null-supplying.
    
    How about renaming `missDerivedAttrsSet ` to `nullSupplyingAttrsSet `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85635/testReport)** for PR 20139 at commit [`661e5d9`](https://github.com/apache/spark/commit/661e5d918225e6fe639ec72996b09227b03a9ea1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85627/testReport)** for PR 20139 at commit [`b4787f7`](https://github.com/apache/spark/commit/b4787f70c0339f5a7a3b70c7a53ea03ea9043c7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159463254
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
    --- End diff --
    
    ah good catch!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85648/testReport)** for PR 20139 at commit [`df75bff`](https://github.com/apache/spark/commit/df75bffa1c27c2b70495a1b2c8988f3a1c979918).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159465670
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    +        // Join derives the output attributes from its child while they are actually not the
    +        // same attributes. For example, the output of outer join is not always picked from its
    +        // children, but can also be null. We should exclude these miss-derived attributes when
    +        // propagating the foldable expressions.
             // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
             // of outer join.
    -        case j @ Join(_, _, Inner, _) if !stop =>
    -          j.transformExpressions(replaceFoldable)
    +        case j @ Join(left, right, joinType, _) =>
    +          val newJoin = j.transformExpressions(replaceFoldable)
    +          val missDerivedAttrsSet: AttributeSet = AttributeSet(joinType match {
    +            case _: InnerLike | LeftExistence(_) => Nil
    +            case LeftOuter => right.output
    +            case RightOuter => left.output
    +            case FullOuter => left.output ++ right.output
    +          })
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    +            case (attr, _) => missDerivedAttrsSet.contains(attr)
    +          }.toSeq)
    +          newJoin
     
    -        // We can fold the projections an expand holds. However expand changes the output columns
    -        // and often reuses the underlying attributes; so we cannot assume that a column is still
    -        // foldable after the expand has been applied.
    +        // Similar to Join, Expand also miss-derives output attributes from child attributes, we
    +        // should exclude them when propagating.
             // TODO(hvanhovell): Expand should use new attributes as the output attributes.
    -        case expand: Expand if !stop =>
    +        case expand: Expand =>
               val newExpand = expand.copy(projections = expand.projections.map { projection =>
                 projection.map(_.transform(replaceFoldable))
               })
    -          stop = true
    +          val missDerivedAttrsSet = expand.child.outputSet
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    +            case (attr, _) => missDerivedAttrsSet.contains(attr)
    +          }.toSeq)
               newExpand
     
    +        // For other plans, they are not safe to apply foldable propagation, and they should not
    +        // propagate foldable expressions from children.
             case other =>
    -          stop = true
    +          val childrenOutputSet = AttributeSet(other.children.flatMap(_.output))
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    --- End diff --
    
    we can't as it may not be empty. We may have new attributes produced by operators above those unsupported operators.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159497612
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,44 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if foldableMap.nonEmpty && canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    +        // Join derives the output attributes from its child while they are actually not the
    +        // same attributes. For example, the output of outer join is not always picked from its
    +        // children, but can also be null. We should exclude these miss-derived attributes when
    +        // propagating the foldable expressions.
             // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
             // of outer join.
    -        case j @ Join(_, _, Inner, _) if !stop =>
    -          j.transformExpressions(replaceFoldable)
    -
    -        // We can fold the projections an expand holds. However expand changes the output columns
    -        // and often reuses the underlying attributes; so we cannot assume that a column is still
    -        // foldable after the expand has been applied.
    -        // TODO(hvanhovell): Expand should use new attributes as the output attributes.
    -        case expand: Expand if !stop =>
    -          val newExpand = expand.copy(projections = expand.projections.map { projection =>
    +        case j @ Join(left, right, joinType, _) if foldableMap.nonEmpty =>
    +          val newJoin = j.transformExpressions(replaceFoldable)
    --- End diff --
    
    nit. Can we remove `val newJoin` declaration by moving this line 548 into line 558?
    ```
    - val newJoin = j.transformExpressions(replaceFoldable)
      ...
    - newJoin
    + j.transformExpressions(replaceFoldable)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159438848
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -506,18 +506,21 @@ object NullPropagation extends Rule[LogicalPlan] {
     
     
     /**
    - * Propagate foldable expressions:
      * Replace attributes with aliases of the original foldable expressions if possible.
    - * Other optimizations will take advantage of the propagated foldable expressions.
    - *
    + * Other optimizations will take advantage of the propagated foldable expressions. For example,
    + * This rule can optimize
    --- End diff --
    
    `This` -> `this`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159454639
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    +        // Join derives the output attributes from its child while they are actually not the
    +        // same attributes. For example, the output of outer join is not always picked from its
    +        // children, but can also be null. We should exclude these miss-derived attributes when
    +        // propagating the foldable expressions.
             // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
             // of outer join.
    -        case j @ Join(_, _, Inner, _) if !stop =>
    -          j.transformExpressions(replaceFoldable)
    +        case j @ Join(left, right, joinType, _) =>
    +          val newJoin = j.transformExpressions(replaceFoldable)
    +          val missDerivedAttrsSet: AttributeSet = AttributeSet(joinType match {
    +            case _: InnerLike | LeftExistence(_) => Nil
    +            case LeftOuter => right.output
    +            case RightOuter => left.output
    +            case FullOuter => left.output ++ right.output
    +          })
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    +            case (attr, _) => missDerivedAttrsSet.contains(attr)
    +          }.toSeq)
    +          newJoin
     
    -        // We can fold the projections an expand holds. However expand changes the output columns
    -        // and often reuses the underlying attributes; so we cannot assume that a column is still
    -        // foldable after the expand has been applied.
    +        // Similar to Join, Expand also miss-derives output attributes from child attributes, we
    +        // should exclude them when propagating.
             // TODO(hvanhovell): Expand should use new attributes as the output attributes.
    -        case expand: Expand if !stop =>
    +        case expand: Expand =>
               val newExpand = expand.copy(projections = expand.projections.map { projection =>
                 projection.map(_.transform(replaceFoldable))
               })
    -          stop = true
    +          val missDerivedAttrsSet = expand.child.outputSet
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    +            case (attr, _) => missDerivedAttrsSet.contains(attr)
    +          }.toSeq)
               newExpand
     
    +        // For other plans, they are not safe to apply foldable propagation, and they should not
    +        // propagate foldable expressions from children.
             case other =>
    -          stop = true
    +          val childrenOutputSet = AttributeSet(other.children.flatMap(_.output))
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    --- End diff --
    
    Can we directly set `foldableMap ` to an empty map?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Thanks! Merged to master/2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85648/testReport)** for PR 20139 at commit [`df75bff`](https://github.com/apache/spark/commit/df75bffa1c27c2b70495a1b2c8988f3a1c979918).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85624/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85627/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159453456
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
    --- End diff --
    
    For all these cases, how about adding another condition `if foldableMap.nonEmpty && xyz`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20139


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85627/testReport)** for PR 20139 at commit [`b4787f7`](https://github.com/apache/spark/commit/b4787f70c0339f5a7a3b70c7a53ea03ea9043c7a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85635/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Oops. I deleted my previous comment. Never mind.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85624/testReport)** for PR 20139 at commit [`b4787f7`](https://github.com/apache/spark/commit/b4787f70c0339f5a7a3b70c7a53ea03ea9043c7a).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85648/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    cc @dongjoon-hyun @hvanhovell @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85635/testReport)** for PR 20139 at commit [`661e5d9`](https://github.com/apache/spark/commit/661e5d918225e6fe639ec72996b09227b03a9ea1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    **[Test build #85624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85624/testReport)** for PR 20139 at commit [`b4787f7`](https://github.com/apache/spark/commit/b4787f70c0339f5a7a3b70c7a53ea03ea9043c7a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159471147
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    +        // Join derives the output attributes from its child while they are actually not the
    +        // same attributes. For example, the output of outer join is not always picked from its
    +        // children, but can also be null. We should exclude these miss-derived attributes when
    +        // propagating the foldable expressions.
             // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
             // of outer join.
    -        case j @ Join(_, _, Inner, _) if !stop =>
    -          j.transformExpressions(replaceFoldable)
    +        case j @ Join(left, right, joinType, _) =>
    +          val newJoin = j.transformExpressions(replaceFoldable)
    +          val missDerivedAttrsSet: AttributeSet = AttributeSet(joinType match {
    --- End diff --
    
    I'd like to keep the word `miss` as it's really a mistake. We should fix it eventually.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159510376
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala ---
    @@ -161,4 +161,23 @@ class FoldablePropagationSuite extends PlanTest {
         val correctAnswer = correctExpand.where(a1.isNotNull).select(a1, a2).analyze
         comparePlans(optimized, correctAnswer)
       }
    +
    +  test("Propagate above outer join") {
    +    val left = LocalRelation('a.int).select('a, Literal(1).as('b))
    +    val right = LocalRelation('c.int).select('c, Literal(1).as("d"))
    --- End diff --
    
    nit. "d" -> 'd


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20139#discussion_r159454480
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -530,38 +533,50 @@ object FoldablePropagation extends Rule[LogicalPlan] {
         if (foldableMap.isEmpty) {
           plan
         } else {
    -      var stop = false
           CleanupAliases(plan.transformUp {
    -        // A leaf node should not stop the folding process (note that we are traversing up the
    -        // tree, starting at the leaf nodes); so we are allowing it.
    -        case l: LeafNode =>
    -          l
    -
             // We can only propagate foldables for a subset of unary nodes.
    -        case u: UnaryNode if !stop && canPropagateFoldables(u) =>
    +        case u: UnaryNode if canPropagateFoldables(u) =>
               u.transformExpressions(replaceFoldable)
     
    -        // Allow inner joins. We do not allow outer join, although its output attributes are
    -        // derived from its children, they are actually different attributes: the output of outer
    -        // join is not always picked from its children, but can also be null.
    +        // Join derives the output attributes from its child while they are actually not the
    +        // same attributes. For example, the output of outer join is not always picked from its
    +        // children, but can also be null. We should exclude these miss-derived attributes when
    +        // propagating the foldable expressions.
             // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
             // of outer join.
    -        case j @ Join(_, _, Inner, _) if !stop =>
    -          j.transformExpressions(replaceFoldable)
    +        case j @ Join(left, right, joinType, _) =>
    +          val newJoin = j.transformExpressions(replaceFoldable)
    +          val missDerivedAttrsSet: AttributeSet = AttributeSet(joinType match {
    +            case _: InnerLike | LeftExistence(_) => Nil
    +            case LeftOuter => right.output
    +            case RightOuter => left.output
    +            case FullOuter => left.output ++ right.output
    +          })
    +          foldableMap = AttributeMap(foldableMap.baseMap.values.filterNot {
    +            case (attr, _) => missDerivedAttrsSet.contains(attr)
    +          }.toSeq)
    +          newJoin
     
    -        // We can fold the projections an expand holds. However expand changes the output columns
    -        // and often reuses the underlying attributes; so we cannot assume that a column is still
    -        // foldable after the expand has been applied.
    +        // Similar to Join, Expand also miss-derives output attributes from child attributes, we
    +        // should exclude them when propagating.
    --- End diff --
    
    After https://github.com/apache/spark/pull/12496, the above statement is still true?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20139: [SPARK-22944][SQL] improve FoldablePropagation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20139
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org