You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/12/04 07:18:32 UTC

[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/19873

    [SPARK-20392][SQL] Set barrier to prevent re-entering a tree

    ## What changes were proposed in this pull request?
    
    It is reported that there is performance downgrade when applying ML pipeline for dataset with many columns but few rows.
    
    A big part of the performance downgrade comes from some operations (e.g., `select`) on DataFrame/Dataset which re-create new DataFrame/Dataset with a new `LogicalPlan`. The cost can be ignored in the usage of SQL, normally.
    
    However, it's not rare to chain dozens of pipeline stages in ML. When the query plan grows incrementally during running those stages, the total cost spent on re-creation of DataFrame grows too. In particular, the `Analyzer` will go through the big query plan even most part of it is analyzed.
    
    By eliminating part of the cost, the time to run the example code locally is reduced from about 1min to about 30 secs.
    
    In particular, the time applying the pipeline locally is mostly spent on calling transform of the 137 `Bucketizer`s. Before the change, each call of `Bucketizer`'s transform can cost about 0.4 sec. So the total time spent on all `Bucketizer`s' transform is about 50 secs. After the change, each call only costs about 0.1 sec.
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-20392-reopen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19873.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19873
    
----
commit 136fd30cb98609e648a8b689a28853c1bae67bf7
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-12-04T07:19:35Z

    Add analysis barrier around analyzed plans.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154857858
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -666,7 +667,9 @@ class Analyzer(
          * Generate a new logical plan for the right child with different expression IDs
          * for all conflicting attributes.
          */
    -    private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = {
    +    private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = {
    --- End diff --
    
    What is `oriRight `?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154896108
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -50,41 +50,6 @@ abstract class LogicalPlan
       /** Returns true if this subtree has data from a streaming data source. */
       def isStreaming: Boolean = children.exists(_.isStreaming == true)
     
    -  /**
    -   * Returns a copy of this node where `rule` has been recursively applied first to all of its
    -   * children and then itself (post-order). When `rule` does not apply to a given node, it is left
    -   * unchanged.  This function is similar to `transformUp`, but skips sub-trees that have already
    -   * been marked as analyzed.
    -   *
    -   * @param rule the function use to transform this nodes children
    -   */
    -  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
    --- End diff --
    
    Removed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154858997
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -666,7 +667,9 @@ class Analyzer(
          * Generate a new logical plan for the right child with different expression IDs
          * for all conflicting attributes.
          */
    -    private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = {
    +    private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = {
    --- End diff --
    
    Use `originalRight`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154862947
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -241,7 +241,7 @@ class PlannerSuite extends SharedSQLContext {
       test("collapse adjacent repartitions") {
         val doubleRepartitioned = testData.repartition(10).repartition(20).coalesce(5)
         def countRepartitions(plan: LogicalPlan): Int = plan.collect { case r: Repartition => r }.length
    -    assert(countRepartitions(doubleRepartitioned.queryExecution.logical) === 3)
    +    assert(countRepartitions(doubleRepartitioned.queryExecution.analyzed) === 3)
    --- End diff --
    
    Please see previous discussion: https://github.com/apache/spark/pull/17770/files#r118480364



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154863115
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -280,7 +280,7 @@ object TypeCoercion {
        */
       object WidenSetOperationTypes extends Rule[LogicalPlan] {
     
    -    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
           case p if p.analyzed => p
    --- End diff --
    
    Sorry?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84520/testReport)** for PR 19873 at commit [`d2375e0`](https://github.com/apache/spark/commit/d2375e0185185cd3516b25c14b48c12840957cbf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154863367
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -280,7 +280,7 @@ object TypeCoercion {
        */
       object WidenSetOperationTypes extends Rule[LogicalPlan] {
     
    -    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
           case p if p.analyzed => p
    --- End diff --
    
    In which cases, we should still use the `analyzed ` flag?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84520/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Thanks! @gatorsmile @cloud-fan


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r155129282
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -881,3 +881,10 @@ case class Deduplicate(
     
       override def output: Seq[Attribute] = child.output
     }
    +
    +/** A logical plan for setting a barrier of analysis */
    +case class AnalysisBarrier(child: LogicalPlan) extends LeafNode {
    --- End diff --
    
    Ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    From the PR description, I am unable to tell the changes made in this PR. We need a better description to explain what is the solution proposed in this PR. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154859083
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1098,7 +1103,8 @@ class Analyzer(
               case ae: AnalysisException => s
             }
     
    -      case f @ Filter(cond, child) if !f.resolved && child.resolved =>
    +      case f @ Filter(cond, oriChild) if !f.resolved && oriChild.resolved =>
    --- End diff --
    
    Use `originalChild`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154855592
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala ---
    @@ -241,7 +241,7 @@ class PlannerSuite extends SharedSQLContext {
       test("collapse adjacent repartitions") {
         val doubleRepartitioned = testData.repartition(10).repartition(20).coalesce(5)
         def countRepartitions(plan: LogicalPlan): Int = plan.collect { case r: Repartition => r }.length
    -    assert(countRepartitions(doubleRepartitioned.queryExecution.logical) === 3)
    +    assert(countRepartitions(doubleRepartitioned.queryExecution.analyzed) === 3)
    --- End diff --
    
    is it a necessary change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84430/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154863033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1073,11 +1076,13 @@ class Analyzer(
        * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
        */
       object ResolveMissingReferences extends Rule[LogicalPlan] {
    -    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
           // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
    +      case sa @ Sort(_, _, AnalysisBarrier(child: Aggregate)) => sa
           case sa @ Sort(_, _, child: Aggregate) => sa
     
    -      case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
    +      case s @ Sort(order, _, oriChild) if !s.resolved && oriChild.resolved =>
    --- End diff --
    
    Ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84417/testReport)** for PR 19873 at commit [`136fd30`](https://github.com/apache/spark/commit/136fd30cb98609e648a8b689a28853c1bae67bf7).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AnalysisBarrier(child: LogicalPlan) extends LeafNode `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84420/testReport)** for PR 19873 at commit [`136fd30`](https://github.com/apache/spark/commit/136fd30cb98609e648a8b689a28853c1bae67bf7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84417/testReport)** for PR 19873 at commit [`136fd30`](https://github.com/apache/spark/commit/136fd30cb98609e648a8b689a28853c1bae67bf7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84475/testReport)** for PR 19873 at commit [`bae034d`](https://github.com/apache/spark/commit/bae034d237a3cf1f2f63d546ad61fbda47813a9b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154857587
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -280,7 +280,7 @@ object TypeCoercion {
        */
       object WidenSetOperationTypes extends Rule[LogicalPlan] {
     
    -    def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
           case p if p.analyzed => p
    --- End diff --
    
    Why?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84518/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84475/testReport)** for PR 19873 at commit [`bae034d`](https://github.com/apache/spark/commit/bae034d237a3cf1f2f63d546ad61fbda47813a9b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154863020
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -470,7 +470,7 @@ case class DataSource(
           }.head
         }
         // For partitioned relation r, r.schema's column ordering can be different from the column
    -    // ordering of data.logicalPlan (partition columns are all moved after data column).  This
    +    // ordering of data.logicalPlan (partition columns are all moved after data column). This
    --- End diff --
    
    Sure.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    @viirya Could you resolve the conflicts?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r155130916
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
    @@ -321,39 +319,40 @@ object TypeCoercion {
           }
         }
     
    -    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
    -      // Skip nodes who's children have not been resolved yet.
    -      case e if !e.childrenResolved => e
    -
    -      case a @ BinaryArithmetic(left @ StringType(), right) =>
    -        a.makeCopy(Array(Cast(left, DoubleType), right))
    -      case a @ BinaryArithmetic(left, right @ StringType()) =>
    -        a.makeCopy(Array(left, Cast(right, DoubleType)))
    -
    -      // For equality between string and timestamp we cast the string to a timestamp
    -      // so that things like rounding of subsecond precision does not affect the comparison.
    -      case p @ Equality(left @ StringType(), right @ TimestampType()) =>
    -        p.makeCopy(Array(Cast(left, TimestampType), right))
    -      case p @ Equality(left @ TimestampType(), right @ StringType()) =>
    -        p.makeCopy(Array(left, Cast(right, TimestampType)))
    -
    -      case p @ BinaryComparison(left, right)
    -        if findCommonTypeForBinaryComparison(left.dataType, right.dataType).isDefined =>
    -        val commonType = findCommonTypeForBinaryComparison(left.dataType, right.dataType).get
    -        p.makeCopy(Array(castExpr(left, commonType), castExpr(right, commonType)))
    -
    -      case Abs(e @ StringType()) => Abs(Cast(e, DoubleType))
    -      case Sum(e @ StringType()) => Sum(Cast(e, DoubleType))
    -      case Average(e @ StringType()) => Average(Cast(e, DoubleType))
    -      case StddevPop(e @ StringType()) => StddevPop(Cast(e, DoubleType))
    -      case StddevSamp(e @ StringType()) => StddevSamp(Cast(e, DoubleType))
    -      case UnaryMinus(e @ StringType()) => UnaryMinus(Cast(e, DoubleType))
    -      case UnaryPositive(e @ StringType()) => UnaryPositive(Cast(e, DoubleType))
    -      case VariancePop(e @ StringType()) => VariancePop(Cast(e, DoubleType))
    -      case VarianceSamp(e @ StringType()) => VarianceSamp(Cast(e, DoubleType))
    -      case Skewness(e @ StringType()) => Skewness(Cast(e, DoubleType))
    -      case Kurtosis(e @ StringType()) => Kurtosis(Cast(e, DoubleType))
    -    }
    +    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan =
    +      plan transformAllExpressions {
    --- End diff --
    
    For indentation...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154858146
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -470,7 +470,7 @@ case class DataSource(
           }.head
         }
         // For partitioned relation r, r.schema's column ordering can be different from the column
    -    // ordering of data.logicalPlan (partition columns are all moved after data column).  This
    +    // ordering of data.logicalPlan (partition columns are all moved after data column). This
    --- End diff --
    
    Get rid of changes in this file.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84430/testReport)** for PR 19873 at commit [`9f5a0e4`](https://github.com/apache/spark/commit/9f5a0e458fa0cb42d6850e16d74994af1b1a3752).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r155064881
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala ---
    @@ -23,8 +23,8 @@ import org.apache.spark.sql.catalyst.plans.logical._
     import org.apache.spark.sql.types.IntegerType
     
     /**
    - * This suite is used to test [[LogicalPlan]]'s `resolveOperators` and make sure it can correctly
    - * skips sub-trees that have already been marked as analyzed.
    + * This suite is used to test [[LogicalPlan]]'s `transformUp` plus analysis barrier and make sure
    --- End diff --
    
    Since both `transformUp ` and `transformDown` work, create a test case using `transformDown`. Also update the comments here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84430/testReport)** for PR 19873 at commit [`9f5a0e4`](https://github.com/apache/spark/commit/9f5a0e458fa0cb42d6850e16d74994af1b1a3752).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AnalysisBarrier(child: LogicalPlan) extends LeafNode `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84475/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84420/testReport)** for PR 19873 at commit [`136fd30`](https://github.com/apache/spark/commit/136fd30cb98609e648a8b689a28853c1bae67bf7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class AnalysisBarrier(child: LogicalPlan) extends LeafNode `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r155129278
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala ---
    @@ -23,8 +23,8 @@ import org.apache.spark.sql.catalyst.plans.logical._
     import org.apache.spark.sql.types.IntegerType
     
     /**
    - * This suite is used to test [[LogicalPlan]]'s `resolveOperators` and make sure it can correctly
    - * skips sub-trees that have already been marked as analyzed.
    + * This suite is used to test [[LogicalPlan]]'s `transformUp` plus analysis barrier and make sure
    --- End diff --
    
    Sure.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84417/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154859044
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1073,11 +1076,13 @@ class Analyzer(
        * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
        */
       object ResolveMissingReferences extends Rule[LogicalPlan] {
    -    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
           // Skip sort with aggregate. This will be handled in ResolveAggregateFunctions
    +      case sa @ Sort(_, _, AnalysisBarrier(child: Aggregate)) => sa
           case sa @ Sort(_, _, child: Aggregate) => sa
     
    -      case s @ Sort(order, _, child) if !s.resolved && child.resolved =>
    +      case s @ Sort(order, _, oriChild) if !s.resolved && oriChild.resolved =>
    --- End diff --
    
    Use `originalChild`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84518/testReport)** for PR 19873 at commit [`4775a02`](https://github.com/apache/spark/commit/4775a0249b34be16e7bc15dbd4b291fddfe92656).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    cc @cloud-fan @hvanhovell Basically this is the same changes in #17770.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r155064613
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
    @@ -881,3 +881,10 @@ case class Deduplicate(
     
       override def output: Seq[Attribute] = child.output
     }
    +
    +/** A logical plan for setting a barrier of analysis */
    +case class AnalysisBarrier(child: LogicalPlan) extends LeafNode {
    --- End diff --
    
    Put the PR descriptions to the comment of this class?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19873


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84477/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154855150
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -50,41 +50,6 @@ abstract class LogicalPlan
       /** Returns true if this subtree has data from a streaming data source. */
       def isStreaming: Boolean = children.exists(_.isStreaming == true)
     
    -  /**
    -   * Returns a copy of this node where `rule` has been recursively applied first to all of its
    -   * children and then itself (post-order). When `rule` does not apply to a given node, it is left
    -   * unchanged.  This function is similar to `transformUp`, but skips sub-trees that have already
    -   * been marked as analyzed.
    -   *
    -   * @param rule the function use to transform this nodes children
    -   */
    -  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
    --- End diff --
    
    can we also remove the `analyzed` flag in this class?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84518/testReport)** for PR 19873 at commit [`4775a02`](https://github.com/apache/spark/commit/4775a0249b34be16e7bc15dbd4b291fddfe92656).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    case class ReqAndHandler(req: Request, handler: MemberHandler)`
      * `trait TypeCoercionRule extends Rule[LogicalPlan] with Logging `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19873: [SPARK-20392][SQL] Set barrier to prevent re-ente...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19873#discussion_r154863003
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -666,7 +667,9 @@ class Analyzer(
          * Generate a new logical plan for the right child with different expression IDs
          * for all conflicting attributes.
          */
    -    private def dedupRight (left: LogicalPlan, right: LogicalPlan): LogicalPlan = {
    +    private def dedupRight (left: LogicalPlan, oriRight: LogicalPlan): LogicalPlan = {
    --- End diff --
    
    Ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84520/testReport)** for PR 19873 at commit [`d2375e0`](https://github.com/apache/spark/commit/d2375e0185185cd3516b25c14b48c12840957cbf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    LGTM, also cc @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84420/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84477/testReport)** for PR 19873 at commit [`54182bf`](https://github.com/apache/spark/commit/54182bf3ea6db2815146f9e158bbb08609400b93).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19873: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19873
  
    **[Test build #84477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84477/testReport)** for PR 19873 at commit [`54182bf`](https://github.com/apache/spark/commit/54182bf3ea6db2815146f9e158bbb08609400b93).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org