You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rxin <gi...@git.apache.org> on 2018/11/20 16:24:23 UTC

[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for query plan...

GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/23096

    [SPARK-26129][SQL] Instrumentation for query planning time

    ## What changes were proposed in this pull request?
    We currently don't have good visibility into query planning time (analysis vs optimization vs physical planning). This patch adds a simple utility to track the runtime of various rules and various planning phases.
    
    ## How was this patch tested?
    Added unit tests and end-to-end integration tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-26129

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23096.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23096
    
----
commit b6a3d02f2c2b0eff71f92c3ede854edc3b5bf9f8
Author: Reynold Xin <rx...@...>
Date:   2018-11-20T16:22:35Z

    [SPARK-26129][SQL] Instrumentation for query planning time

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99070/testReport)** for PR 23096 at commit [`dd61273`](https://github.com/apache/spark/commit/dd61273024bf322bfe03f3cd5e06803eaae62d8f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235151635
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
    @@ -88,15 +92,18 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
                 val startTime = System.nanoTime()
                 val result = rule(plan)
                 val runTime = System.nanoTime() - startTime
    +            val effective = !result.fastEquals(plan)
     
    -            if (!result.fastEquals(plan)) {
    +            if (effective) {
                   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
                   queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
                   planChangeLogger.log(rule.ruleName, plan, result)
                 }
                 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
                 queryExecutionMetrics.incNumExecution(rule.ruleName)
     
    +            tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective))
    --- End diff --
    
    Doesn't this make the query-local and the global metrics inconsistent when tracker is None?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23096


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235137901
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -211,7 +216,7 @@ class Analyzer(
           case With(child, relations) =>
             substituteCTE(child, relations.foldLeft(Seq.empty[(String, LogicalPlan)]) {
               case (resolved, (name, relation)) =>
    -            resolved :+ name -> executeSameContext(substituteCTE(relation, resolved))
    +            resolved :+ name -> executeSameContext(substituteCTE(relation, resolved), None)
    --- End diff --
    
    For my understanding, why should we pass a None tracker here? Wouldn't this hide the time of, e.g., Metastore operations to resolve tables in the CTE definition?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235146495
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala ---
    @@ -54,7 +54,7 @@ trait AnalysisTest extends PlanTest {
           expectedPlan: LogicalPlan,
           caseSensitive: Boolean = true): Unit = {
         val analyzer = getAnalyzer(caseSensitive)
    -    val actualPlan = analyzer.executeAndCheck(inputPlan)
    +    val actualPlan = analyzer.executeAndCheck(inputPlan, None)
    --- End diff --
    
    This interface change might invite conflicts with OSS when new tests are added? Have you considered alternatives that do not require such a change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99070/testReport)** for PR 23096 at commit [`dd61273`](https://github.com/apache/spark/commit/dd61273024bf322bfe03f3cd5e06803eaae62d8f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99081/testReport)** for PR 23096 at commit [`2cd069c`](https://github.com/apache/spark/commit/2cd069c21808420c138ef8444dcb407454a447fd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235180423
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
    @@ -88,15 +101,20 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
                 val startTime = System.nanoTime()
                 val result = rule(plan)
                 val runTime = System.nanoTime() - startTime
    +            val effective = !result.fastEquals(plan)
     
    -            if (!result.fastEquals(plan)) {
    +            if (effective) {
                   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
                   queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
                   planChangeLogger.log(rule.ruleName, plan, result)
                 }
                 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
                 queryExecutionMetrics.incNumExecution(rule.ruleName)
     
    +            if (tracker ne null) {
    --- End diff --
    
    Why do we need to be defensive. Should this be an assert instead? Might be worth a comment explaining.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5195/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99065/testReport)** for PR 23096 at commit [`b6a3d02`](https://github.com/apache/spark/commit/b6a3d02f2c2b0eff71f92c3ede854edc3b5bf9f8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235161825
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -696,7 +701,7 @@ class Analyzer(
                   s"avoid errors. Increase the value of ${SQLConf.MAX_NESTED_VIEW_DEPTH.key} to work " +
                   "around this.")
               }
    -          executeSameContext(child)
    +          executeSameContext(child, None)
    --- End diff --
    
    No great reason. I just used None for everything, except the top level, because it is very difficult to wire the tracker here without refactoring a lot of code.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235141963
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -102,29 +102,34 @@ class Analyzer(
         this(catalog, conf, conf.optimizerMaxIterations)
       }
     
    -  def executeAndCheck(plan: LogicalPlan): LogicalPlan = AnalysisHelper.markInAnalyzer {
    -    val analyzed = execute(plan)
    -    try {
    -      checkAnalysis(analyzed)
    -      analyzed
    -    } catch {
    -      case e: AnalysisException =>
    -        val ae = new AnalysisException(e.message, e.line, e.startPosition, Option(analyzed))
    -        ae.setStackTrace(e.getStackTrace)
    -        throw ae
    +  def executeAndCheck(plan: LogicalPlan, tracker: Option[QueryPlanningTracker]): LogicalPlan = {
    +    AnalysisHelper.markInAnalyzer {
    +      val analyzed = execute(plan, tracker)
    +      try {
    +        checkAnalysis(analyzed)
    +        analyzed
    +      } catch {
    +        case e: AnalysisException =>
    +          val ae = new AnalysisException(e.message, e.line, e.startPosition, Option(analyzed))
    +          ae.setStackTrace(e.getStackTrace)
    +          throw ae
    +      }
         }
       }
     
    -  override def execute(plan: LogicalPlan): LogicalPlan = {
    +  override def execute(plan: LogicalPlan, tracker: Option[QueryPlanningTracker]): LogicalPlan = {
    --- End diff --
    
    Do you intend to push this change into OSS? Looks like the signature of a few major entry points are changed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    cc @hvanhovell @gatorsmile 
    
    This is different from the existing metrics for rules as it is query specific. We might want to replace that one with this in the future.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99118/testReport)** for PR 23096 at commit [`34f8bfe`](https://github.com/apache/spark/commit/34f8bfe69b70ff702324ec7f38d78ae920410ef7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99118/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99065/testReport)** for PR 23096 at commit [`b6a3d02`](https://github.com/apache/spark/commit/b6a3d02f2c2b0eff71f92c3ede854edc3b5bf9f8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class RuleSummary(`
      * `class QueryPlanningTracker `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235144120
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -696,7 +701,7 @@ class Analyzer(
                   s"avoid errors. Increase the value of ${SQLConf.MAX_NESTED_VIEW_DEPTH.key} to work " +
                   "around this.")
               }
    -          executeSameContext(child)
    +          executeSameContext(child, None)
    --- End diff --
    
    Without knowing this code well it isn't obvious to me why sometimes we pass None as the tracker. What's the thinking behind it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235185654
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -648,7 +648,11 @@ class SparkSession private(
        * @since 2.0.0
        */
       def sql(sqlText: String): DataFrame = {
    -    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
    +    val tracker = new QueryPlanningTracker
    --- End diff --
    
    It's because this is a Nice-To-Have instead of Must-Have. The framework seems to be designed to support disabling by assigning `None` here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #4437 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4437/testReport)** for PR 23096 at commit [`2cd069c`](https://github.com/apache/spark/commit/2cd069c21808420c138ef8444dcb407454a447fd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235182105
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
    @@ -88,15 +101,20 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
                 val startTime = System.nanoTime()
                 val result = rule(plan)
                 val runTime = System.nanoTime() - startTime
    +            val effective = !result.fastEquals(plan)
     
    -            if (!result.fastEquals(plan)) {
    +            if (effective) {
                   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
                   queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
                   planChangeLogger.log(rule.ruleName, plan, result)
                 }
                 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
                 queryExecutionMetrics.incNumExecution(rule.ruleName)
     
    +            if (tracker ne null) {
    --- End diff --
    
    if one calls execute directly tracker would be null.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99067/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235505190
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -648,7 +648,11 @@ class SparkSession private(
        * @since 2.0.0
        */
       def sql(sqlText: String): DataFrame = {
    -    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
    +    val tracker = new QueryPlanningTracker
    --- End diff --
    
    Thanks. I got it, @rxin .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99065/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99069/testReport)** for PR 23096 at commit [`dd61273`](https://github.com/apache/spark/commit/dd61273024bf322bfe03f3cd5e06803eaae62d8f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235150178
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.util.BoundedPriorityQueue
    +
    +
    +/**
    + * A simple utility for tracking runtime and associated stats in query planning.
    + *
    + * There are two separate concepts we track:
    + *
    + * 1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis,
    + * optimizationm and physical planning (just planning).
    + *
    + * 2. Rules: These are the individual Catalyst rules that we track. In addition to time, we also
    + * track the number of invocations and effective invocations.
    + */
    +object QueryPlanningTracker {
    +
    +  // Define a list of common phases here.
    +  val PARSING = "parsing"
    --- End diff --
    
    Why not enum?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #4437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4437/testReport)** for PR 23096 at commit [`2cd069c`](https://github.com/apache/spark/commit/2cd069c21808420c138ef8444dcb407454a447fd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99080/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99080/testReport)** for PR 23096 at commit [`f36a231`](https://github.com/apache/spark/commit/f36a23170f5cc93724908e42041dad4672564353).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5224/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235162047
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
    @@ -88,15 +92,18 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
                 val startTime = System.nanoTime()
                 val result = rule(plan)
                 val runTime = System.nanoTime() - startTime
    +            val effective = !result.fastEquals(plan)
     
    -            if (!result.fastEquals(plan)) {
    +            if (effective) {
                   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
                   queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
                   planChangeLogger.log(rule.ruleName, plan, result)
                 }
                 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
                 queryExecutionMetrics.incNumExecution(rule.ruleName)
     
    +            tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective))
    --- End diff --
    
    yes! (not great -- but I'd probably remove the global tracker at some point)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5188/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99067/testReport)** for PR 23096 at commit [`b2a2a01`](https://github.com/apache/spark/commit/b2a2a01e0607fe79e8ff6fa039555f8f1ccb3248).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99070/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5186/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merging this. Feel free to leave more comments. I'm hoping we can wire this into the UI eventually.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99067/testReport)** for PR 23096 at commit [`b2a2a01`](https://github.com/apache/spark/commit/b2a2a01e0607fe79e8ff6fa039555f8f1ccb3248).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class QueryExecution(`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235161336
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.util.BoundedPriorityQueue
    +
    +
    +/**
    + * A simple utility for tracking runtime and associated stats in query planning.
    + *
    + * There are two separate concepts we track:
    + *
    + * 1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis,
    + * optimizationm and physical planning (just planning).
    + *
    + * 2. Rules: These are the individual Catalyst rules that we track. In addition to time, we also
    + * track the number of invocations and effective invocations.
    + */
    +object QueryPlanningTracker {
    +
    +  // Define a list of common phases here.
    +  val PARSING = "parsing"
    --- End diff --
    
    Mostly because Scala enum is not great, and I was thinking about making this a generic thing that's extensible.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235309483
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -648,7 +648,11 @@ class SparkSession private(
        * @since 2.0.0
        */
       def sql(sqlText: String): DataFrame = {
    -    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
    +    val tracker = new QueryPlanningTracker
    --- End diff --
    
    I don't think it makes sense to add random flags for everything. If the argument is that this change has a decent chance of introducing regressions (e.g. due to higher memory usage, or cpu overhead), then it would make a lot of sense to put it behind a flag so it can be disabled in production if that happens.
    
    That said, the overhead on the hot code path here is substantially smaller than even transforming the simplest Catalyst plan (hash map look up is orders of magnitude cheaper than calling a partial function to transform a Scala collection for TreeNode), so I think the risk is so low that it does not warrant adding a config.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235159238
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -648,7 +648,11 @@ class SparkSession private(
        * @since 2.0.0
        */
       def sql(sqlText: String): DataFrame = {
    -    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
    +    val tracker = new QueryPlanningTracker
    --- End diff --
    
    @dongjoon-hyun just out of curiosity, what would you like to disable here? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99081/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235135213
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/QueryPlanningTracker.scala ---
    @@ -0,0 +1,109 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.util.BoundedPriorityQueue
    +
    +
    +/**
    + * A simple utility for tracking runtime and associated stats in query planning.
    + *
    + * There are two separate concepts we track:
    + *
    + * 1. Phases: These are broad scope phases in query planning, as listed below, i.e. analysis,
    + * optimizationm and physical planning (just planning).
    + *
    + * 2. Rules: These are the individual Catalyst rules that we track. In addition to time, we also
    + * track the number of invocations and effective invocations.
    + */
    +object QueryPlanningTracker {
    +
    +  // Define a list of common phases here.
    +  val PARSING = "parsing"
    +  val ANALYSIS = "analysis"
    +  val OPTIMIZATION = "optimization"
    +  val PLANNING = "planning"
    +
    +  class RuleSummary(
    +    var totalTimeNs: Long, var numInvocations: Long, var numEffectiveInvocations: Long) {
    +
    +    def this() = this(totalTimeNs = 0, numInvocations = 0, numEffectiveInvocations = 0)
    +
    +    override def toString: String = {
    +      s"RuleSummary($totalTimeNs, $numInvocations, $numEffectiveInvocations)"
    +    }
    +  }
    +}
    +
    +
    +class QueryPlanningTracker {
    +
    +  import QueryPlanningTracker._
    +
    +  // Mapping from the name of a rule to a rule's summary.
    +  // Use a Java HashMap for less overhead.
    +  private val rulesMap = new java.util.HashMap[String, RuleSummary]
    +
    +  // From a phase to time in ns.
    +  private val phaseToTimeNs = new java.util.HashMap[String, Long]
    +
    +  /** Measure the runtime of function f, and add it to the time for the specified phase. */
    +  def measureTime[T](phase: String)(f: => T): T = {
    +    val startTime = System.nanoTime()
    +    val ret = f
    +    val timeTaken = System.nanoTime() - startTime
    +    phaseToTimeNs.put(phase, phaseToTimeNs.getOrDefault(phase, 0) + timeTaken)
    +    ret
    +  }
    +
    +  /**
    +   * Reecord a specific invocation of a rule.
    --- End diff --
    
    typo: Record


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99081/testReport)** for PR 23096 at commit [`2cd069c`](https://github.com/apache/spark/commit/2cd069c21808420c138ef8444dcb407454a447fd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5189/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99069/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235112175
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
    @@ -648,7 +648,11 @@ class SparkSession private(
        * @since 2.0.0
        */
       def sql(sqlText: String): DataFrame = {
    -    Dataset.ofRows(self, sessionState.sqlParser.parsePlan(sqlText))
    +    val tracker = new QueryPlanningTracker
    --- End diff --
    
    Hi, @rxin .
    Can we have an option to disable this new feature?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23096: [SPARK-26129][SQL] Instrumentation for per-query ...

Posted by abehm <gi...@git.apache.org>.
Github user abehm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23096#discussion_r235166000
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala ---
    @@ -88,15 +92,18 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging {
                 val startTime = System.nanoTime()
                 val result = rule(plan)
                 val runTime = System.nanoTime() - startTime
    +            val effective = !result.fastEquals(plan)
     
    -            if (!result.fastEquals(plan)) {
    +            if (effective) {
                   queryExecutionMetrics.incNumEffectiveExecution(rule.ruleName)
                   queryExecutionMetrics.incTimeEffectiveExecutionBy(rule.ruleName, runTime)
                   planChangeLogger.log(rule.ruleName, plan, result)
                 }
                 queryExecutionMetrics.incExecutionTimeBy(rule.ruleName, runTime)
                 queryExecutionMetrics.incNumExecution(rule.ruleName)
     
    +            tracker.foreach(_.recordRuleInvocation(rule.ruleName, runTime, effective))
    --- End diff --
    
    removing the global tracker would be great!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99080/testReport)** for PR 23096 at commit [`f36a231`](https://github.com/apache/spark/commit/f36a23170f5cc93724908e42041dad4672564353).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99118/testReport)** for PR 23096 at commit [`34f8bfe`](https://github.com/apache/spark/commit/34f8bfe69b70ff702324ec7f38d78ae920410ef7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23096: [SPARK-26129][SQL] Instrumentation for per-query plannin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23096
  
    **[Test build #99069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99069/testReport)** for PR 23096 at commit [`dd61273`](https://github.com/apache/spark/commit/dd61273024bf322bfe03f3cd5e06803eaae62d8f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org