You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2015/10/22 06:01:08 UTC

[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/9215

    [SPARK-11253][SQL] reset all accumulators in physical operators before execute an action

    With this change, our query execution listener can get the metrics correctly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark metric

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9215.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9215
    
----
commit 4589e6668679b5bc642ea297cde403a18a690ced
Author: Wenchen Fan <we...@databricks.com>
Date:   2015-10-22T03:59:39Z

    reset all accumulators in physical operators before execute an action

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151013028
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150106101
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151013772
  
    **[Test build #44332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44332/consoleFull)** for PR 9215 at commit [`6923c4f`](https://github.com/apache/spark/commit/6923c4f9f9ca63a491d80596cbc6ccd71d8f5f25).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150119571
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150291397
  
    Can you also double check if our ui looks good after this change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150392446
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150673333
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44237/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42930194
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -62,7 +67,19 @@ private[sql] trait SQLMetricValue[T] extends Serializable {
     private[sql] class LongSQLMetricValue(private var _value : Long) extends SQLMetricValue[Long] {
     
       def add(incr: Long): LongSQLMetricValue = {
    -    _value += incr
    +    // Some LongSQLMetric will use -1 as initial value, so if the accumulator is never updated,
    +    // we can filter it out later.  However, when `add` is called, the accumulator is valid, we
    +    // should turn -1 to 0.
    +    if (_value < 0) {
    +      _value = 0
    +    }
    +
    +    // Some LongSQLMetric will use -1 as initial value, when we merge accumulator updates at driver
    +    // side, we should ignore these -1 values.
    +    if (incr > 0) {
    +      _value += incr
    +    }
    +
    --- End diff --
    
    I guess it is probably not worth that. The impact of those `-1` just too small (1048576 tasks for 1 MB). I think we can do that later. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150414417
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150941908
  
    **[Test build #44319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44319/consoleFull)** for PR 9215 at commit [`b088c70`](https://github.com/apache/spark/commit/b088c70f4bc2a1a5edd3388241803413bb23a2fc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150106420
  
    **[Test build #44129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44129/consoleFull)** for PR 9215 at commit [`6397cf5`](https://github.com/apache/spark/commit/6397cf57e80b6de6f039a6954e264bc214b6a2c3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150414133
  
    **[Test build #44186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44186/consoleFull)** for PR 9215 at commit [`778992e`](https://github.com/apache/spark/commit/778992e23d309f00869b80fec2a08413bb624e84).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151031307
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150126796
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44129/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150930195
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42775206
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala ---
    @@ -80,4 +80,24 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
         assert(metrics(0)._2.analyzed.isInstanceOf[Project])
         assert(metrics(0)._3.getMessage == e.getMessage)
       }
    +
    +  test("get metrics by callback") {
    +    val metrics = ArrayBuffer.empty[Long]
    +    val listener = new QueryExecutionListener {
    +      // Only test successful case here, so no need to implement `onFailure`
    +      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
    +
    +      override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
    +        metrics += qe.executedPlan.longMetric("numInputRows").value.value
    +      }
    +    }
    +    sqlContext.listenerManager.register(listener)
    +
    +    Seq(1 -> "a").toDF("i", "j").groupBy("i").count().collect()
    +    Seq(1 -> "a", 2 -> "a").toDF("i", "j").groupBy("i").count().collect()
    +
    +    assert(metrics.length == 2)
    +    assert(metrics(0) == 1)
    +    assert(metrics(1) == 2)
    +  }
    --- End diff --
    
    Can we run the same plan physical plan multiple times to make sure metrics are good?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150930124
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150941936
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44319/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42710749
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -28,7 +28,12 @@ import org.apache.spark.{Accumulable, AccumulableParam, SparkContext}
      */
     private[sql] abstract class SQLMetric[R <: SQLMetricValue[T], T](
         name: String, val param: SQLMetricParam[R, T])
    -  extends Accumulable[R, T](param.zero, param, Some(name), true)
    +  extends Accumulable[R, T](param.zero, param, Some(name), true) {
    +
    +  def reset: Unit = {
    --- End diff --
    
    () since it has side effect


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150126583
  
    **[Test build #44129 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44129/consoleFull)** for PR 9215 at commit [`6397cf5`](https://github.com/apache/spark/commit/6397cf57e80b6de6f039a6954e264bc214b6a2c3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151029277
  
    **[Test build #44332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44332/consoleFull)** for PR 9215 at commit [`6923c4f`](https://github.com/apache/spark/commit/6923c4f9f9ca63a491d80596cbc6ccd71d8f5f25).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150941933
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151029349
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150100164
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150119572
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44126/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150126795
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150392433
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150640729
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42892965
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -62,7 +67,19 @@ private[sql] trait SQLMetricValue[T] extends Serializable {
     private[sql] class LongSQLMetricValue(private var _value : Long) extends SQLMetricValue[Long] {
     
       def add(incr: Long): LongSQLMetricValue = {
    -    _value += incr
    +    // Some LongSQLMetric will use -1 as initial value, so if the accumulator is never updated,
    +    // we can filter it out later.  However, when `add` is called, the accumulator is valid, we
    +    // should turn -1 to 0.
    +    if (_value < 0) {
    +      _value = 0
    +    }
    +
    +    // Some LongSQLMetric will use -1 as initial value, when we merge accumulator updates at driver
    +    // side, we should ignore these -1 values.
    +    if (incr > 0) {
    +      _value += incr
    +    }
    +
    --- End diff --
    
    @cloud-fan sorry. I just realized that this method is in the critical path (when we calculate numRows). How about we remove this change and document it clear that those negative initial values will have a small impact on the sum of memory consumption?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150100797
  
    **[Test build #44126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44126/consoleFull)** for PR 9215 at commit [`4589e66`](https://github.com/apache/spark/commit/4589e6668679b5bc642ea297cde403a18a690ced).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42929484
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
    @@ -62,7 +67,19 @@ private[sql] trait SQLMetricValue[T] extends Serializable {
     private[sql] class LongSQLMetricValue(private var _value : Long) extends SQLMetricValue[Long] {
     
       def add(incr: Long): LongSQLMetricValue = {
    -    _value += incr
    +    // Some LongSQLMetric will use -1 as initial value, so if the accumulator is never updated,
    +    // we can filter it out later.  However, when `add` is called, the accumulator is valid, we
    +    // should turn -1 to 0.
    +    if (_value < 0) {
    +      _value = 0
    +    }
    +
    +    // Some LongSQLMetric will use -1 as initial value, when we merge accumulator updates at driver
    +    // side, we should ignore these -1 values.
    +    if (incr > 0) {
    +      _value += incr
    +    }
    +
    --- End diff --
    
    I was trying to reuse current codebase and didn't create a new SQLMetric(including new MetricValue, MetricParam, etc.). Is it worth to create a new one so that we won't hurt performance for numRows?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150119505
  
    **[Test build #44126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44126/consoleFull)** for PR 9215 at commit [`4589e66`](https://github.com/apache/spark/commit/4589e6668679b5bc642ea297cde403a18a690ced).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151029350
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44332/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150673263
  
    **[Test build #44237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44237/consoleFull)** for PR 9215 at commit [`4ff8912`](https://github.com/apache/spark/commit/4ff891205979a06abbf229f115bbbf99bda3ba1f).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150643054
  
    **[Test build #44237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44237/consoleFull)** for PR 9215 at commit [`4ff8912`](https://github.com/apache/spark/commit/4ff891205979a06abbf229f115bbbf99bda3ba1f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150100155
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151013026
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151041137
  
    should this merged to branch-1.5?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150640759
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150931394
  
    **[Test build #44319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44319/consoleFull)** for PR 9215 at commit [`b088c70`](https://github.com/apache/spark/commit/b088c70f4bc2a1a5edd3388241803413bb23a2fc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9215


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150414422
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44186/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42957236
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala ---
    @@ -80,4 +80,71 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
         assert(metrics(0)._2.analyzed.isInstanceOf[Project])
         assert(metrics(0)._3.getMessage == e.getMessage)
       }
    +
    +  test("get numRows metrics by callback") {
    +    val metrics = ArrayBuffer.empty[Long]
    +    val listener = new QueryExecutionListener {
    +      // Only test successful case here, so no need to implement `onFailure`
    +      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
    +
    +      override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
    +        metrics += qe.executedPlan.longMetric("numInputRows").value.value
    +      }
    +    }
    +    sqlContext.listenerManager.register(listener)
    +
    +    val df = Seq(1 -> "a").toDF("i", "j").groupBy("i").count()
    +    df.collect()
    +    df.collect()
    +    Seq(1 -> "a", 2 -> "a").toDF("i", "j").groupBy("i").count().collect()
    +
    +    assert(metrics.length == 3)
    +    assert(metrics(0) == 1)
    +    assert(metrics(1) == 1)
    +    assert(metrics(2) == 2)
    +  }
    +
    +  // TODO: Currently some LongSQLMetric use -1 as initial value, so if the accumulator is never
    +  // updated, we can filter it out later.  However, when we aggregate(sum) accumulator values at
    +  // driver side for SQL physical operators, these -1 values will make our result smaller.
    +  // A easy fix is to create a new SQLMetric(including new MetricValue, MetricParam, etc.), but we
    +  // can do it later because the impact is just too small (1048576 tasks for 1 MB).
    +  ignore("get size metrics by callback") {
    +    val metrics = ArrayBuffer.empty[Long]
    +    val listener = new QueryExecutionListener {
    +      // Only test successful case here, so no need to implement `onFailure`
    +      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
    +
    +      override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
    +        metrics += qe.executedPlan.longMetric("dataSize").value.value
    +        val bottomAgg = qe.executedPlan.children(0).children(0)
    +        metrics += bottomAgg.longMetric("dataSize").value.value
    +      }
    +    }
    +    sqlContext.listenerManager.register(listener)
    +
    +    val sparkListener = new SaveInfoListener
    +    sqlContext.sparkContext.addSparkListener(sparkListener)
    +
    +    val df = (1 to 100).map(i => i -> i.toString).toDF("i", "j")
    +    df.groupBy("i").count().collect()
    +
    +    def getPeakExecutionMemory(stageId: Int): Long = {
    +      val peakMemoryAccumulator = sparkListener.getCompletedStageInfos(stageId).accumulables
    +        .filter(_._2.name == InternalAccumulator.PEAK_EXECUTION_MEMORY)
    +
    +      assert(peakMemoryAccumulator.size == 1)
    +      peakMemoryAccumulator.head._2.value.toLong
    +    }
    +
    +    assert(sparkListener.getCompletedStageInfos.length == 2)
    +    val bottomAggDataSize = getPeakExecutionMemory(0)
    +    val topAggDataSize = getPeakExecutionMemory(1)
    +
    +    // For this simple case, the peakExecutionMemory of a stage should be the data size of the
    +    // aggregate operator, as we only have one memory consuming operator per stage.
    +    assert(metrics.length == 2)
    +    assert(metrics(0) == topAggDataSize)
    +    assert(metrics(1) == bottomAggDataSize)
    +  }
    --- End diff --
    
    good catch!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-151259944
  
    @scwf It will not be merged to branch-1.5 because our UI handles those metric updates correctly. This problem was exposed after we call the `QueryExecutionListener` logic (see #9078 ).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9215#discussion_r42952784
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala ---
    @@ -80,4 +80,71 @@ class DataFrameCallbackSuite extends QueryTest with SharedSQLContext {
         assert(metrics(0)._2.analyzed.isInstanceOf[Project])
         assert(metrics(0)._3.getMessage == e.getMessage)
       }
    +
    +  test("get numRows metrics by callback") {
    +    val metrics = ArrayBuffer.empty[Long]
    +    val listener = new QueryExecutionListener {
    +      // Only test successful case here, so no need to implement `onFailure`
    +      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
    +
    +      override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
    +        metrics += qe.executedPlan.longMetric("numInputRows").value.value
    +      }
    +    }
    +    sqlContext.listenerManager.register(listener)
    +
    +    val df = Seq(1 -> "a").toDF("i", "j").groupBy("i").count()
    +    df.collect()
    +    df.collect()
    +    Seq(1 -> "a", 2 -> "a").toDF("i", "j").groupBy("i").count().collect()
    +
    +    assert(metrics.length == 3)
    +    assert(metrics(0) == 1)
    +    assert(metrics(1) == 1)
    +    assert(metrics(2) == 2)
    +  }
    +
    +  // TODO: Currently some LongSQLMetric use -1 as initial value, so if the accumulator is never
    +  // updated, we can filter it out later.  However, when we aggregate(sum) accumulator values at
    +  // driver side for SQL physical operators, these -1 values will make our result smaller.
    +  // A easy fix is to create a new SQLMetric(including new MetricValue, MetricParam, etc.), but we
    +  // can do it later because the impact is just too small (1048576 tasks for 1 MB).
    +  ignore("get size metrics by callback") {
    +    val metrics = ArrayBuffer.empty[Long]
    +    val listener = new QueryExecutionListener {
    +      // Only test successful case here, so no need to implement `onFailure`
    +      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
    +
    +      override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
    +        metrics += qe.executedPlan.longMetric("dataSize").value.value
    +        val bottomAgg = qe.executedPlan.children(0).children(0)
    +        metrics += bottomAgg.longMetric("dataSize").value.value
    +      }
    +    }
    +    sqlContext.listenerManager.register(listener)
    +
    +    val sparkListener = new SaveInfoListener
    +    sqlContext.sparkContext.addSparkListener(sparkListener)
    +
    +    val df = (1 to 100).map(i => i -> i.toString).toDF("i", "j")
    +    df.groupBy("i").count().collect()
    +
    +    def getPeakExecutionMemory(stageId: Int): Long = {
    +      val peakMemoryAccumulator = sparkListener.getCompletedStageInfos(stageId).accumulables
    +        .filter(_._2.name == InternalAccumulator.PEAK_EXECUTION_MEMORY)
    +
    +      assert(peakMemoryAccumulator.size == 1)
    +      peakMemoryAccumulator.head._2.value.toLong
    +    }
    +
    +    assert(sparkListener.getCompletedStageInfos.length == 2)
    +    val bottomAggDataSize = getPeakExecutionMemory(0)
    +    val topAggDataSize = getPeakExecutionMemory(1)
    +
    +    // For this simple case, the peakExecutionMemory of a stage should be the data size of the
    +    // aggregate operator, as we only have one memory consuming operator per stage.
    +    assert(metrics.length == 2)
    +    assert(metrics(0) == topAggDataSize)
    +    assert(metrics(1) == bottomAggDataSize)
    +  }
    --- End diff --
    
    Sorry. Just one last thing. I think we need to call `unregister` at the end of every test, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150392925
  
    **[Test build #44186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44186/consoleFull)** for PR 9215 at commit [`778992e`](https://github.com/apache/spark/commit/778992e23d309f00869b80fec2a08413bb624e84).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150673330
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11253][SQL] reset all accumulators in p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9215#issuecomment-150106110
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org