Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/06/11 21:42:58 UTC

[GitHub] spark pull request #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduc...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/21532

    [SPARK-24524][SQL]Improve aggregateMetrics: reduce memory usage and number of loops

    ## What changes were proposed in this pull request?
    
    The function `aggregateMetrics` processes metrics from both the executors and the driver, and the data can be large.
    
    This PR improves the implementation by using a single loop (before converting the values to strings) and a single dynamic data structure.
    
    
    ## How was this patch tested?
    
    Unit test
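
For readers of this thread, here is a minimal sketch of the "one loop, one dynamic data structure" idea described above. It is not the code from the PR: `AggregateSketch`, `TaskMetrics`, and `collectValues` are hypothetical names, and the parallel `ids`/`values` arrays are assumed only because the diff later in the thread reads `liveMetrics.ids`. Dropping metrics that received no values is also an assumption of the sketch, not something the PR states.

```scala
import scala.collection.mutable

object AggregateSketch {
  // Hypothetical stand-in for one task's metric update: parallel arrays of
  // accumulator IDs and their values.
  final case class TaskMetrics(ids: Array[Long], values: Array[Long])

  // Groups values by accumulator ID in a single pass, keeping only known IDs.
  def collectValues(
      knownIds: Set[Long],
      tasks: Seq[TaskMetrics],
      driverUpdates: Seq[(Long, Long)]): Map[Long, Seq[Long]] = {
    // One growable buffer per known metric ID, created up front.
    val buffers = mutable.HashMap.empty[Long, mutable.ArrayBuffer[Long]]
    knownIds.foreach { id => buffers(id) = mutable.ArrayBuffer.empty[Long] }

    tasks.foreach { task =>
      var i = 0
      while (i < task.ids.length) {
        // Unknown IDs are skipped by the lookup, so no separate filter pass is needed.
        buffers.get(task.ids(i)).foreach(_ += task.values(i))
        i += 1
      }
    }
    driverUpdates.foreach { case (id, value) =>
      buffers.get(id).foreach(_ += value)
    }

    // Sketch assumption: metrics that received no values are left out of the result.
    buffers.iterator.collect { case (id, buf) if buf.nonEmpty => id -> buf.toList }.toMap
  }
}
```

In the sketch, values are appended directly to a per-ID buffer as they are visited, rather than first building a flat sequence of `(id, value)` pairs and then filtering and grouping it.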


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark aggMetrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21532
    
----
commit 0ce71c09bf5593c16e0eff5ae6e4aa3bd4c6ca26
Author: Gengliang Wang <ge...@...>
Date:   2018-06-11T21:32:11Z

    Improve aggregateMetrics with less memory usage and loops

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Found a possible issue; closing this PR.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    @vanzin @felixcheung @gatorsmile 


---


[GitHub] spark pull request #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduc...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21532#discussion_r194558798
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,19 +159,29 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
         val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
    -    val metrics = exec.stages.toSeq
    -      .flatMap { stageId => Option(stageMetrics.get(stageId)) }
    -      .flatMap(_.taskMetrics.values().asScala)
    -      .flatMap { metrics => metrics.ids.zip(metrics.values) }
    -
    -    val aggregatedMetrics = (metrics ++ exec.driverAccumUpdates.toSeq)
    -      .filter { case (id, _) => metricIds.contains(id) }
    -      .groupBy(_._1)
    -      .map { case (id, values) =>
    -        id -> SQLMetrics.stringValue(metricTypes(id), values.map(_._2).toSeq)
    +    val metrics = metricTypes.keys
    +      .map { id => (id, scala.collection.mutable.ArrayBuffer.empty[Long]) }
    +      .toMap
    +    stageMetrics.asScala.collect { case (stage, liveStageMetrics) if exec.stages.contains(stage) =>
    +      liveStageMetrics.taskMetrics.values().asScala.foreach { case liveMetrics =>
    +        var i = 0
    +        while (i < liveMetrics.ids.length) {
    --- End diff --
    
    Use a `while` loop on the performance-critical path: https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex
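
To illustrate the comment above, here is a small sketch that contrasts a `zip`-based traversal with an index-based `while` loop over two parallel arrays. `TraversalSketch` and both method names are hypothetical and do not appear in Spark. The sketch makes no performance claim of its own; it only shows that the `while` form avoids building the intermediate array of tuples.

```scala
object TraversalSketch {
  // Functional form: `zip` materializes an Array[(Long, Long)] before filtering.
  def sumForIdWithZip(ids: Array[Long], values: Array[Long], target: Long): Long =
    ids.zip(values).collect { case (id, v) if id == target => v }.sum

  // Index-based form: walks both arrays in place with no intermediate collection.
  def sumForIdWithWhile(ids: Array[Long], values: Array[Long], target: Long): Long = {
    var total = 0L
    var i = 0
    // Assumes `ids` and `values` have the same length, as parallel arrays.
    while (i < ids.length) {
      if (ids(i) == target) total += values(i)
      i += 1
    }
    total
  }
}
```

Both methods return the same result for equal-length inputs; they differ only in how the arrays are traversed.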


---


[GitHub] spark pull request #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduc...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang closed the pull request at:

    https://github.com/apache/spark/pull/21532


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    **[Test build #91676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91676/testReport)** for PR 21532 at commit [`f58b944`](https://github.com/apache/spark/commit/f58b94411d6564d66338f97b9e753cd3267dd0cf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by gengliangwang <gi...@git.apache.org>.

Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    This PR is inspired by #21438.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91676/
    Test PASSed.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/22/
    Test PASSed.


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    **[Test build #91676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91676/testReport)** for PR 21532 at commit [`f58b944`](https://github.com/apache/spark/commit/f58b94411d6564d66338f97b9e753cd3267dd0cf).


---


[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21532
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3912/
    Test PASSed.


---
