Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/11/10 16:07:52 UTC

[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/23002

    [SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance

    ## What changes were proposed in this pull request?
    
    In `SQLAppStatusListener.aggregateMetrics`, `metricIds` is used only to filter the relevant metrics, yet it is built as a `Seq` and also sorted. When many metrics are involved, this can be quite inefficient. This PR proposes using a `Set` instead.
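
The filter itself is not quoted in this thread, so the following is only a minimal sketch of the idea, assuming each update's id is tested with `contains`. The object name, the `Update` alias, and both helper methods are hypothetical and not part of Spark. `contains` on a `Seq` scans linearly, while a hash-based `Set` answers in effectively constant time.

```scala
object MetricFilterSketch {
  // Hypothetical stand-in for an accumulator update: (accumulatorId, value).
  type Update = (Long, Long)

  // Before: ids are sorted into a Seq, so every membership test is a linear scan.
  def filterWithSeq(ids: Seq[Long], updates: Seq[Update]): Seq[Update] = {
    val metricIds = ids.sorted
    updates.filter { case (id, _) => metricIds.contains(id) }
  }

  // After: ids are collected into a Set, so every membership test is a hash lookup.
  def filterWithSet(ids: Seq[Long], updates: Seq[Update]): Seq[Update] = {
    val metricIds = ids.toSet
    updates.filter { case (id, _) => metricIds.contains(id) }
  }
}
```

Both helpers keep the same updates in the same order; only the cost of each `contains` call differs.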
    
    
    ## How was this patch tested?
    
    NA


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-26003

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23002
    
----
commit 7e790412ed6409fdda96216dde7f4f408bb04a57
Author: Marco Gaido <ma...@...>
Date:   2018-11-10T16:04:25Z

    [SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23002#discussion_r232713853
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,7 +159,7 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
    +    val metricIds = exec.metrics.map(_.accumulatorId).toSet
         val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
         val metrics = exec.stages.toSeq
           .flatMap { stageId => Option(stageMetrics.get(stageId)) }
    --- End diff --
    
    I am also fine with the current code here.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    thanks, merging to master!


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98683/
    Test PASSed.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23002#discussion_r232706992
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,7 +159,7 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
    +    val metricIds = exec.metrics.map(_.accumulatorId).toSet
         val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
         val metrics = exec.stages.toSeq
           .flatMap { stageId => Option(stageMetrics.get(stageId)) }
    --- End diff --
    
    Consider also changing the following `flatMap` / `filter` / `groupBy` chain into a `while` loop.


---



[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23002#discussion_r232706543
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,7 +159,7 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
    +    val metricIds = exec.metrics.map(_.accumulatorId).toSet
    --- End diff --
    
    Actually this one can be merged into `metricTypes`.
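
One way to read this suggestion, sketched under assumptions: since `metricTypes` is already keyed by accumulator id, the same map could serve as the membership test and a separate `metricIds` collection would not be needed. The `Metric` case class and the `relevantUpdates` helper below are hypothetical stand-ins, not Spark code.

```scala
object MergedLookupSketch {
  // Hypothetical stand-in for a SQL metric: only the two fields the diff uses.
  final case class Metric(accumulatorId: Long, metricType: String)

  def relevantUpdates(
      metrics: Seq[Metric],
      updates: Seq[(Long, Long)]): Seq[(Long, Long)] = {
    // One map serves both purposes: type lookup and membership test.
    val metricTypes: Map[Long, String] =
      metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
    updates.filter { case (id, _) => metricTypes.contains(id) }
  }
}
```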


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    **[Test build #98683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98683/testReport)** for PR 23002 at commit [`7e79041`](https://github.com/apache/spark/commit/7e790412ed6409fdda96216dde7f4f408bb04a57).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23002#discussion_r232713761
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,7 +159,7 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
    +    val metricIds = exec.metrics.map(_.accumulatorId).toSet
         val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
         val metrics = exec.stages.toSeq
           .flatMap { stageId => Option(stageMetrics.get(stageId)) }
    --- End diff --
    
    If the number of metrics is large, using a `while` loop can reduce the number of traversals, and it would not be complicated to do in the code here.
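
A rough sketch of the trade-off being discussed, not the code from the PR: the chained version builds an intermediate collection at each step, while the loop visits every update once and groups as it goes. The object name, both method names, and the simplified `(id, value)` update shape are assumptions made for illustration.

```scala
import scala.collection.mutable

object SinglePassSketch {
  // Chained version: flatten, filter and groupBy each walk an intermediate collection.
  def chained(stages: Seq[Seq[(Long, Long)]], ids: Set[Long]): Map[Long, Seq[Long]] =
    stages.flatten
      .filter { case (id, _) => ids.contains(id) }
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2) }

  // Loop version: visit each update once and append it to its group directly.
  def singlePass(stages: Seq[Seq[(Long, Long)]], ids: Set[Long]): Map[Long, Seq[Long]] = {
    val grouped = mutable.HashMap.empty[Long, mutable.ArrayBuffer[Long]]
    val stageIt = stages.iterator
    while (stageIt.hasNext) {
      val updateIt = stageIt.next().iterator
      while (updateIt.hasNext) {
        val (id, value) = updateIt.next()
        if (ids.contains(id)) {
          grouped.getOrElseUpdate(id, mutable.ArrayBuffer.empty[Long]) += value
        }
      }
    }
    // Convert the mutable buffers into an immutable result.
    grouped.map { case (id, values) => id -> values.toList }.toMap
  }
}
```

The two methods return the same groups; the loop avoids the intermediate collections at the cost of mutable state and more lines.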


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    **[Test build #98730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98730/testReport)** for PR 23002 at commit [`031d512`](https://github.com/apache/spark/commit/031d512b84e0b84a1876c098e0842f13d37c38e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4915/
    Test PASSed.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    LGTM, also cc @gengliangwang 


---



[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23002#discussion_r232715666
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,7 +159,7 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
    +    val metricIds = exec.metrics.map(_.accumulatorId).toSet
         val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
         val metrics = exec.stages.toSeq
           .flatMap { stageId => Option(stageMetrics.get(stageId)) }
    --- End diff --
    
    Yes, we can save one traversal, but honestly I am not sure it is worth it... This approach seems cleaner to me.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    **[Test build #98683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98683/testReport)** for PR 23002 at commit [`7e79041`](https://github.com/apache/spark/commit/7e790412ed6409fdda96216dde7f4f408bb04a57).


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    cc @cloud-fan @vanzin 


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4951/
    Test PASSed.


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    **[Test build #98730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98730/testReport)** for PR 23002 at commit [`031d512`](https://github.com/apache/spark/commit/031d512b84e0b84a1876c098e0842f13d37c38e8).


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98730/
    Test PASSed.


---



[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23002


---



[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23002
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23002#discussion_r232712094
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
    @@ -159,7 +159,7 @@ class SQLAppStatusListener(
       }
     
       private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
    -    val metricIds = exec.metrics.map(_.accumulatorId).sorted
    +    val metricIds = exec.metrics.map(_.accumulatorId).toSet
         val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
         val metrics = exec.stages.toSeq
           .flatMap { stageId => Option(stageMetrics.get(stageId)) }
    --- End diff --
    
    I am not sure what you mean here. Why should we use a `while` loop?


---
