You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/11/10 16:07:52 UTC
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/23002
[SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance
## What changes were proposed in this pull request?
In `SQLAppStatusListener.aggregateMetrics`, we use the `metricIds` only to filter the relevant metrics. And this is a Seq which is also sorted. When there are many metrics involved, this can be pretty inefficient. The PR proposes to use a Set for it.
## How was this patch tested?
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mgaido91/spark SPARK-26003
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23002.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23002
----
commit 7e790412ed6409fdda96216dde7f4f408bb04a57
Author: Marco Gaido <ma...@...>
Date: 2018-11-10T16:04:25Z
[SPARK-26003] Improve SQLAppStatusListener.aggregateMetrics performance
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/23002#discussion_r232713853
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
}
private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
- val metricIds = exec.metrics.map(_.accumulatorId).sorted
+ val metricIds = exec.metrics.map(_.accumulatorId).toSet
val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
val metrics = exec.stages.toSeq
.flatMap { stageId => Option(stageMetrics.get(stageId)) }
--- End diff --
I am also fine with the current code here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23002
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98683/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/23002#discussion_r232706992
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
}
private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
- val metricIds = exec.metrics.map(_.accumulatorId).sorted
+ val metricIds = exec.metrics.map(_.accumulatorId).toSet
val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
val metrics = exec.stages.toSeq
.flatMap { stageId => Option(stageMetrics.get(stageId)) }
--- End diff --
Consider also change the following `flatMap` / `filter` / `groupBy` into `while` loop
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/23002#discussion_r232706543
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
}
private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
- val metricIds = exec.metrics.map(_.accumulatorId).sorted
+ val metricIds = exec.metrics.map(_.accumulatorId).toSet
--- End diff --
Actually this one can be merged into `metricTypes`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23002
**[Test build #98683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98683/testReport)** for PR 23002 at commit [`7e79041`](https://github.com/apache/spark/commit/7e790412ed6409fdda96216dde7f4f408bb04a57).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/23002#discussion_r232713761
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
}
private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
- val metricIds = exec.metrics.map(_.accumulatorId).sorted
+ val metricIds = exec.metrics.map(_.accumulatorId).toSet
val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
val metrics = exec.stages.toSeq
.flatMap { stageId => Option(stageMetrics.get(stageId)) }
--- End diff --
If the metrics is large, then using a while loop can reduce the number of traversal loops. And it is not complicated to do it in the code here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23002
**[Test build #98730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98730/testReport)** for PR 23002 at commit [`031d512`](https://github.com/apache/spark/commit/031d512b84e0b84a1876c098e0842f13d37c38e8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4915/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/23002
LGTM, also cc @gengliangwang
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23002#discussion_r232715666
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
}
private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
- val metricIds = exec.metrics.map(_.accumulatorId).sorted
+ val metricIds = exec.metrics.map(_.accumulatorId).toSet
val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
val metrics = exec.stages.toSeq
.flatMap { stageId => Option(stageMetrics.get(stageId)) }
--- End diff --
yes, we can save 1 traversal, but I am not sure it is worth honestly... This approach seems cleaner to me.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23002
**[Test build #98683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98683/testReport)** for PR 23002 at commit [`7e79041`](https://github.com/apache/spark/commit/7e790412ed6409fdda96216dde7f4f408bb04a57).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/23002
cc @cloud-fan @vanzin
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4951/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/23002
**[Test build #98730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98730/testReport)** for PR 23002 at commit [`031d512`](https://github.com/apache/spark/commit/031d512b84e0b84a1876c098e0842f13d37c38e8).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98730/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/23002
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #23002: [SPARK-26003] Improve SQLAppStatusListener.aggregateMetr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23002
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23002: [SPARK-26003] Improve SQLAppStatusListener.aggreg...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23002#discussion_r232712094
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala ---
@@ -159,7 +159,7 @@ class SQLAppStatusListener(
}
private def aggregateMetrics(exec: LiveExecutionData): Map[Long, String] = {
- val metricIds = exec.metrics.map(_.accumulatorId).sorted
+ val metricIds = exec.metrics.map(_.accumulatorId).toSet
val metricTypes = exec.metrics.map { m => (m.accumulatorId, m.metricType) }.toMap
val metrics = exec.stages.toSeq
.flatMap { stageId => Option(stageMetrics.get(stageId)) }
--- End diff --
not sure what you mean here. Why should we use a `while` loop?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org