You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/28 18:32:51 UTC

[GitHub] [spark] xkrogen opened a new pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

xkrogen opened a new pull request #32388:
URL: https://github.com/apache/spark/pull/32388


   ### What changes were proposed in this pull request?
   This adds two new additional metrics to `ExternalBlockHandler`:
   - `blockTransferRate` -- for indicating the rate of transferring blocks, vs. the data within them
   - `blockTransferAvgSize_1min` -- a 1-minute trailing average of block sizes transferred by the ESS
   
   Additionally, this enhances `YarnShuffleServiceMetrics` to expose the histogram/`Snapshot` information from `Timer` metrics within `ExternalBlockHandler`.
   
   ### Why are the changes needed?
   Currently `ExternalBlockHandler` exposes some useful metrics, but is lacking around metrics for the rate of block transfers. We have `blockTransferRateBytes` to tell us the rate of _bytes_, but no metric to tell us the rate of _blocks_, which is especially relevant when running the ESS on HDDs that are sensitive to random reads. Many small block transfers can have a negative impact on performance, but won't show up as a spike in `blockTransferRateBytes` since the sizes are small. Thus the new metrics to show information around average block size and block transfer rate are very useful to monitor the health/performance of the ESS, especially when running on HDDs. 
   
   For the `YarnShuffleServiceMetrics`, currently the three `Timer` metrics exposed by `ExternalBlockHandler` are being underutilized in a YARN-based environment -- they are basically treated as a `Meter`, only exposing rate-based information, when the metrics themselves are collected detailed histograms of timing information. We should expose this information for better observability.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, there are two entirely new metrics for the ESS, as documented in `monitoring.md`. Additionally in a YARN environment, `Timer` metrics exposed by the ESS will include more rich timing information.
   
   ### How was this patch tested?
   New unit tests are added to verify that new metrics are showing up as expected.
   
   We have been running this patch internally for approx. 1 year and have found it to be useful for monitoring the health of ESS and diagnosing performance issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844341643


   Rebased and reverted some unrelated changes, thanks to @otterc for the heads up there!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r659546665



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       I am assuming there are no further suggestions to resolve here ... we can do a follow up in case there is a change in behavior required for 3.2
   +CC @dongjoon-hyun 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828873465


   cc @Ngone51 too FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865174201


   **[Test build #140083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140083/testReport)** for PR 32388 at commit [`16f53dc`](https://github.com/apache/spark/commit/16f53dc0e88927d0a80f6d87f0311cce90d4ef9a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865183826


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839131504


   **[Test build #138402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138402/testReport)** for PR 32388 at commit [`42e48e5`](https://github.com/apache/spark/commit/42e48e58a5c909858204b0614d68a991bb46e53d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r642165193



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,32 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentileThousands : new int[] { 10, 50, 250, 500, 750, 950, 980, 990, 999 }) {
+        String percentileStr;
+        switch (percentileThousands) {
+          case 10:
+            percentileStr = "1stPercentile";
+            break;
+          case 999:
+            percentileStr = "99.9thPercentile";

Review comment:
       Hi, @xkrogen . Thank you for adding this and I saw your [comment](https://github.com/apache/spark/pull/32388#discussion_r629507805).
   
   Technically, `999thPercentile` is consistent with the Apache Spark convention instead of `99.9thPercentitle`. If you look at the other existing metrics, it has the following.
   <img width="146" alt="Screen Shot 2021-05-30 at 6 52 14 PM" src="https://user-images.githubusercontent.com/9700541/120129072-325b1b80-c178-11eb-9ff2-dd242951e24c.png">
   
   
   And, just a question. May I ask where you borrow this `99.9thPercentitle`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829496063


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42609/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r623209427



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -98,10 +99,10 @@ public ExternalBlockHandler(
       OneForOneStreamManager streamManager,
       ExternalShuffleBlockResolver blockManager,
       MergedShuffleFileManager mergeManager) {
-    this.metrics = new ShuffleMetrics();
     this.streamManager = streamManager;
     this.blockManager = blockManager;
     this.mergeManager = mergeManager;
+    this.metrics = new ShuffleMetrics();

Review comment:
       Great question! Changing from:
   ```
         allMetrics.put("registeredExecutorsSize",
                        (Gauge<Integer>) () -> blockManager.getRegisteredExecutorsSize());
   ```
   to
   ```
         allMetrics.put("registeredExecutorsSize",
             (Gauge<Integer>) blockManager::getRegisteredExecutorsSize);
   ```
   Means that the `blockManager` object is evaluated at the time of `new ShuffleMetrics()`, instead of later at the time the metrics are gathered. When the instantiation is the first line, `blockManager` is still null. If it's at the end, `blockManager` has been initialized by then.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r628798552



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,17 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(getMetricsInfoGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(getMetricsInfoGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(getMetricsInfoGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(getMetricsInfoGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentile : new int[] { 1, 5, 25, 50, 75, 95, 99 }) {

Review comment:
       Is there a reason of this selection? In general, we use `Snapshot.get98thPercentile` and `Snapshot.get999thPercentile` additionally too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-837178915


   **[Test build #138340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138340/testReport)** for PR 32388 at commit [`d7eebfd`](https://github.com/apache/spark/commit/d7eebfd6244123bd765d75307bc026111948b0d3).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832328008


   **[Test build #138159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138159/testReport)** for PR 32388 at commit [`d519fb9`](https://github.com/apache/spark/commit/d519fb9e44068e0f26f59fc4cc9e19c43c863f7a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829579374


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138089/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844462770


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138715/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852287246


   **[Test build #139176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139176/testReport)** for PR 32388 at commit [`d01b392`](https://github.com/apache/spark/commit/d01b3921f8622ce3d8305c18fedd2f8895a2c7c0).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-836992973


   **[Test build #138340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138340/testReport)** for PR 32388 at commit [`d7eebfd`](https://github.com/apache/spark/commit/d7eebfd6244123bd765d75307bc026111948b0d3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844344276


   **[Test build #138715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138715/testReport)** for PR 32388 at commit [`e63d7ef`](https://github.com/apache/spark/commit/e63d7efa9d280bccda980004ad2e154845ea4acb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r642167700



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Is this valid when we do `getContinuousBlocksData`? To be clear, could you revise the definition you are aiming?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844344276


   **[Test build #138715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138715/testReport)** for PR 32388 at commit [`e63d7ef`](https://github.com/apache/spark/commit/e63d7efa9d280bccda980004ad2e154845ea4acb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839131504


   **[Test build #138402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138402/testReport)** for PR 32388 at commit [`42e48e5`](https://github.com/apache/spark/commit/42e48e58a5c909858204b0614d68a991bb46e53d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865136643


   Rebased on `master` to resolve some conflicts with other shuffle metrics related changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865286342


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140083/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r642167700



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Is this valid when we do `getContinuousBlocksData`? To be clear to the metric audience, could you revise the definition you are aiming?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852291325


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139176/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852368921


   **[Test build #139180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139180/testReport)** for PR 32388 at commit [`493f1d5`](https://github.com/apache/spark/commit/493f1d5d398000f6c476e77ccfa65286f713c78d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-834552862


   Sorry my comment was for #32468 .. misposted it @otterc :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829461250


   **[Test build #138089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138089/testReport)** for PR 32388 at commit [`08b746a`](https://github.com/apache/spark/commit/08b746ab97345f023fe112f4fa1d196f072958ae).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r639019476



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,32 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentileThousands : new int[] { 10, 50, 250, 500, 750, 950, 980, 990, 999 }) {
+        String percentileStr;
+        switch (percentileThousands) {
+          case 10:
+            percentileStr = "1stPercentile";
+            break;
+          case 999:
+            percentileStr = "99.9thPercentile";
+            break;
+          default:
+            percentileStr = String.format("%d%sPercentile", percentileThousands / 10, "th");

Review comment:
       nit: move 'th' into the format ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865183802


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jaceklaskowski commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
jaceklaskowski commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r622875712



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -98,10 +99,10 @@ public ExternalBlockHandler(
       OneForOneStreamManager streamManager,
       ExternalShuffleBlockResolver blockManager,
       MergedShuffleFileManager mergeManager) {
-    this.metrics = new ShuffleMetrics();
     this.streamManager = streamManager;
     this.blockManager = blockManager;
     this.mergeManager = mergeManager;
+    this.metrics = new ShuffleMetrics();

Review comment:
       Why has this been moved down? Curious.

##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -291,9 +294,18 @@ public ShuffleMetrics() {
       allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
       allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
       allMetrics.put("finalizeShuffleMergeLatencyMillis", finalizeShuffleMergeLatencyMillis);
+      allMetrics.put("blockTransferRate", blockTransferRate);
       allMetrics.put("blockTransferRateBytes", blockTransferRateBytes);
+      allMetrics.put("blockTransferAvgSize_1min", new RatioGauge() {
+        @Override
+        protected Ratio getRatio() {
+          return Ratio.of(
+              blockTransferRateBytes.getOneMinuteRate(),
+              blockTransferRate.getOneMinuteRate());
+        }
+      });
       allMetrics.put("registeredExecutorsSize",
-                     (Gauge<Integer>) () -> blockManager.getRegisteredExecutorsSize());
+          (Gauge<Integer>) blockManager::getRegisteredExecutorsSize);

Review comment:
       Been a while since I was with Java, and I'm curious whether the change is going to work with Java 8 (that I assume Spark 3.x keeps supporting)?

##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
##########
@@ -403,7 +403,9 @@ class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with BeforeAnd
     val metrics = metricSetRef.get(metricsSource).asInstanceOf[MetricSet].getMetrics
 
     assert(metrics.keySet().asScala == Set(
+      "blockTransferRate",

Review comment:
       Why not to use the same trick "// Use sorted Seq instead of Set for easier comparison when there is a mismatch" here?

##########
File path: docs/monitoring.md
##########
@@ -1361,7 +1361,9 @@ Note: applies when running in Spark standalone as worker
 ### Component instance = shuffleService
 Note: applies to the shuffle service
 
+- blockTransferRate (meter)
 - blockTransferRateBytes (meter)
+- blockTransferAvgTime_1min (gauge - 1 minute moving average)

Review comment:
       nit: 1-minute (with the dash)

##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
##########
@@ -37,28 +38,57 @@ class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
   val metrics = new ExternalBlockHandler(streamManager, blockResolver).getAllMetrics
 
   test("metrics named as expected") {
-    val allMetrics = Set(
+    val allMetrics = Seq(

Review comment:
       Why `Seq` here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829461250


   **[Test build #138089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138089/testReport)** for PR 32388 at commit [`08b746a`](https://github.com/apache/spark/commit/08b746ab97345f023fe112f4fa1d196f072958ae).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828759848


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138051/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r623211422



##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
##########
@@ -37,28 +38,57 @@ class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
   val metrics = new ExternalBlockHandler(streamManager, blockResolver).getAllMetrics
 
   test("metrics named as expected") {
-    val allMetrics = Set(
+    val allMetrics = Seq(

Review comment:
       I think the comment below explains. You can't sort a `Set` so if we use `Set` here we have to call `toSeq.sorted` -- we might as well just use `Seq` to begin with.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844402825


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43237/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839274118


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138402/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r642167700



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Is this valid when we do `getContinuousBlocksData`?

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,32 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentileThousands : new int[] { 10, 50, 250, 500, 750, 950, 980, 990, 999 }) {
+        String percentileStr;
+        switch (percentileThousands) {
+          case 10:
+            percentileStr = "1stPercentile";
+            break;
+          case 999:
+            percentileStr = "99.9thPercentile";

Review comment:
       Hi, @xkrogen . Thank you for adding this and I saw your [comment](https://github.com/apache/spark/pull/32388#discussion_r629507805).
   
   Technically, `999thPercentile` is consistent with the conventional in Apache Spark instead of `99.9thPercentitle`. If you look at the other existing metrics, it has the following.
   <img width="146" alt="Screen Shot 2021-05-30 at 6 52 14 PM" src="https://user-images.githubusercontent.com/9700541/120129072-325b1b80-c178-11eb-9ff2-dd242951e24c.png">
   
   
   And, just a question. May I ask where you borrow this `99.9thPercentitle`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832392585






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852474620


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139180/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865183826






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm edited a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-834552862


   Sorry my comment was for #32438 .. misposted it @otterc :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852368921


   **[Test build #139180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139180/testReport)** for PR 32388 at commit [`493f1d5`](https://github.com/apache/spark/commit/493f1d5d398000f6c476e77ccfa65286f713c78d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] otterc commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
otterc commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r657497797



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       I don't have anything much to add here . Would be okay either ways.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865183826






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852291279


   **[Test build #139176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139176/testReport)** for PR 32388 at commit [`d01b392`](https://github.com/apache/spark/commit/d01b3921f8622ce3d8305c18fedd2f8895a2c7c0).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848088500


   +CC @dongjoon-hyun for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828723507


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42570/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852291325


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139176/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-837069568






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-837181073


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138340/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #32388:
URL: https://github.com/apache/spark/pull/32388


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r628797092



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -128,6 +142,12 @@ private static ShuffleServiceMetricsInfo getShuffleServiceMetricsInfoForCounter(
     return new ShuffleServiceMetricsInfo(name, "Value of counter " + name);
   }
 
+  private static ShuffleServiceMetricsInfo getMetricsInfoGenericValue(

Review comment:
       Although this is a private function, to be consistent with the existing methods, `getMetricsInfoGenericValue` -> `getShuffleServiceMetricsInfoForValue` might be better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828759848


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138051/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-869442489


   Merging to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829496063


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42609/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829579374


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138089/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829488885


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42609/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r623261069



##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
##########
@@ -403,7 +403,9 @@ class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with BeforeAnd
     val metrics = metricSetRef.get(metricsSource).asInstanceOf[MetricSet].getMetrics
 
     assert(metrics.keySet().asScala == Set(
+      "blockTransferRate",

Review comment:
       Updated here and in `ExternalShuffleServiceMetricsSuite`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r631430645



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -98,10 +99,10 @@ public ExternalBlockHandler(
       OneForOneStreamManager streamManager,
       ExternalShuffleBlockResolver blockManager,
       MergedShuffleFileManager mergeManager) {
-    this.metrics = new ShuffleMetrics();
     this.streamManager = streamManager;
     this.blockManager = blockManager;
     this.mergeManager = mergeManager;
+    this.metrics = new ShuffleMetrics();

Review comment:
       This change was reverted in a subsequent commit. Resolving.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r653925046



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       @mridulm I would also be curious about your thoughts on the new metrics.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839274118


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138402/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852448441


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43699/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852341045


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43694/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r628799662



##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala
##########
@@ -413,7 +416,7 @@ class YarnShuffleServiceSuite extends SparkFunSuite with Matchers with BeforeAnd
       "finalizeShuffleMergeLatencyMillis",
       "shuffle-server.usedDirectMemory",
       "shuffle-server.usedHeapMemory"
-    ))
+    ).sorted)

Review comment:
       Ditto: Although this is an orthogonal contribution, it will be okay.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828685634


   **[Test build #138051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138051/testReport)** for PR 32388 at commit [`83b36f2`](https://github.com/apache/spark/commit/83b36f2c2660e8818c4d22b510180061a08d430d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844462770


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138715/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828714199






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865174201






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848280170


   **[Test build #138941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138941/testReport)** for PR 32388 at commit [`867445c`](https://github.com/apache/spark/commit/867445cd390690969ea25466afb36ca275104307).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865136643


   Rebased on `master` to resolve some conflicts with other shuffle metrics related changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844419655


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43237/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-869798457


   Thanks @mridulm ! And many thanks to all reviewers, the feedback was much appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r656693656



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       That would be a new metric, if introduced, in read side.
   Currently, what is tracked treats it as a single request - that request could be a block fetch or a single fetch for a large block or merged block read.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r629508528



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -291,9 +294,18 @@ public ShuffleMetrics() {
       allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
       allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
       allMetrics.put("finalizeShuffleMergeLatencyMillis", finalizeShuffleMergeLatencyMillis);
+      allMetrics.put("blockTransferRate", blockTransferRate);
       allMetrics.put("blockTransferRateBytes", blockTransferRateBytes);
+      allMetrics.put("blockTransferAvgSize_1min", new RatioGauge() {
+        @Override
+        protected Ratio getRatio() {
+          return Ratio.of(
+              blockTransferRateBytes.getOneMinuteRate(),
+              blockTransferRate.getOneMinuteRate());
+        }
+      });
       allMetrics.put("registeredExecutorsSize",
-                     (Gauge<Integer>) () -> blockManager.getRegisteredExecutorsSize());
+          (Gauge<Integer>) blockManager::getRegisteredExecutorsSize);

Review comment:
       You're right, I will revert. It's not needed, just a minor cleanup -- this is more of the "correct" way to pass a method reference (avoid creating an extra anonymous class just to wrap the method reference in a lambda).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828723507


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42570/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-869798457


   Thanks @mridulm ! And many thanks to all reviewers, the feedback was much appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852320916


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43694/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844331013


   @dongjoon-hyun or @jaceklaskowski -- any more comments here or are we good to go? I have other PRs waiting that depend on this which I would like to get posted if possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] otterc commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
otterc commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-833676733


   > Is this dependent on [Ye's PR](https://github.com/apache/spark/pull/32007) @otterc ?
   
   No, it seems completely independent to me. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848033342


   @dongjoon-hyun @jaceklaskowski gentle ping
   or @mridulm if you want to take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839180403


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42924/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832328008


   **[Test build #138159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138159/testReport)** for PR 32388 at commit [`d519fb9`](https://github.com/apache/spark/commit/d519fb9e44068e0f26f59fc4cc9e19c43c863f7a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852284653


   **[Test build #139173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139173/testReport)** for PR 32388 at commit [`02303e0`](https://github.com/apache/spark/commit/02303e0c170b6716860ccb2f737da30d873b4310).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829569856


   **[Test build #138089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138089/testReport)** for PR 32388 at commit [`08b746a`](https://github.com/apache/spark/commit/08b746ab97345f023fe112f4fa1d196f072958ae).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852474620


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139180/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-836943054


   Thanks for the comments @dongjoon-hyun ! I have pushed up new commits addressing them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852287246


   **[Test build #139176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139176/testReport)** for PR 32388 at commit [`d01b392`](https://github.com/apache/spark/commit/d01b3921f8622ce3d8305c18fedd2f8895a2c7c0).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852448441


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43699/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848369490


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138941/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832394429






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r628799385



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -291,9 +294,18 @@ public ShuffleMetrics() {
       allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
       allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
       allMetrics.put("finalizeShuffleMergeLatencyMillis", finalizeShuffleMergeLatencyMillis);
+      allMetrics.put("blockTransferRate", blockTransferRate);
       allMetrics.put("blockTransferRateBytes", blockTransferRateBytes);
+      allMetrics.put("blockTransferAvgSize_1min", new RatioGauge() {
+        @Override
+        protected Ratio getRatio() {
+          return Ratio.of(
+              blockTransferRateBytes.getOneMinuteRate(),
+              blockTransferRate.getOneMinuteRate());
+        }
+      });
       allMetrics.put("registeredExecutorsSize",
-                     (Gauge<Integer>) () -> blockManager.getRegisteredExecutorsSize());
+          (Gauge<Integer>) blockManager::getRegisteredExecutorsSize);

Review comment:
       Is there a reason why we touch another `registeredExecutorsSize` metric at a PR claiming to add new different metrics? Apparently, this looks like orthogonal to me and the PR description doesn't mention this at all either. If you need this for a different reason, shall we make a separate PR?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r643285515



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       On second thought, I don't like `blockFetchRequestRate` since I think `FetchRequest` sounds like it would be at the level of a `OpenBlocks`/`FetchShuffleBlocks` message vs. the individual response level.
   
   Maybe `blockTransferMessageRate`? Or `blockBufferTransferRate`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-829460254


   Thanks for looking @jaceklaskowski ! Just put up a new commit addressing your comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-836992973


   **[Test build #138340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138340/testReport)** for PR 32388 at commit [`d7eebfd`](https://github.com/apache/spark/commit/d7eebfd6244123bd765d75307bc026111948b0d3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-837181073


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138340/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865174201


   **[Test build #140083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140083/testReport)** for PR 32388 at commit [`16f53dc`](https://github.com/apache/spark/commit/16f53dc0e88927d0a80f6d87f0311cce90d4ef9a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r650076787



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       @dongjoon-hyun how does the new code look to you?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848308478


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43461/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844419655


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43237/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832390992


   **[Test build #138159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138159/testReport)** for PR 32388 at commit [`d519fb9`](https://github.com/apache/spark/commit/d519fb9e44068e0f26f59fc4cc9e19c43c863f7a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r643299117



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Pushed up a new commit with `blockTransferMessageRate` to demonstrate the idea, let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852458372


   **[Test build #139180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139180/testReport)** for PR 32388 at commit [`493f1d5`](https://github.com/apache/spark/commit/493f1d5d398000f6c476e77ccfa65286f713c78d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852366351


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139173/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r656369998



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Thanks for that context @mridulm! To clarify, are you suggesting we ditch the new proposal and only maintain the old version, which would treat a batch fetch as a single block transfer?
   
   From the comments on #32140, it seems like @otterc was indicating that there's a future plan to add metrics on the push-based shuffle side that also break out the merged chunks in terms of number of underlying blocks. Should we do the same here (which is basically my new proposal)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852341003


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43694/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832083649


   @jaceklaskowski do you have any more comments here? Thanks a lot for your help so far!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-832394429






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852363512


   **[Test build #139173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139173/testReport)** for PR 32388 at commit [`02303e0`](https://github.com/apache/spark/commit/02303e0c170b6716860ccb2f737da30d873b4310).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848355795


   **[Test build #138941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138941/testReport)** for PR 32388 at commit [`867445c`](https://github.com/apache/spark/commit/867445cd390690969ea25466afb36ca275104307).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848369490


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138941/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839178122






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852448420


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43699/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r642165193



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,32 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentileThousands : new int[] { 10, 50, 250, 500, 750, 950, 980, 990, 999 }) {
+        String percentileStr;
+        switch (percentileThousands) {
+          case 10:
+            percentileStr = "1stPercentile";
+            break;
+          case 999:
+            percentileStr = "99.9thPercentile";

Review comment:
       Hi, @xkrogen . Thank you for adding this and I saw your [comment](https://github.com/apache/spark/pull/32388#discussion_r629507805).
   
   Technically, `999thPercentile` is consistent with the Apache Spark convention instead of `99.9thPercentitle`. If you look at the other existing metrics, it has the following.
   <img width="146" alt="Screen Shot 2021-05-30 at 6 52 14 PM" src="https://user-images.githubusercontent.com/9700541/120129072-325b1b80-c178-11eb-9ff2-dd242951e24c.png">
   
   
   And, just a question. May I ask where you borrow this naming style, `99.9thPercentitle`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r655778959



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       In spark read side, we treat a single read of `ShuffleBlockBatchId` as a single block read (from metric point of view).
   A [recent discussion](https://github.com/apache/spark/pull/32140#discussion_r652733447) of this in push based shuffle here for context (see `ShuffleBlockBatchId` in `ShuffleBlockFetcherIterator`).
   
   Given that, I am fine with treating a batch read as a single read - given it would be contiguous (effectively, similar to reading a 'large' block).
   
   Thoughts @dongjoon-hyun  ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828685634


   **[Test build #138051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138051/testReport)** for PR 32388 at commit [`83b36f2`](https://github.com/apache/spark/commit/83b36f2c2660e8818c4d22b510180061a08d430d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r628799592



##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala
##########
@@ -37,28 +38,57 @@ class YarnShuffleServiceMetricsSuite extends SparkFunSuite with Matchers {
   val metrics = new ExternalBlockHandler(streamManager, blockResolver).getAllMetrics
 
   test("metrics named as expected") {
-    val allMetrics = Set(
+    val allMetrics = Seq(

Review comment:
       This contribution is also orthogonal to the PR's main contribution but this looks okay in this case since it's a trivial.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-869449381


   Thanks for working on this @xkrogen !
   Thanks for the reviews @dongjoon-hyun, @otterc, @jaceklaskowski


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844377298


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43237/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-837079972


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42862/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852366351


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139173/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r623210177



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -291,9 +294,18 @@ public ShuffleMetrics() {
       allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
       allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
       allMetrics.put("finalizeShuffleMergeLatencyMillis", finalizeShuffleMergeLatencyMillis);
+      allMetrics.put("blockTransferRate", blockTransferRate);
       allMetrics.put("blockTransferRateBytes", blockTransferRateBytes);
+      allMetrics.put("blockTransferAvgSize_1min", new RatioGauge() {
+        @Override
+        protected Ratio getRatio() {
+          return Ratio.of(
+              blockTransferRateBytes.getOneMinuteRate(),
+              blockTransferRate.getOneMinuteRate());
+        }
+      });
       allMetrics.put("registeredExecutorsSize",
-                     (Gauge<Integer>) () -> blockManager.getRegisteredExecutorsSize());
+          (Gauge<Integer>) blockManager::getRegisteredExecutorsSize);

Review comment:
       Yes, this is a Java 8 "method reference". Read more about them [here](https://docs.oracle.com/javase/tutorial/java/javaOO/methodreferences.html) (note that at the top of the page it says it is written against JDK 8).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852284653


   **[Test build #139173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139173/testReport)** for PR 32388 at commit [`02303e0`](https://github.com/apache/spark/commit/02303e0c170b6716860ccb2f737da30d873b4310).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839261012


   **[Test build #138402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138402/testReport)** for PR 32388 at commit [`42e48e5`](https://github.com/apache/spark/commit/42e48e58a5c909858204b0614d68a991bb46e53d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r643241601



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,32 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentileThousands : new int[] { 10, 50, 250, 500, 750, 950, 980, 990, 999 }) {
+        String percentileStr;
+        switch (percentileThousands) {
+          case 10:
+            percentileStr = "1stPercentile";
+            break;
+          case 999:
+            percentileStr = "99.9thPercentile";

Review comment:
       Thanks for sharing, I did not realize there was an existing convention here. I've updated it to be `999thPercentile`. I didn't borrow it from anywhere in particular, just how I would say it aloud: "ninety-nine point ninth percentile".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r629507805



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,17 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(getMetricsInfoGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(getMetricsInfoGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(getMetricsInfoGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(getMetricsInfoGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentile : new int[] { 1, 5, 25, 50, 75, 95, 99 }) {

Review comment:
       I added 98th and 99.9th percentiles. Thanks for the input!
   
   One thing that may seem unusual is the low percentiles (1, 5, 25), which have found useful internally for a getting a better sense of the full range of timing information. Let me know if you think they are too non-standard to include here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r657628414



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Would definitely want to hear @dongjoon-hyun's thoughts here, given he has context !
   Also, +CC @Ngone51 who is helping review the client side for push based shuffle and has looked at similar codepaths recently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865286342


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140083/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r655778959



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       In spark read side, we treat a single read of `ShuffleBlockBatchId` as a single block read (from metric point of view).
   A [recent discussion](https://github.com/apache/spark/pull/32140#discussion_r652733447) of this in push based shuffle here for context (see `ShuffleBlockBatchId` in `ShuffleBlockFetcherIterator`).
   
   Given that, I am fine with treating a batch read as a single read - given it would be contiguous (effectively, similar to reading a 'large' block).
   
   Thoughts @dongjoon-hyun, @xkrogen ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852397637


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43699/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r655778959



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       In spark read side, we treat a single read of `ShuffleBlockBatchId` as a single block read (from metric point of view).
   A [recent discussion](https://github.com/apache/spark/pull/32140#discussion_r652733447) of this in push based shuffle here for context (see `ShuffleBlockBatchId` in `ShuffleBlockFetcherIterator`).
   
   Given that, I am fine with treating a batch read as a single read - given it would be contiguous (effectively, similar to reading a 'large' block).
   
   Thoughts @dongjoon-hyun  ?

##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       In spark read side, we treat a single read of `ShuffleBlockBatchId` as a single block read (from metric point of view).
   A [recent discussion](https://github.com/apache/spark/pull/32140#discussion_r652733447) of this in push based shuffle here for context (see `ShuffleBlockBatchId` in `ShuffleBlockFetcherIterator`).
   
   Given that, I am fine with treating a batch read as a single read - given it would be contiguous (effectively, similar to reading a 'large' block).
   
   Thoughts @dongjoon-hyun, @xkrogen ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865276127


   **[Test build #140083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140083/testReport)** for PR 32388 at commit [`16f53dc`](https://github.com/apache/spark/commit/16f53dc0e88927d0a80f6d87f0311cce90d4ef9a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r639196525



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java
##########
@@ -78,7 +82,32 @@ public static void collectMetric(
           new ShuffleServiceMetricsInfo(name + "_rate1", "1 minute rate of timer " + name),
           t.getOneMinuteRate())
         .addGauge(new ShuffleServiceMetricsInfo(name + "_rateMean", "Mean rate of timer " + name),
-          t.getMeanRate());
+          t.getMeanRate())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "max"), snapshot.getMax())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "min"), snapshot.getMin())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "mean"), snapshot.getMean())
+        .addGauge(
+          getShuffleServiceMetricsInfoForGenericValue(timingName, "stdDev"), snapshot.getStdDev());
+      for (int percentileThousands : new int[] { 10, 50, 250, 500, 750, 950, 980, 990, 999 }) {
+        String percentileStr;
+        switch (percentileThousands) {
+          case 10:
+            percentileStr = "1stPercentile";
+            break;
+          case 999:
+            percentileStr = "99.9thPercentile";
+            break;
+          default:
+            percentileStr = String.format("%d%sPercentile", percentileThousands / 10, "th");

Review comment:
       Good catch, this was leftover from previous logic. thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-844455802


   **[Test build #138715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138715/testReport)** for PR 32388 at commit [`e63d7ef`](https://github.com/apache/spark/commit/e63d7efa9d280bccda980004ad2e154845ea4acb).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-833638975


   Is this dependent on [Ye's PR](https://github.com/apache/spark/pull/32007) @otterc ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865174201


   **[Test build #140083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140083/testReport)** for PR 32388 at commit [`16f53dc`](https://github.com/apache/spark/commit/16f53dc0e88927d0a80f6d87f0311cce90d4ef9a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828686364


   cc @tgravescs @mridulm @dongjoon-hyun @gatorsmile 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r643248522



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       This is a great point, thanks Dongjoon!
   
   Currently a single batch fetch will be considered as one block fetch by this metric, regardless of how many blocks are fetched in the block. For our purposes, this is what we're interested in, since a big part of what we want to understand with this is the number of random reads we're submitting.
   
   However I see value in breaking it out as the actual number of blocks as well. Perhaps we can have two metrics, `blockTransferRate` and `blockFetchRequestRate`. In the case of non-batch fetches they will be the same, and with batch fetches the `blockTransferRate` will be higher than the `blockFetchRequestRate`. WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865183826


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/44611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-839180403


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42924/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-852341045


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43694/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-828758963


   **[Test build #138051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138051/testReport)** for PR 32388 at commit [`83b36f2`](https://github.com/apache/spark/commit/83b36f2c2660e8818c4d22b510180061a08d430d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-865177508


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-837079972


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42862/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-848280170


   **[Test build #138941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138941/testReport)** for PR 32388 at commit [`867445c`](https://github.com/apache/spark/commit/867445cd390690969ea25466afb36ca275104307).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #32388:
URL: https://github.com/apache/spark/pull/32388#discussion_r657289183



##########
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##########
@@ -264,6 +265,8 @@ private void checkAuth(TransportClient client, String appId) {
     private final Timer registerExecutorRequestLatencyMillis = new Timer();
     // Time latency for processing finalize shuffle merge request latency in ms
     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
+    // Block transfer rate in blocks per second

Review comment:
       Okay. I am pretty neutral here, happy to go either way. @dongjoon-hyun or @otterc , any thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #32388: [SPARK-35258][SHUFFLE][YARN] Add new metrics to ExternalShuffleService for better monitoring

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #32388:
URL: https://github.com/apache/spark/pull/32388#issuecomment-851106115


   Sorry for being late, @mridulm and @xkrogen .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org