Posted to reviews@spark.apache.org by lovexi <gi...@git.apache.org> on 2016/07/06 22:28:21 UTC

[GitHub] spark pull request #14080: [SPARK-16405] [ShuffleService] [Metrics] Add metr...

GitHub user lovexi opened a pull request:

    https://github.com/apache/spark/pull/14080

    [SPARK-16405] [ShuffleService] [Metrics] Add metrics and source for external shuffle service

    ## What changes were proposed in this pull request?
    
    ExternalShuffleService is essential for Spark. To make the shuffle service easier to monitor, this patch adds several metrics to the shuffle service and an ExternalShuffleServiceSource for the metrics system.
    Metrics added to the shuffle service:
    * registeredExecutorsSize
    * totalShuffleRequests
    * timeDelayForOpenBlockRequest
    * timeDelayForRegisterExecutorRequest
    * transferBlockRate
    
    ## How was this patch tested?
    
    Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lovexi/spark yangyang-metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14080
    
----
commit 3f8c065ca0587956b0bb379d7aa4a06041c68b07
Author: Yangyang Liu <ya...@fb.com>
Date:   2016-07-06T22:05:36Z

    Add metrics and source for external shuffle service

commit e3b4a9eee01c598e5b2d3b6cc8f47bdc9659c7c0
Author: Yangyang Liu <ya...@fb.com>
Date:   2016-07-06T22:20:19Z

    Minor nit

----




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Merging in master. Thanks!





[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14080




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70109683
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +179,26 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    private final Timer timeDelayForOpenBlockRequest = new Timer();
    +    private final Timer timeDelayForRegisterExecutorRequest = new Timer();
    +    private final Meter transferBlockRate = new Meter();
    +
    +    private ShuffleMetrics() {
    +      allMetrics = new HashMap<>();
    --- End diff --
    
    Thanks for commenting. Yeah, it can work smoothly with Java 7. 




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3185/consoleFull)** for PR 14080 at commit [`4884084`](https://github.com/apache/spark/commit/488408479858b026cf67d9a04e7b0fe1aad8934d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3180 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3180/consoleFull)** for PR 14080 at commit [`95a864c`](https://github.com/apache/spark/commit/95a864cfd1ca6df8d6c83e9c2968ceacc7b592d4).




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70171929
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -79,32 +89,59 @@ protected void handleMessage(
           TransportClient client,
           RpcResponseCallback callback) {
         if (msgObj instanceof OpenBlocks) {
    -      OpenBlocks msg = (OpenBlocks) msgObj;
    -      checkAuth(client, msg.appId);
    -
    -      List<ManagedBuffer> blocks = Lists.newArrayList();
    -      for (String blockId : msg.blockIds) {
    -        blocks.add(blockManager.getBlockData(msg.appId, msg.execId, blockId));
    +      // Reset transferred block size metrics as zero
    +      final Timer.Context responseDelayContext = metrics.openBlockRequestLatencyMillis.time();
    +      try {
    +        OpenBlocks msg = (OpenBlocks) msgObj;
    +        checkAuth(client, msg.appId);
    +
    +        List<ManagedBuffer> blocks = Lists.newArrayList();
    +        long totalBlockSize = 0;
    +        for (String blockId : msg.blockIds) {
    +          final ManagedBuffer block = blockManager.getBlockData(msg.appId, msg.execId, blockId);
    +          totalBlockSize += block.size();
    +          blocks.add(block);
    +        }
    +        long streamId = streamManager.registerStream(client.getClientId(), blocks.iterator());
    +        logger.trace("Registered streamId {} with {} buffers for client {} from host {}",
    +                     streamId,
    +                     msg.blockIds.length,
    +                     client.getClientId(),
    +                     NettyUtils.getRemoteAddress(client.getChannel()));
    +        callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +        metrics.blockTransferRateMBytes.mark(totalBlockSize / 1024 / 1024);
    +      } finally {
    +        responseDelayContext.stop();
           }
    -      long streamId = streamManager.registerStream(client.getClientId(), blocks.iterator());
    -      logger.trace("Registered streamId {} with {} buffers for client {} from host {}",
    -          streamId,
    -          msg.blockIds.length,
    -          client.getClientId(),
    -          NettyUtils.getRemoteAddress(client.getChannel()));
    -      callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
     
         } else if (msgObj instanceof RegisterExecutor) {
    -      RegisterExecutor msg = (RegisterExecutor) msgObj;
    -      checkAuth(client, msg.appId);
    -      blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
    -      callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      final Timer.Context responseDelayContext = metrics.registerExecutorRequestLatencyMillis.time();
    +      try {
    +        RegisterExecutor msg = (RegisterExecutor) msgObj;
    +        checkAuth(client, msg.appId);
    +        blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
    +        callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      } finally {
    +        responseDelayContext.stop();
    +      }
     
         } else {
           throw new UnsupportedOperationException("Unexpected message: " + msgObj);
         }
       }
     
    +  public MetricSet getAllMetrics() {
    +    return metrics;
    +  }
    +
    +  public long getRegisteredExecutorsSize() {
    +    return blockManager.getRegisteredExecutorsSize();
    +  }
    +
    +  public long getTotalShuffleRequests() {
    --- End diff --
    
    Removed extra metrics.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70171931
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -52,6 +59,8 @@
       @VisibleForTesting
       final ExternalShuffleBlockResolver blockManager;
       private final OneForOneStreamManager streamManager;
    +  // Shuffle service metrics setup
    --- End diff --
    
    Addressed nit.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Added some test cases to verify the expected metric values. @ericl 




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r69981791
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +179,26 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    private final Timer timeDelayForOpenBlockRequest = new Timer();
    +    private final Timer timeDelayForRegisterExecutorRequest = new Timer();
    +    private final Meter transferBlockRate = new Meter();
    +
    +    private ShuffleMetrics() {
    +      allMetrics = new HashMap<>();
    --- End diff --
    
    Will it work with Java 7? I think Spark 2.0 will keep supporting that version.




[GitHub] spark issue #14080: [SPARK-16405] [ShuffleService] [Metrics] Add metrics and...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Can one of the admins verify this patch?




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70171996
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ExternalShuffleServiceSource.scala ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import javax.annotation.concurrent.ThreadSafe
    +
    +import com.codahale.metrics.{Gauge, MetricRegistry}
    +
    +import org.apache.spark.metrics.source.Source
    +import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler
    +
    +/**
    + * Provides metrics source for external shuffle service
    + */
    +@ThreadSafe
    +private class ExternalShuffleServiceSource
    +(blockHandler: ExternalShuffleBlockHandler) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +  override val sourceName = "shuffleService"
    +
    +  metricRegistry.registerAll(blockHandler.getAllMetrics)
    +
    +  metricRegistry.register(MetricRegistry.name("registeredExecutorsSize"),
    --- End diff --
    
    That's a good point. Addressed! Moved all the metrics inside and registered them with `registerAll`.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    @rxin Thank you for mentioning that. I hadn't even noticed it. Haha.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r69988464
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -93,18 +113,34 @@ protected void handleMessage(
               client.getClientId(),
               NettyUtils.getRemoteAddress(client.getChannel()));
           callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +      transferBlockRate.mark(totalBlockSize / 1024 / 1024);
    +      responseDelayContext.stop();
     
         } else if (msgObj instanceof RegisterExecutor) {
    +      final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time();
           RegisterExecutor msg = (RegisterExecutor) msgObj;
           checkAuth(client, msg.appId);
           blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
           callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      responseDelayContext.stop();
    --- End diff --
    
    Consider putting all `stop` calls in a finally block.
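    A minimal, hypothetical sketch of the suggested pattern, using the Dropwizard `Timer` API (`doHandle` is a stand-in for the real OpenBlocks/RegisterExecutor handling, not code from the patch):
    
    ```java
    import com.codahale.metrics.Timer;
    
    class RequestTimingSketch {
      private final Timer openBlockRequestLatencyMillis = new Timer();
    
      void handleOpenBlocks(Runnable doHandle) {
        final Timer.Context ctx = openBlockRequestLatencyMillis.time();
        try {
          doHandle.run();   // actual request handling would go here
        } finally {
          ctx.stop();       // latency is recorded even if handling throws
        }
      }
    }
    ```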




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r69988593
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ExternalShuffleServiceSource.scala ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import javax.annotation.concurrent.ThreadSafe
    +
    +import com.codahale.metrics.{Gauge, MetricRegistry}
    +
    +import org.apache.spark.metrics.source.Source
    +import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler
    +
    +/**
    + * Provides metrics source for external shuffle service
    + */
    +@ThreadSafe
    +private class ExternalShuffleServiceSource
    +(blockHandler: ExternalShuffleBlockHandler) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +  override val sourceName = "shuffleService"
    +
    +  metricRegistry.registerAll(blockHandler.getAllMetrics)
    +
    +  metricRegistry.register(MetricRegistry.name("registeredExecutorsSize"),
    --- End diff --
    
    Rather than creating these metrics externally here, consider putting them inside the `metricSet`.
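    A minimal, hypothetical sketch of that suggestion: register the executor-count gauge inside the Java `MetricSet`, so the Scala source only needs a single `registerAll` call. `ExecutorRegistry` here is a stand-in for the real block resolver, not an actual Spark type:
    
    ```java
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    
    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.Metric;
    import com.codahale.metrics.MetricSet;
    
    class ShuffleMetricsSketch implements MetricSet {
      interface ExecutorRegistry { int size(); }   // stand-in for the real registry
    
      private final Map<String, Metric> allMetrics = new HashMap<>();
    
      ShuffleMetricsSketch(final ExecutorRegistry registry) {
        // the Gauge is evaluated lazily, so the executor count is read on each report
        allMetrics.put("registeredExecutorsSize", new Gauge<Integer>() {
          @Override
          public Integer getValue() {
            return registry.size();
          }
        });
      }
    
      @Override
      public Map<String, Metric> getMetrics() {
        return Collections.unmodifiableMap(allMetrics);
      }
    }
    ```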




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    test this, please




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150386
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +180,29 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    // Time latency for open block request in ms
    +    private final Timer openBlockRequestLatencyMillis = new Timer();
    +    // Time latency for executor registration latency in ms
    +    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    +    // Block transfer rate in mbps
    +    private final Meter blockTransferRateMBytes = new Meter();
    --- End diff --
    
    Why not bytes? Won't this cause rounding errors?
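    A quick, hypothetical illustration of the rounding concern with integer division to megabytes (not code from the patch):
    
    ```java
    import com.codahale.metrics.Meter;
    
    class TransferRateSketch {
      private final Meter blockTransferRateBytes = new Meter();
    
      void record() {
        long totalBlockSize = 900 * 1024;                 // 900 KiB transferred
        long asMegabytes = totalBlockSize / 1024 / 1024;  // integer division yields 0: the transfer disappears
        blockTransferRateBytes.mark(totalBlockSize);      // marking raw bytes keeps full resolution
      }
    }
    ```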




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    @rxin Sure. I'll use a cleaner title instead.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70171933
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -79,32 +89,59 @@ protected void handleMessage(
           TransportClient client,
           RpcResponseCallback callback) {
         if (msgObj instanceof OpenBlocks) {
    -      OpenBlocks msg = (OpenBlocks) msgObj;
    -      checkAuth(client, msg.appId);
    -
    -      List<ManagedBuffer> blocks = Lists.newArrayList();
    -      for (String blockId : msg.blockIds) {
    -        blocks.add(blockManager.getBlockData(msg.appId, msg.execId, blockId));
    +      // Reset transferred block size metrics as zero
    --- End diff --
    
    Addressed nit.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3180/consoleFull)** for PR 14080 at commit [`95a864c`](https://github.com/apache/spark/commit/95a864cfd1ca6df8d6c83e9c2968ceacc7b592d4).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r69989502
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +179,26 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    private final Timer timeDelayForOpenBlockRequest = new Timer();
    +    private final Timer timeDelayForRegisterExecutorRequest = new Timer();
    +    private final Meter transferBlockRate = new Meter();
    --- End diff --
    
    Can you add comments describing the metrics and their units? e.g. bytes, milliseconds
    
    Also consider renaming them for clarity; I think `openBlockRequestLatencyMillis`, `registerExecutorRequestLatencyMillis`, and `blockTransferRateBytes` would be clearer to the reader.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    test it, please




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r69988337
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -64,6 +75,10 @@ public ExternalShuffleBlockHandler(TransportConf conf, File registeredExecutorFi
       public ExternalShuffleBlockHandler(
           OneForOneStreamManager streamManager,
           ExternalShuffleBlockResolver blockManager) {
    +    this.metrics = new ShuffleMetrics();
    +    this.timeDelayForOpenBlockRequest = metrics.timeDelayForOpenBlockRequest;
    +    this.timeDelayForRegisterExecutorRequest = metrics.timeDelayForRegisterExecutorRequest;
    --- End diff --
    
    It's a little confusing how this metric is duplicated as a class member. Would it work to just reference it through `metrics`?
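    A minimal, hypothetical sketch of what referencing everything through a single `metrics` field could look like (the nested `Metrics` class is an illustration, not the patch's `ShuffleMetrics`):
    
    ```java
    import com.codahale.metrics.Timer;
    
    class HandlerSketch {
      static class Metrics {
        final Timer openBlockRequestLatencyMillis = new Timer();
        final Timer registerExecutorRequestLatencyMillis = new Timer();
      }
    
      // one field holds all metrics; no per-metric members duplicated on the handler
      private final Metrics metrics = new Metrics();
    
      void handleRegisterExecutor(Runnable doHandle) {
        final Timer.Context ctx = metrics.registerExecutorRequestLatencyMillis.time();
        try {
          doHandle.run();
        } finally {
          ctx.stop();
        }
      }
    }
    ```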




[GitHub] spark issue #14080: [SPARK-16405] [ShuffleService] [Metrics] Add metrics and...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Can you update the title to just say "[SPARK-16405] Add metrics and source for external shuffle service"?




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Thanks, this LGTM




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3181/consoleFull)** for PR 14080 at commit [`9ca99bb`](https://github.com/apache/spark/commit/9ca99bbeee8cac6d21b7888571f2c9ab7b51c453).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    @rxin Thank you for letting me know. That saves me a lot of time on testing.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70111868
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -64,6 +75,10 @@ public ExternalShuffleBlockHandler(TransportConf conf, File registeredExecutorFi
       public ExternalShuffleBlockHandler(
           OneForOneStreamManager streamManager,
           ExternalShuffleBlockResolver blockManager) {
    +    this.metrics = new ShuffleMetrics();
    +    this.timeDelayForOpenBlockRequest = metrics.timeDelayForOpenBlockRequest;
    +    this.timeDelayForRegisterExecutorRequest = metrics.timeDelayForRegisterExecutorRequest;
    --- End diff --
    
    Yeah, it does seem somewhat redundant. I've removed the local ones and now reference them directly through the metrics class.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150510
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -93,18 +113,34 @@ protected void handleMessage(
               client.getClientId(),
               NettyUtils.getRemoteAddress(client.getChannel()));
           callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +      transferBlockRate.mark(totalBlockSize / 1024 / 1024);
    +      responseDelayContext.stop();
     
         } else if (msgObj instanceof RegisterExecutor) {
    +      final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time();
           RegisterExecutor msg = (RegisterExecutor) msgObj;
           checkAuth(client, msg.appId);
           blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
           callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      responseDelayContext.stop();
     
         } else {
           throw new UnsupportedOperationException("Unexpected message: " + msgObj);
         }
       }
     
    +  public MetricSet getAllMetrics() {
    +    return metrics;
    +  }
    +
    +  public long getRegisteredExecutorsSize() {
    +    return blockManager.getRegisteredExecutorsSize();
    +  }
    +
    +  public long getTotalShuffleRequests() {
    +    return timeDelayForOpenBlockRequest.getCount() + timeDelayForOpenBlockRequest.getCount();
    --- End diff --
    
    Since there are only two metrics, I think it is better not to, since it can be confusing. If there were many different kinds of requests, then a total might be OK.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70171924
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +180,29 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    // Time latency for open block request in ms
    +    private final Timer openBlockRequestLatencyMillis = new Timer();
    +    // Time latency for executor registration latency in ms
    +    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    +    // Block transfer rate in mbps
    +    private final Meter blockTransferRateMBytes = new Meter();
    --- End diff --
    
    Addressed! Changed it to `blockTransferRateBytes`.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3181 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3181/consoleFull)** for PR 14080 at commit [`9ca99bb`](https://github.com/apache/spark/commit/9ca99bbeee8cac6d21b7888571f2c9ab7b51c453).




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Restart test, please




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70115043
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +179,26 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    private final Timer timeDelayForOpenBlockRequest = new Timer();
    +    private final Timer timeDelayForRegisterExecutorRequest = new Timer();
    +    private final Meter transferBlockRate = new Meter();
    --- End diff --
    
    Thank you for the suggestions. That makes the variable names much clearer. Already addressed.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70171938
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +180,29 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    // Time latency for open block request in ms
    +    private final Timer openBlockRequestLatencyMillis = new Timer();
    +    // Time latency for executor registration latency in ms
    +    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    +    // Block transfer rate in mbps
    +    private final Meter blockTransferRateMBytes = new Meter();
    +
    +    private ShuffleMetrics() {
    +      allMetrics = new HashMap<>();
    +      allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
    +      allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
    +      allMetrics.put("blockTransferRateBytes", blockTransferRateMBytes);
    --- End diff --
    
    Since I changed the field to `blockTransferRateBytes`, I kept the names consistent and refactored the metric name to `blockTransferRateBytes` as well.




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    @lovexi you should add your commit email to your GitHub profile, so the commit will show proper attribution.





[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3182/consoleFull)** for PR 14080 at commit [`258ed7a`](https://github.com/apache/spark/commit/258ed7a56fdc0bdce4f93de0e58e2c37b08f7e86).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70114491
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ExternalShuffleServiceSource.scala ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import javax.annotation.concurrent.ThreadSafe
    +
    +import com.codahale.metrics.{Gauge, MetricRegistry}
    +
    +import org.apache.spark.metrics.source.Source
    +import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler
    +
    +/**
    + * Provides metrics source for external shuffle service
    + */
    +@ThreadSafe
    +private class ExternalShuffleServiceSource
    +(blockHandler: ExternalShuffleBlockHandler) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +  override val sourceName = "shuffleService"
    +
    +  metricRegistry.registerAll(blockHandler.getAllMetrics)
    +
    +  metricRegistry.register(MetricRegistry.name("registeredExecutorsSize"),
    --- End diff --
    
    I've already created a `shuffleMetrics` set inside `ExternalShuffleBlockHandler` and register it here. Wouldn't it be somewhat redundant to set up another `MetricSet`?




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Thank you for reminding me of this. I've updated the PR description and added more details.
    cc @rxin 




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150594
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -143,4 +180,29 @@ private void checkAuth(TransportClient client, String appId) {
         }
       }
     
    +  /**
    +   * A simple class to wrap all shuffle service wrapper metrics
    +   */
    +  private class ShuffleMetrics implements MetricSet {
    +    private final Map<String, Metric> allMetrics;
    +    // Time latency for open block request in ms
    +    private final Timer openBlockRequestLatencyMillis = new Timer();
    +    // Time latency for executor registration latency in ms
    +    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    +    // Block transfer rate in mbps
    +    private final Meter blockTransferRateMBytes = new Meter();
    +
    +    private ShuffleMetrics() {
    +      allMetrics = new HashMap<>();
    +      allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
    +      allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
    +      allMetrics.put("blockTransferRateBytes", blockTransferRateMBytes);
    --- End diff --
    
    Name seems inconsistent here




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150418
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -52,6 +59,8 @@
       @VisibleForTesting
       final ExternalShuffleBlockResolver blockManager;
       private final OneForOneStreamManager streamManager;
    +  // Shuffle service metrics setup
    --- End diff --
    
    nit: don't need this comment




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    There is a test failure
    
    ```
    [error] Test org.apache.spark.network.sasl.SaslIntegrationSuite.testAppIsolation failed: java.lang.RuntimeException: java.lang.NullPointerException
    [error] 	at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.handleMessage(ExternalShuffleBlockHandler.java:100)
    [error] 	at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:83)
    [error] 	at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:80)
    [error] 	at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:158)
    [error] 	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
    [error] 	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
    [error] 	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    [error] 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error] 	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error] 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error] 	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error] 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error] 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    [error] 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    [error] 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    [error] 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    [error] 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    [error] 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    [error] 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    [error] 	at java.lang.Thread.run(Thread.java:745)
    [error] , took 0.073 sec
    [error]     at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:190)
    [error]     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:121)
    [error]     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    [error]     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error]     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error]     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error]     at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    [error]     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    [error]     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    [error]     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    [error]     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    [error]     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    [error]     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    [error]     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    [error]     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    [error]     at java.lang.Thread.run(Thread.java:745)
    [info] Test org.apache.spark.network.sasl.SaslIntegrationSuite.testNoSaslClient started
    [info] Test org.apache.spark.network.sasl.SaslIntegrationSuite.testNoSaslServer started
    [info] Test org.apache.spark.network.sasl.SaslIntegrationSuite.testBadClient started
    [info] Test run finished: 1 failed, 0 ignored, 5 total, 1.741s
    ```
    
    @lovexi you can run the tests locally too with `build/sbt test network-shuffle`




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Thanks for adding these metrics. Could you also add some unit tests to sanity-check that these metrics are recorded as expected, e.g. as in https://github.com/apache/spark/pull/13934/files
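    For illustration, a minimal standalone sketch of the kind of sanity check being asked for (the suite name is hypothetical, and it exercises the Codahale primitives directly rather than going through ExternalShuffleBlockHandler):

    ```
    import static org.junit.Assert.assertEquals;

    import com.codahale.metrics.Meter;
    import com.codahale.metrics.Timer;
    import org.junit.Test;

    // Hypothetical test: drive the instrumented path, then read the counts back.
    public class ShuffleMetricsSanitySuite {

      @Test
      public void timersAndMetersRecordActivity() {
        Timer openBlockRequestLatencyMillis = new Timer();
        Meter blockTransferRateBytes = new Meter();

        // Simulate one "open blocks" request transferring 10 MB.
        final Timer.Context context = openBlockRequestLatencyMillis.time();
        try {
          blockTransferRateBytes.mark(10 * 1024 * 1024);
        } finally {
          context.stop();
        }

        assertEquals(1, openBlockRequestLatencyMillis.getCount());
        assertEquals(10 * 1024 * 1024, blockTransferRateBytes.getCount());
      }
    }
    ```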




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r69989978
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -93,18 +113,34 @@ protected void handleMessage(
               client.getClientId(),
               NettyUtils.getRemoteAddress(client.getChannel()));
           callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +      transferBlockRate.mark(totalBlockSize / 1024 / 1024);
    +      responseDelayContext.stop();
     
         } else if (msgObj instanceof RegisterExecutor) {
    +      final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time();
           RegisterExecutor msg = (RegisterExecutor) msgObj;
           checkAuth(client, msg.appId);
           blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
           callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      responseDelayContext.stop();
     
         } else {
           throw new UnsupportedOperationException("Unexpected message: " + msgObj);
         }
       }
     
    +  public MetricSet getAllMetrics() {
    +    return metrics;
    +  }
    +
    +  public long getRegisteredExecutorsSize() {
    +    return blockManager.getRegisteredExecutorsSize();
    +  }
    +
    +  public long getTotalShuffleRequests() {
    +    return timeDelayForOpenBlockRequest.getCount() + timeDelayForOpenBlockRequest.getCount();
    --- End diff --
    
    Btw I don't think you need this metric, the client can easily derive it from the other metrics.
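    For what it's worth, a sketch of that derivation on the consumer side (it assumes the timers are exposed under the keys shown in the diff above, and it sums the two distinct counters rather than the same one twice):

    ```
    import java.util.Map;

    import com.codahale.metrics.Metric;
    import com.codahale.metrics.MetricSet;
    import com.codahale.metrics.Timer;

    // Hypothetical consumer-side helper: the total request count falls out of
    // the per-request-type timers that are already in the MetricSet.
    public final class ShuffleRequestCounts {
      public static long totalShuffleRequests(MetricSet shuffleMetrics) {
        Map<String, Metric> m = shuffleMetrics.getMetrics();
        Timer openBlocks = (Timer) m.get("openBlockRequestLatencyMillis");
        Timer registerExecutor = (Timer) m.get("registerExecutorRequestLatencyMillis");
        return openBlocks.getCount() + registerExecutor.getCount();
      }
    }
    ```

    Keeping the derivation on the consumer side means the handler does not need a redundant aggregate counter.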




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    Can you update the pull request description? It is now outdated.





[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150424
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -79,32 +89,59 @@ protected void handleMessage(
           TransportClient client,
           RpcResponseCallback callback) {
         if (msgObj instanceof OpenBlocks) {
    -      OpenBlocks msg = (OpenBlocks) msgObj;
    -      checkAuth(client, msg.appId);
    -
    -      List<ManagedBuffer> blocks = Lists.newArrayList();
    -      for (String blockId : msg.blockIds) {
    -        blocks.add(blockManager.getBlockData(msg.appId, msg.execId, blockId));
    +      // Reset transferred block size metrics as zero
    --- End diff --
    
    same here




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3182 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3182/consoleFull)** for PR 14080 at commit [`258ed7a`](https://github.com/apache/spark/commit/258ed7a56fdc0bdce4f93de0e58e2c37b08f7e86).




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150439
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -79,32 +89,59 @@ protected void handleMessage(
           TransportClient client,
           RpcResponseCallback callback) {
         if (msgObj instanceof OpenBlocks) {
    -      OpenBlocks msg = (OpenBlocks) msgObj;
    -      checkAuth(client, msg.appId);
    -
    -      List<ManagedBuffer> blocks = Lists.newArrayList();
    -      for (String blockId : msg.blockIds) {
    -        blocks.add(blockManager.getBlockData(msg.appId, msg.execId, blockId));
    +      // Reset transferred block size metrics as zero
    +      final Timer.Context responseDelayContext = metrics.openBlockRequestLatencyMillis.time();
    +      try {
    +        OpenBlocks msg = (OpenBlocks) msgObj;
    +        checkAuth(client, msg.appId);
    +
    +        List<ManagedBuffer> blocks = Lists.newArrayList();
    +        long totalBlockSize = 0;
    +        for (String blockId : msg.blockIds) {
    +          final ManagedBuffer block = blockManager.getBlockData(msg.appId, msg.execId, blockId);
    +          totalBlockSize += block.size();
    +          blocks.add(block);
    +        }
    +        long streamId = streamManager.registerStream(client.getClientId(), blocks.iterator());
    +        logger.trace("Registered streamId {} with {} buffers for client {} from host {}",
    +                     streamId,
    +                     msg.blockIds.length,
    +                     client.getClientId(),
    +                     NettyUtils.getRemoteAddress(client.getChannel()));
    +        callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +        metrics.blockTransferRateMBytes.mark(totalBlockSize / 1024 / 1024);
    +      } finally {
    +        responseDelayContext.stop();
           }
    -      long streamId = streamManager.registerStream(client.getClientId(), blocks.iterator());
    -      logger.trace("Registered streamId {} with {} buffers for client {} from host {}",
    -          streamId,
    -          msg.blockIds.length,
    -          client.getClientId(),
    -          NettyUtils.getRemoteAddress(client.getChannel()));
    -      callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
     
         } else if (msgObj instanceof RegisterExecutor) {
    -      RegisterExecutor msg = (RegisterExecutor) msgObj;
    -      checkAuth(client, msg.appId);
    -      blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
    -      callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      final Timer.Context responseDelayContext = metrics.registerExecutorRequestLatencyMillis.time();
    +      try {
    +        RegisterExecutor msg = (RegisterExecutor) msgObj;
    +        checkAuth(client, msg.appId);
    +        blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
    +        callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      } finally {
    +        responseDelayContext.stop();
    +      }
     
         } else {
           throw new UnsupportedOperationException("Unexpected message: " + msgObj);
         }
       }
     
    +  public MetricSet getAllMetrics() {
    +    return metrics;
    +  }
    +
    +  public long getRegisteredExecutorsSize() {
    +    return blockManager.getRegisteredExecutorsSize();
    +  }
    +
    +  public long getTotalShuffleRequests() {
    --- End diff --
    
    Can you remove this one?




[GitHub] spark issue #14080: [SPARK-16405] Add metrics and source for external shuffl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14080
  
    **[Test build #3185 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3185/consoleFull)** for PR 14080 at commit [`4884084`](https://github.com/apache/spark/commit/488408479858b026cf67d9a04e7b0fe1aad8934d).




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70110638
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -93,18 +113,34 @@ protected void handleMessage(
               client.getClientId(),
               NettyUtils.getRemoteAddress(client.getChannel()));
           callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +      transferBlockRate.mark(totalBlockSize / 1024 / 1024);
    +      responseDelayContext.stop();
     
         } else if (msgObj instanceof RegisterExecutor) {
    +      final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time();
           RegisterExecutor msg = (RegisterExecutor) msgObj;
           checkAuth(client, msg.appId);
           blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
           callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      responseDelayContext.stop();
    --- End diff --
    
    Good point. I added two finally blocks for the `responseDelayContext.stop()` calls.




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by lovexi <gi...@git.apache.org>.
Github user lovexi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70112425
  
    --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java ---
    @@ -93,18 +113,34 @@ protected void handleMessage(
               client.getClientId(),
               NettyUtils.getRemoteAddress(client.getChannel()));
           callback.onSuccess(new StreamHandle(streamId, msg.blockIds.length).toByteBuffer());
    +      transferBlockRate.mark(totalBlockSize / 1024 / 1024);
    +      responseDelayContext.stop();
     
         } else if (msgObj instanceof RegisterExecutor) {
    +      final Timer.Context responseDelayContext = timeDelayForRegisterExecutorRequest.time();
           RegisterExecutor msg = (RegisterExecutor) msgObj;
           checkAuth(client, msg.appId);
           blockManager.registerExecutor(msg.appId, msg.execId, msg.executorInfo);
           callback.onSuccess(ByteBuffer.wrap(new byte[0]));
    +      responseDelayContext.stop();
     
         } else {
           throw new UnsupportedOperationException("Unexpected message: " + msgObj);
         }
       }
     
    +  public MetricSet getAllMetrics() {
    +    return metrics;
    +  }
    +
    +  public long getRegisteredExecutorsSize() {
    +    return blockManager.getRegisteredExecutorsSize();
    +  }
    +
    +  public long getTotalShuffleRequests() {
    +    return timeDelayForOpenBlockRequest.getCount() + timeDelayForOpenBlockRequest.getCount();
    --- End diff --
    
    Oh, if this metric can be exported to the UI or another data collector, it gives a much clearer and more direct view of the total number of shuffle requests in real time. What do you think?




[GitHub] spark pull request #14080: [SPARK-16405] Add metrics and source for external...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14080#discussion_r70150346
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ExternalShuffleServiceSource.scala ---
    @@ -0,0 +1,47 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy
    +
    +import javax.annotation.concurrent.ThreadSafe
    +
    +import com.codahale.metrics.{Gauge, MetricRegistry}
    +
    +import org.apache.spark.metrics.source.Source
    +import org.apache.spark.network.shuffle.ExternalShuffleBlockHandler
    +
    +/**
    + * Provides metrics source for external shuffle service
    + */
    +@ThreadSafe
    +private class ExternalShuffleServiceSource
    +(blockHandler: ExternalShuffleBlockHandler) extends Source {
    +  override val metricRegistry = new MetricRegistry()
    +  override val sourceName = "shuffleService"
    +
    +  metricRegistry.registerAll(blockHandler.getAllMetrics)
    +
    +  metricRegistry.register(MetricRegistry.name("registeredExecutorsSize"),
    --- End diff --
    
    I mean, why not put them inside blockHandler so that they are registered by the same `registerAll` call? Then you don't need to expose these extra methods to get executors size and total shuffle requests.
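    A sketch of that suggestion, with the gauge living inside the handler's MetricSet so a single registerAll call picks everything up (the LongSupplier parameter is a stand-in for however the handler would expose the registered-executor count):

    ```
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.LongSupplier;

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.Metric;
    import com.codahale.metrics.MetricSet;

    // Hypothetical sketch: the gauge sits next to the timers and meters, so
    // registry.registerAll(handler.getAllMetrics()) picks everything up at once.
    class ShuffleMetricsWithGauge implements MetricSet {
      private final Map<String, Metric> allMetrics = new HashMap<>();

      ShuffleMetricsWithGauge(final LongSupplier registeredExecutorsSize) {
        allMetrics.put("registeredExecutorsSize", new Gauge<Long>() {
          @Override
          public Long getValue() {
            return registeredExecutorsSize.getAsLong();
          }
        });
      }

      @Override
      public Map<String, Metric> getMetrics() {
        return allMetrics;
      }
    }
    ```

    The Scala source would then only need the existing `metricRegistry.registerAll(blockHandler.getAllMetrics)` call and could drop the extra accessor methods.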

