Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/05/20 06:17:00 UTC

[jira] [Assigned] (SPARK-27773) Add shuffle service metric for number of exceptions caught in TransportChannelHandler

     [ https://issues.apache.org/jira/browse/SPARK-27773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27773:
------------------------------------

    Assignee: Apache Spark

> Add shuffle service metric for number of exceptions caught in TransportChannelHandler
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-27773
>                 URL: https://issues.apache.org/jira/browse/SPARK-27773
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.4.3
>            Reporter: Steven Rand
>            Assignee: Apache Spark
>            Priority: Minor
>
> The health of the external shuffle service is currently difficult to monitor. At least for the YARN shuffle service, the only current indication of health is whether or not the shuffle service threads are running in the NodeManager. However, we've seen that clients can sometimes experience elevated failure rates on requests to the shuffle service even when those threads are running. It would be helpful to have some indication of how often requests to the shuffle service are failing, as this could be monitored, alerted on, etc.
> One suggestion (implemented in the PR I'll attach to this ticket) is to add a metric to {{ExternalShuffleBlockHandler.ShuffleMetrics}} that tracks how many times {{TransportChannelHandler#exceptionCaught}} has been called. I think this gives us the insight into request failure rates that we're currently missing, but I'm open to alternatives if people have other ideas.
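
A rough sketch of the idea described above, for illustration only: a counter that is incremented each time an exception-handling callback fires, and that a metrics reporter could then read. The class and method names below (ShuffleExceptionMetrics, CountingChannelHandler, recordException, caughtExceptionCount) are hypothetical stand-ins, not Spark's actual classes or the contents of the attached PR, and the sketch uses only the JDK rather than Spark's metrics library.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a shuffle-service metric set (not Spark's real class).
final class ShuffleExceptionMetrics {
    // LongAdder keeps increments cheap when many I/O threads update the count.
    private final LongAdder caughtExceptions = new LongAdder();

    void recordException(Throwable cause) {
        caughtExceptions.increment();
    }

    long caughtExceptionCount() {
        return caughtExceptions.sum();
    }
}

// Hypothetical handler that mimics the shape of an exceptionCaught callback.
final class CountingChannelHandler {
    private final ShuffleExceptionMetrics metrics;

    CountingChannelHandler(ShuffleExceptionMetrics metrics) {
        this.metrics = metrics;
    }

    void exceptionCaught(Throwable cause) {
        // Count the failure first; a real handler would also log it
        // and close the affected channel.
        metrics.recordException(cause);
    }
}
```

A monitoring system could poll caughtExceptionCount() periodically and alert when the rate of increase is unusually high, which is the kind of signal the description says is missing today.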



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
