Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/20 06:15:28 UTC

[GitHub] [spark] sjrand opened a new pull request #24645: [SPARK-27773][Shuffle] add metrics for number of exceptions caught in shuffle service's TransportChannelHandler

URL: https://github.com/apache/spark/pull/24645
 
 
   ## What changes were proposed in this pull request?
   
   Add a metric for the number of exceptions caught in the external shuffle service's TransportChannelHandler. The idea is that spikes in this metric over some time window (or, more desirably, the absence of spikes) can serve as an indicator of the shuffle service's health, where "health" means its ability to respond successfully to client requests.
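A minimal sketch of the idea, assuming a handler whose exception callback can bump a thread-safe counter. `CountingExceptionHandler` and `ShuffleHealthCheck` are hypothetical names, not Spark's actual classes, and the sketch uses the JDK's `LongAdder` rather than whatever metrics library the patch itself uses. Because the counter only ever grows, a monitor would compare two samples taken one window apart rather than look at the raw value:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a Netty channel handler (not Spark's TransportChannelHandler).
class CountingExceptionHandler {
    // Thread-safe, monotonically increasing count of caught exceptions.
    private final LongAdder numCaughtExceptions = new LongAdder();

    // Mirrors the role of Netty's exceptionCaught(ChannelHandlerContext, Throwable):
    // record the exception first, then run the handler's usual error handling.
    void exceptionCaught(Throwable cause) {
        numCaughtExceptions.increment();
        // ... existing logging / channel-closing logic would follow here ...
    }

    long getNumCaughtExceptions() {
        return numCaughtExceptions.sum();
    }
}

// Hypothetical windowed check: healthy if at most `maxPerWindow` new exceptions
// were counted since the previous sample.
class ShuffleHealthCheck {
    private final long maxPerWindow;
    private long lastSample;

    ShuffleHealthCheck(long maxPerWindow, long initialSample) {
        this.maxPerWindow = maxPerWindow;
        this.lastSample = initialSample;
    }

    // Call once per window with the current counter value.
    boolean isHealthy(long currentSample) {
        long delta = currentSample - lastSample;
        lastSample = currentSample;
        return delta <= maxPerWindow;
    }
}
```

The threshold and window length here are placeholders; a real alerting rule would need to be tuned against the service's normal background error rate.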
   
   ## How was this patch tested?
   
   Deployed a build of this PR to a YARN cluster, and confirmed that the NodeManagers' JMX metrics include `numCaughtExceptions`.
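In that deployment the value presumably reaches JMX through the NodeManager's own metrics system. The sketch below instead uses the JDK's `javax.management` API directly, only to illustrate how a long-valued `numCaughtExceptions` attribute can be exposed as an MBean and read back, much as a JMX client would. `CaughtExceptionsGauge`, `JmxReadBack`, and the object name passed in are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical read-only dynamic MBean exposing a single long attribute.
class CaughtExceptionsGauge implements DynamicMBean {
    static final String ATTR = "numCaughtExceptions";
    private final AtomicLong count;

    CaughtExceptionsGauge(AtomicLong count) {
        this.count = count;
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if (ATTR.equals(name)) return count.get();
        throw new AttributeNotFoundException(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException(attribute.getName() + " is read-only");
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList list = new AttributeList();
        for (String n : names) {
            if (ATTR.equals(n)) list.add(new Attribute(ATTR, count.get()));
        }
        return list;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) {
        throw new UnsupportedOperationException(action); // no operations exposed
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo attr = new MBeanAttributeInfo(
                ATTR, "long", "Exceptions caught by the handler", true, false, false);
        return new MBeanInfo(getClass().getName(), "Hypothetical shuffle metrics",
                new MBeanAttributeInfo[] {attr}, null, null, null);
    }
}

final class JmxReadBack {
    // Registers the gauge, reads the attribute through the platform MBeanServer, then unregisters.
    static long exposeAndRead(AtomicLong counter, String objectName) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(new CaughtExceptionsGauge(counter), name);
            try {
                return (Long) server.getAttribute(name, CaughtExceptionsGauge.ATTR);
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A dynamic MBean is used here only to keep the sketch in one file; a standard MBean would need a separate public interface.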
   
