Posted to issues@spark.apache.org by "Sun Rui (JIRA)" <ji...@apache.org> on 2016/09/13 07:37:21 UTC
[jira] [Created] (SPARK-17519) [MESOS] Enhance robustness when ExternalShuffleService is broken
Sun Rui created SPARK-17519:
-------------------------------
Summary: [MESOS] Enhance robustness when ExternalShuffleService is broken
Key: SPARK-17519
URL: https://issues.apache.org/jira/browse/SPARK-17519
Project: Spark
Issue Type: Improvement
Components: Mesos
Affects Versions: 2.0.0
Reporter: Sun Rui
This is intended to complement SPARK-17370, which addressed Standalone mode only.
For Mesos, it seems we could enhance MesosExternalShuffleClient to detect, when sending heartbeats, whether any of the external shuffle services has been lost. In that case, MesosCoarseGrainedSchedulerBackend can report ExecutorLost with workerLost=true for the affected executors. It can also blacklist the slave on which the lost external shuffle service was running, so that no further tasks are launched on it.
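To make the proposed flow concrete, here is a minimal, language-neutral sketch in Python. It is not Spark code: the class ShuffleServiceMonitor, its methods, and the timeout parameter are all hypothetical names chosen for illustration, and it assumes the client can tell whether a heartbeat was actually delivered to a given shuffle service.

```python
import time


class ShuffleServiceMonitor:
    """Hypothetical sketch of the proposal; none of these names exist in Spark."""

    def __init__(self, heartbeat_timeout_s, on_executor_lost, clock=time.monotonic):
        self.heartbeat_timeout_s = heartbeat_timeout_s
        # Callback taking (executor_id, worker_lost), standing in for the
        # scheduler backend reporting ExecutorLost with workerLost=true.
        self.on_executor_lost = on_executor_lost
        self.clock = clock
        self.last_delivered = {}  # slave_id -> time of last delivered heartbeat
        self.executors = {}       # slave_id -> executor ids running on that slave
        self.blacklist = set()    # slaves whose shuffle service is considered lost

    def register(self, slave_id, executor_id):
        """Record an executor and start tracking its slave's shuffle service."""
        self.executors.setdefault(slave_id, []).append(executor_id)
        self.last_delivered.setdefault(slave_id, self.clock())

    def heartbeat_delivered(self, slave_id):
        """Call whenever a heartbeat to the slave's shuffle service succeeds."""
        if slave_id in self.last_delivered:
            self.last_delivered[slave_id] = self.clock()

    def check(self):
        """Detect lost shuffle services; report executors and blacklist slaves."""
        now = self.clock()
        lost = [slave_id for slave_id, seen in self.last_delivered.items()
                if now - seen > self.heartbeat_timeout_s]
        for slave_id in lost:
            del self.last_delivered[slave_id]
            self.blacklist.add(slave_id)
            for executor_id in self.executors.pop(slave_id, []):
                # worker_lost=True: shuffle output on this slave is gone too.
                self.on_executor_lost(executor_id, True)
        return lost

    def can_launch_on(self, slave_id):
        """The scheduler would consult this before accepting an offer."""
        return slave_id not in self.blacklist
```

In this sketch a scheduler would call check() periodically and can_launch_on() before accepting a resource offer. Whether a blacklisted slave should ever be readmitted, and how the timeout relates to the existing heartbeat interval, are left open here, as they are in the proposal.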