Posted to issues@spark.apache.org by "Matthew Farrellee (JIRA)" <ji...@apache.org> on 2014/09/21 19:48:33 UTC
[jira] [Updated] (SPARK-604) reconnect if mesos slaves dies
[ https://issues.apache.org/jira/browse/SPARK-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Farrellee updated SPARK-604:
------------------------------------
Component/s: Mesos
> reconnect if mesos slaves dies
> ------------------------------
>
> Key: SPARK-604
> URL: https://issues.apache.org/jira/browse/SPARK-604
> Project: Spark
> Issue Type: Bug
> Components: Mesos
>
> When running on Mesos, if a slave goes down, Spark doesn't try to reassign its work to another machine. Even if the slave comes back up, the job is doomed.
> Currently when this happens, we just see this in the driver logs:
> 12/11/01 16:48:56 INFO mesos.MesosSchedulerBackend: Mesos slave lost: 201210312057-1560611338-5050-24091-52
> Exception in thread "Thread-346" java.util.NoSuchElementException: key not found: value: "201210312057-1560611338-5050-24091-52"
> at scala.collection.MapLike$class.default(MapLike.scala:224)
> at scala.collection.mutable.HashMap.default(HashMap.scala:43)
> at scala.collection.MapLike$class.apply(MapLike.scala:135)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:43)
> at spark.scheduler.cluster.ClusterScheduler.slaveLost(ClusterScheduler.scala:255)
> at spark.scheduler.mesos.MesosSchedulerBackend.slaveLost(MesosSchedulerBackend.scala:275)
> 12/11/01 16:48:56 INFO mesos.MesosSchedulerBackend: driver.run() returned with code DRIVER_ABORTED
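The stack trace points at an unguarded `apply()` on a mutable `HashMap`: when `slaveLost` fires for a slave id the scheduler no longer tracks, the lookup throws `NoSuchElementException` and takes the driver down. Below is a minimal, illustrative Scala sketch of that failure mode and a defensive alternative; the map and method names are hypothetical, not the actual `ClusterScheduler` fields or the eventual fix.

```scala
import scala.collection.mutable

// Illustrative slaveId -> hostname bookkeeping, modeled loosely on what
// the stack trace suggests the scheduler keeps (names are made up here).
val slaveIdToHost = mutable.HashMap(
  "201210312057-1560611338-5050-24091-52" -> "worker-1"
)

// apply() on a mutable.HashMap throws NoSuchElementException when the
// key is absent -- the exception seen in the driver log above.
def hostOfUnsafe(slaveId: String): String = slaveIdToHost(slaveId)

// Defensive lookup: an unknown slave is reported as None (e.g. already
// removed) instead of an exception aborting the whole driver.
def hostOf(slaveId: String): Option[String] = slaveIdToHost.get(slaveId)
```

The point of the sketch is only that a lost-slave callback should tolerate ids it has already forgotten, rather than assuming every `slaveLost` matches a live map entry.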
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org