Posted to issues@spark.apache.org by "Matthew Farrellee (JIRA)" <ji...@apache.org> on 2014/09/21 19:48:33 UTC

[jira] [Updated] (SPARK-604) reconnect if mesos slaves dies

     [ https://issues.apache.org/jira/browse/SPARK-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Farrellee updated SPARK-604:
------------------------------------
    Component/s: Mesos

> reconnect if mesos slaves dies
> ------------------------------
>
>                 Key: SPARK-604
>                 URL: https://issues.apache.org/jira/browse/SPARK-604
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>
> When running on Mesos, if a slave goes down, Spark doesn't try to reassign the work to another machine. Even if the slave comes back up, the job is doomed.
> Currently when this happens, we just see the following in the driver logs:
> 12/11/01 16:48:56 INFO mesos.MesosSchedulerBackend: Mesos slave lost: 201210312057-1560611338-5050-24091-52
> Exception in thread "Thread-346" java.util.NoSuchElementException: key not found: value: "201210312057-1560611338-5050-24091-52"
>     at scala.collection.MapLike$class.default(MapLike.scala:224)
>     at scala.collection.mutable.HashMap.default(HashMap.scala:43)
>     at scala.collection.MapLike$class.apply(MapLike.scala:135)
>     at scala.collection.mutable.HashMap.apply(HashMap.scala:43)
>     at spark.scheduler.cluster.ClusterScheduler.slaveLost(ClusterScheduler.scala:255)
>     at spark.scheduler.mesos.MesosSchedulerBackend.slaveLost(MesosSchedulerBackend.scala:275)
> 12/11/01 16:48:56 INFO mesos.MesosSchedulerBackend: driver.run() returned with code DRIVER_ABORTED
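The stack trace points at an unguarded map lookup: `ClusterScheduler.slaveLost` calls `HashMap.apply` with the lost slave's id, and Scala's `apply` throws `NoSuchElementException` when the key is absent (e.g. the slave was never registered, or was already removed by an earlier lost-slave callback), which aborts the driver thread. A minimal sketch of the failure mode, using hypothetical names (`slaveIdToHost`, `"s2"`) rather than Spark's actual fields:

```scala
import scala.collection.mutable
import scala.util.Try

object SlaveLostSketch {
  // Hypothetical stand-in for the scheduler's slaveId -> hostname map.
  private val slaveIdToHost = mutable.HashMap("s1" -> "host1")

  def main(args: Array[String]): Unit = {
    // HashMap.apply on a missing key throws NoSuchElementException,
    // mirroring the crash in ClusterScheduler.slaveLost above.
    val crashed = Try(slaveIdToHost("s2")).isFailure
    println(crashed)

    // Defensive alternative: get returns an Option, so the callback
    // could log and ignore an unknown slave instead of killing the driver.
    println(slaveIdToHost.get("s2"))
  }
}
```

This only illustrates the exception's mechanics; the actual fix would also need to reassign the lost slave's tasks, as the issue title asks.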



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org