Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/31 04:43:25 UTC

[jira] [Assigned] (SPARK-13112) CoarsedExecutorBackend register to driver should wait Executor was ready

     [ https://issues.apache.org/jira/browse/SPARK-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-13112:
------------------------------------

    Assignee:     (was: Apache Spark)

> CoarsedExecutorBackend register to driver should wait Executor was ready
> ------------------------------------------------------------------------
>
>                 Key: SPARK-13112
>                 URL: https://issues.apache.org/jira/browse/SPARK-13112
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: SuYan
>
> Description:
> When a host's disks are busy, an executor on that host can hit a TimeoutException while trying to register with the shuffle server on that host,
> and it then calls exit(1) when a task is launched on a null executor.
> If the YARN cluster is also short on resources, YARN will consider that host idle and prefer to allocate the replacement executor on the same host, so one task can fail 4 times on the same host.
> Currently, CoarseGrainedExecutorBackend registers with the driver first, and only after that registration succeeds does it initialize the Executor.
> If an exception occurs during Executor initialization, the driver does not learn of it and will still launch tasks on that executor,
> which then calls System.exit(1).
> {code}
> override def receive: PartialFunction[Any, Unit] = {
>   case RegisteredExecutor(hostname) =>
>     logInfo("Successfully registered with driver")
>     // The Executor is created only after registration has already succeeded.
>     executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
>   ......
>   case LaunchTask(data) =>
>     if (executor == null) {
>       // Reached when Executor initialization failed after registration.
>       logError("Received LaunchTask command but executor was null")
>       System.exit(1)
> {code}
> It would be more reasonable to register with the driver only after the Executor is ready, and to make registerTimeout configurable.
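
The ordering proposed in the description can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: SketchExecutor, SketchDriver, InitThenRegister and the initTimeout parameter are hypothetical names invented for this illustration, and the timeout here simply bounds the whole initialization step. The idea is only that the driver never hears about an executor whose initialization failed or timed out.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical stand-in for an executor; not Spark's Executor class.
final class SketchExecutor(val hostname: String)

// Hypothetical stand-in for the driver's view of registered executors.
final class SketchDriver {
  private val registered = scala.collection.mutable.Set.empty[String]

  def register(executorId: String): Unit = synchronized {
    registered += executorId
  }

  def isRegistered(executorId: String): Boolean = synchronized {
    registered.contains(executorId)
  }
}

object InitThenRegister {
  // Runs `init` first, waiting at most `initTimeout` (an assumed, configurable value).
  // The driver is told about the executor only if initialization succeeded.
  def start(executorId: String, driver: SketchDriver, initTimeout: FiniteDuration)(
      init: => SketchExecutor): Try[SketchExecutor] = {
    val result = Try(Await.result(Future(init), initTimeout))
    result.foreach(_ => driver.register(executorId))
    result
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val driver = new SketchDriver

    // Initialization succeeds, so the executor is registered afterwards.
    InitThenRegister.start("exec-1", driver, 1.second)(new SketchExecutor("host-a"))

    // Initialization fails (simulating a busy disk), so registration is skipped.
    InitThenRegister.start("exec-2", driver, 1.second)(
      throw new IllegalStateException("disk busy"))

    println(s"exec-1 registered: ${driver.isRegistered("exec-1")}")
    println(s"exec-2 registered: ${driver.isRegistered("exec-2")}")
  }
}
```

With this ordering, a failed initialization leaves the driver with nothing to schedule on, so the LaunchTask-on-null-executor path is never reached for that executor. How the real backend would report the failure and exit is left out of the sketch.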


