Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2016/11/21 07:29:58 UTC

[jira] [Commented] (SPARK-17755) Master may ask a worker to launch an executor before the worker actually got the response of registration

    [ https://issues.apache.org/jira/browse/SPARK-17755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682778#comment-15682778 ] 

Shixiong Zhu commented on SPARK-17755:
--------------------------------------

It's not easy to fix. The root cause is that the two messages (the registration response and the launch request) are sent via two different channels, so their relative order is not guaranteed.
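
To make the ordering problem concrete, here is a minimal toy model of it, not Spark's actual code. ToyWorker, deliver and the buffer_early_launches flag are names made up for this sketch, and the buffering shown is only one hypothetical mitigation, not a proposed or merged fix. The worker here accepts a launch request only from the master it believes it is registered with, so a request that overtakes the registration reply is rejected.

```python
# Toy model only: every name here is illustrative, not a Spark class or API.
class ToyWorker:
    def __init__(self, buffer_early_launches=False):
        self.active_master = None      # set once the registration reply arrives
        self.buffer_early_launches = buffer_early_launches
        self.pending = []              # launch requests seen before registration
        self.launched = []
        self.warnings = []

    def on_registered(self, master_url):
        self.active_master = master_url
        # Hypothetical mitigation: replay launch requests that arrived too early.
        pending, self.pending = self.pending, []
        for url, executor_id in pending:
            self.on_launch_executor(url, executor_id)

    def on_launch_executor(self, master_url, executor_id):
        if self.active_master is None and self.buffer_early_launches:
            self.pending.append((master_url, executor_id))
            return
        if master_url != self.active_master:
            # Mirrors the warning seen in the log of the failed test.
            self.warnings.append(
                f"Invalid Master ({master_url}) attempted to launch executor.")
            return
        self.launched.append(executor_id)


def deliver(worker, events):
    """Hand events to the worker in the order the network happened to deliver them."""
    for kind, *args in events:
        if kind == "registered":
            worker.on_registered(*args)
        elif kind == "launch":
            worker.on_launch_executor(*args)
    return worker


MASTER = "spark://localhost:46460"
EXECUTOR = "app-20160930145353-0000/1"
# Order seen in the log: the launch request overtakes the registration reply.
RACY_ORDER = [("launch", MASTER, EXECUTOR), ("registered", MASTER)]

dropped = deliver(ToyWorker(), RACY_ORDER)
buffered = deliver(ToyWorker(buffer_early_launches=True), RACY_ORDER)
```

In this model, dropped ends up with no launched executor and one warning, which matches the behaviour in the log, while buffered launches the executor once the registration reply arrives. Whether something like this is acceptable in the real Worker is a separate question, since the master would also need to cope with a delayed launch.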

> Master may ask a worker to launch an executor before the worker actually got the response of registration
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17755
>                 URL: https://issues.apache.org/jira/browse/SPARK-17755
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Yin Huai
>
> I saw a failed test {{org.apache.spark.DistributedSuite.caching in memory, serialized, replicated}}. Its log shows that the Spark master asked the worker to launch an executor before the worker had actually received the registration response. So the master knew that the worker had been registered, but the worker did not yet know that it had been registered itself.
> {code}
> 16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Registering worker localhost:38262 with 1 cores, 1024.0 MB RAM
> 16/09/30 14:53:53.681 dispatcher-event-loop-0 INFO Master: Launching executor app-20160930145353-0000/1 on worker worker-20160930145353-localhost-38262
> 16/09/30 14:53:53.682 dispatcher-event-loop-3 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20160930145353-0000/1 on worker-20160930145353-localhost-38262 (localhost:38262) with 1 cores
> 16/09/30 14:53:53.683 dispatcher-event-loop-3 INFO StandaloneSchedulerBackend: Granted executor ID app-20160930145353-0000/1 on hostPort localhost:38262 with 1 cores, 1024.0 MB RAM
> 16/09/30 14:53:53.683 dispatcher-event-loop-0 WARN Worker: Invalid Master (spark://localhost:46460) attempted to launch executor.
> 16/09/30 14:53:53.687 worker-register-master-threadpool-0 INFO Worker: Successfully registered with master spark://localhost:46460
> {code}
> As a result, it seems the worker did not launch any executor.


