Posted to issues@spark.apache.org by "hustfxj (JIRA)" <ji...@apache.org> on 2017/03/07 04:15:33 UTC
[jira] [Comment Edited] (SPARK-19831) Sending the heartbeat master from worker maybe blocked by other rpc messages
[ https://issues.apache.org/jira/browse/SPARK-19831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898704#comment-15898704 ]
hustfxj edited comment on SPARK-19831 at 3/7/17 4:15 AM:
---------------------------------------------------------
[~zsxwing], so far the only slow handler I have found is the code that processes the *ApplicationFinished* message, so I also think that code should run in a separate thread. I will submit a PR that moves it to a separate thread.
was (Author: hustfxj):
[~zsxwing], so far the only slow handler I have found is the code that processes the *ApplicationFinished* message, so I also think that code should run in a separate thread.
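The change proposed in this comment can be sketched with a small Python model. It is a hypothetical illustration, not Spark's Scala code. One thread drains an inbox strictly in order, standing in for *ThreadSafeRpcEndpoint*. The slow part of the *ApplicationFinished* handler is either run inline or submitted to a separate single-thread executor, which plays the role a 'cleanupThreadExecutor' would. The `Endpoint` class, the message strings, and the 0.5 s `time.sleep` that fakes a slow cleanup are all assumptions made for the sketch.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Endpoint:
    """Hypothetical stand-in for a ThreadSafeRpcEndpoint: a single thread
    handles inbox messages strictly one at a time, in arrival order."""

    def __init__(self, offload_cleanup, cleanup_seconds=0.5):
        self.offload_cleanup = offload_cleanup
        self.cleanup_seconds = cleanup_seconds
        self.inbox = queue.Queue()
        # Separate single-thread pool for slow work (the role of a cleanupThreadExecutor).
        self.cleanup_pool = ThreadPoolExecutor(max_workers=1)
        self.heartbeat_delays = []
        self.loop = threading.Thread(target=self._run, daemon=True)
        self.loop.start()

    def send(self, name):
        # Stamp each message so we can measure how long it waited in the inbox.
        self.inbox.put((name, time.monotonic()))

    def _cleanup(self):
        time.sleep(self.cleanup_seconds)  # simulated slow application cleanup

    def _run(self):
        while True:
            name, sent_at = self.inbox.get()
            if name == "Stop":
                return
            if name == "ApplicationFinished":
                if self.offload_cleanup:
                    self.cleanup_pool.submit(self._cleanup)  # returns immediately
                else:
                    self._cleanup()  # every message queued behind this one waits
            elif name == "SendHeartbeat":
                # A real worker would forward a heartbeat to the master here.
                self.heartbeat_delays.append(time.monotonic() - sent_at)

    def close(self):
        self.send("Stop")
        self.loop.join()
        self.cleanup_pool.shutdown(wait=True)


def heartbeat_delay(offload_cleanup):
    """Seconds a SendHeartbeat waits when queued right behind ApplicationFinished."""
    endpoint = Endpoint(offload_cleanup)
    endpoint.send("ApplicationFinished")
    endpoint.send("SendHeartbeat")
    endpoint.close()
    return endpoint.heartbeat_delays[0]


if __name__ == "__main__":
    print(f"cleanup on inbox thread: heartbeat waited {heartbeat_delay(False):.2f}s")
    print(f"cleanup offloaded:       heartbeat waited {heartbeat_delay(True):.2f}s")
```

Run inline, the heartbeat waits for roughly the whole cleanup; offloaded, it is handled almost immediately. Offloading is only safe if the work moved off the inbox thread does not touch state that other handlers expect to own exclusively, which is what *ThreadSafeRpcEndpoint* otherwise guarantees.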
> Sending the heartbeat master from worker maybe blocked by other rpc messages
> ------------------------------------------------------------------------------
>
> Key: SPARK-19831
> URL: https://issues.apache.org/jira/browse/SPARK-19831
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: hustfxj
> Priority: Minor
>
> Cleaning up an application can take a long time on the worker. Because Worker extends *ThreadSafeRpcEndpoint*, which handles its RPC messages one at a time, this cleanup blocks the worker from sending heartbeats to the master. If a heartbeat from the worker is stuck behind an *ApplicationFinished* message for long enough, the master will consider the worker dead, and if the worker is running a driver, the master will schedule that driver again. So I think this is a bug in Spark. It could be solved in the following ways:
> 1. Run the application cleanup asynchronously on a separate thread, such as 'cleanupThreadExecutor', so that it does not block other RPC messages such as *SendHeartbeat*;
> 2. Do not send heartbeats to the master through the RPC message channel, because any other RPC message may block that channel. Instead, send them from a separate, asynchronous timer thread.
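The failure described in the issue, and suggestion 2, can be compared with a small deterministic model. This is a hypothetical Python sketch, not Spark code. Heartbeats are due every `interval` seconds, a cleanup of `cleanup_seconds` occupies the worker's single message thread from `cleanup_start`, and a simplified master declares the worker dead when consecutive heartbeats arrive more than `timeout` seconds apart. The values used (15 s interval, 60 s timeout, 90 s cleanup) are illustrative assumptions, and network latency is ignored.

```python
def heartbeat_send_times(interval, horizon, cleanup_start, cleanup_seconds, via_inbox):
    """Times (seconds) at which heartbeats actually leave the worker.

    A heartbeat is due every `interval` seconds. If heartbeats go through the
    worker's single message thread (via_inbox=True), any tick that comes due
    while the cleanup is running is only handled once the cleanup finishes.
    A dedicated timer thread (via_inbox=False) sends every tick on time.
    """
    cleanup_end = cleanup_start + cleanup_seconds
    sent = []
    for k in range(1, horizon // interval + 1):
        due = k * interval
        if via_inbox and cleanup_start <= due < cleanup_end:
            sent.append(cleanup_end)  # queued behind ApplicationFinished
        else:
            sent.append(due)
    return sent


def master_declares_dead(send_times, timeout):
    """Simplified master check: dead if two consecutive heartbeats (or time 0
    and the first heartbeat) are more than `timeout` seconds apart."""
    last = 0
    for t in send_times:
        if t - last > timeout:
            return True
        last = t
    return False


if __name__ == "__main__":
    # Illustrative values: 15 s heartbeat interval, 60 s master timeout,
    # and a 90 s cleanup that starts at t = 20 s.
    for via_inbox in (True, False):
        times = heartbeat_send_times(15, 120, 20, 90, via_inbox)
        label = "through inbox" if via_inbox else "dedicated thread"
        print(label, times, "dead" if master_declares_dead(times, 60) else "alive")
```

In this model, a cleanup no longer than `timeout - interval` is always survived even when heartbeats go through the inbox, while one longer than `timeout` always gets the worker declared dead. A dedicated heartbeat thread is unaffected in both cases.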
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)