Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/08/05 00:47:00 UTC

[jira] [Assigned] (SPARK-39984) Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

     [ https://issues.apache.org/jira/browse/SPARK-39984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39984:
------------------------------------

    Assignee: Apache Spark

> Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-39984
>                 URL: https://issues.apache.org/jira/browse/SPARK-39984
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Kai-Hsun Chen
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently, the driver’s HeartbeatReceiver expires an executor if it receives no heartbeat from that executor for 120 seconds. This timeout is quite long, yet simply lowering it creates other problems: for example, while an executor is performing GC it cannot reply to any messages, so a shorter timeout would expire executors that are actually healthy.
>  
> Hence, this PR aims to provide a method to safely lower the timeout. Workers send heartbeats to the master periodically, so HeartbeatReceiver can ask the master for the latest heartbeat from the worker that hosts the executor. This lets HeartbeatReceiver determine whether the heartbeat loss is caused by a network issue or by something else (e.g. GC). If the loss is not caused by a network issue, HeartbeatReceiver puts the executor into a waitingList rather than expiring it immediately.
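The decision logic described above could be sketched roughly as follows. This is an illustrative Scala sketch only, not the actual Spark implementation; the names (`HeartbeatDecision`, `decide`) and the threshold parameters are assumptions made for the example.

```scala
// Hypothetical sketch of the proposed expiry decision. The method names and
// timeout parameters are illustrative, not actual Spark internals.
object HeartbeatDecision {
  sealed trait Action
  case object KeepAlive extends Action        // executor heartbeat is fresh
  case object AddToWaitingList extends Action // stale, but likely GC, not network
  case object Expire extends Action           // stale and worker also unreachable

  /**
   * Decide what to do with an executor based on the driver-side executor
   * heartbeat and the master-side worker heartbeat.
   *
   * @param now                 current time (ms)
   * @param executorLastSeen    last executor heartbeat seen by the driver (ms)
   * @param workerLastHeartbeat last worker heartbeat seen by the master (ms)
   * @param executorTimeoutMs   driver-side executor heartbeat timeout
   * @param workerTimeoutMs     master-side worker heartbeat timeout
   */
  def decide(now: Long,
             executorLastSeen: Long,
             workerLastHeartbeat: Long,
             executorTimeoutMs: Long,
             workerTimeoutMs: Long): Action = {
    if (now - executorLastSeen <= executorTimeoutMs) {
      KeepAlive
    } else if (now - workerLastHeartbeat <= workerTimeoutMs) {
      // The worker hosting the executor still reaches the master, so the
      // heartbeat loss is probably not a network partition (e.g. the executor
      // JVM is stuck in a long GC pause): wait instead of expiring.
      AddToWaitingList
    } else {
      // Neither the executor nor its worker has been heard from: treat the
      // node as lost and expire the executor.
      Expire
    }
  }
}
```

With this shape, the driver-side timeout could be lowered without expiring executors that are merely paused, since the master's view of the worker acts as a second signal.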



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org