Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/12/05 04:15:00 UTC

[jira] [Commented] (SPARK-40379) Propagate decommission executor loss reason during onDisconnect in K8s

    [ https://issues.apache.org/jira/browse/SPARK-40379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643114#comment-17643114 ] 

Apache Spark commented on SPARK-40379:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/38907

> Propagate decommission executor loss reason during onDisconnect in K8s
> ----------------------------------------------------------------------
>
>                 Key: SPARK-40379
>                 URL: https://issues.apache.org/jira/browse/SPARK-40379
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> Currently, if an executor has been sent a decommission message and then disconnects from the scheduler, we only disable the executor and rely on the K8s status events to drive the rest of the state transitions. However, the K8s status events can become overwhelmed on large clusters. We should therefore check whether an executor is in a decommissioning state when it disconnects and, if so, use decommissioning as the loss reason instead of waiting on the K8s status events. This gives more accurate logging information.
>  
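The behaviour proposed above can be illustrated with a small, language-neutral sketch. It is written in Python rather than Spark's Scala and is not the code from the linked pull request. Every class, field, and method name here is hypothetical: `pending_decommission`, `on_disconnected`, and the two loss-reason classes only stand in for the idea of "remember which executors were asked to decommission, and consult that record when one disconnects." The fallback message is also illustrative, not a real Spark log line.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class ExecutorDecommissionReason:
    """Hypothetical loss reason: the executor had already been told to decommission."""
    worker_host: Optional[str]


@dataclass(frozen=True)
class ExecutorProcessLostReason:
    """Hypothetical loss reason: the executor vanished for an unknown cause."""
    message: str


LossReason = Union[ExecutorDecommissionReason, ExecutorProcessLostReason]


@dataclass
class SchedulerBackendSketch:
    """Hypothetical scheduler backend tracking decommissioning executors."""

    # executor id -> host of executors that were sent a decommission message
    pending_decommission: Dict[str, Optional[str]] = field(default_factory=dict)
    # executor id -> reason recorded when the executor was disabled
    disabled: Dict[str, LossReason] = field(default_factory=dict)

    def decommission(self, executor_id: str, host: Optional[str] = None) -> None:
        # Record that a decommission message was sent to this executor.
        self.pending_decommission[executor_id] = host

    def on_disconnected(self, executor_id: str) -> LossReason:
        # Resolve the loss reason immediately from local state instead of
        # waiting for an external (e.g. K8s pod status) event to explain it.
        if executor_id in self.pending_decommission:
            host = self.pending_decommission.pop(executor_id)
            reason: LossReason = ExecutorDecommissionReason(host)
        else:
            # Illustrative placeholder message, not Spark's actual wording.
            reason = ExecutorProcessLostReason("executor disconnected, cause unknown")
        self.disabled[executor_id] = reason
        return reason


backend = SchedulerBackendSketch()
backend.decommission("exec-1", "host-a")
print(backend.on_disconnected("exec-1"))  # decommission reason, known at once
print(backend.on_disconnected("exec-2"))  # generic reason for an unexpected loss
```

The design point is that the decommission bookkeeping already exists on the scheduler side. Consulting it at disconnect time lets the logged reason be accurate even when the cluster manager's status events lag behind.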



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
