Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/12/16 18:49:00 UTC

[jira] [Updated] (SPARK-40379) Propagate decommission executor loss reason during onDisconnect in K8s

     [ https://issues.apache.org/jira/browse/SPARK-40379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-40379:
----------------------------------
        Parent: SPARK-41550
    Issue Type: Sub-task  (was: Improvement)

> Propagate decommission executor loss reason during onDisconnect in K8s
> ----------------------------------------------------------------------
>
>                 Key: SPARK-40379
>                 URL: https://issues.apache.org/jira/browse/SPARK-40379
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Holden Karau
>            Assignee: Holden Karau
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> Currently, if an executor has been sent a decommission message and then disconnects from the scheduler, we only disable the executor and rely on the K8s status events to drive the rest of the state transitions. However, the K8s status events can become overwhelmed on large clusters. Instead, when an executor disconnects, we should check whether it is in a decommissioning state and, if so, use decommissioning as the loss reason rather than waiting on the K8s status events. This gives more accurate logging information.
>  
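The proposed disconnect handling can be illustrated with a small, simplified model. This is a hypothetical sketch, not Spark's actual scheduler code. `SchedulerModel`, `LossReason`, `decommission`, `on_disconnected` and the reason strings are all illustrative names. The sketch assumes that the scheduler tracks which executors were sent a decommission message. On disconnect, a known-decommissioning executor gets its loss reason recorded immediately. Any other executor is only disabled, which corresponds to the existing behaviour of waiting for K8s pod status events.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical, simplified model; these names are illustrative and are NOT
# Spark's real scheduler classes or APIs.


@dataclass(frozen=True)
class LossReason:
    message: str


DECOMMISSIONED = LossReason("Executor decommissioned")


@dataclass
class SchedulerModel:
    # Executors that were sent a decommission message.
    decommissioning: Set[str] = field(default_factory=set)
    # Executors disabled while waiting for K8s pod status events.
    disabled: Set[str] = field(default_factory=set)
    # Loss reasons recorded directly by the scheduler.
    loss_reasons: Dict[str, LossReason] = field(default_factory=dict)

    def decommission(self, executor_id: str) -> None:
        # Remember that this executor is being decommissioned.
        self.decommissioning.add(executor_id)

    def on_disconnected(self, executor_id: str) -> Optional[LossReason]:
        if executor_id in self.decommissioning:
            # Proposed behaviour: the reason is already known, so record it
            # now instead of waiting on (possibly delayed) K8s status events.
            self.decommissioning.discard(executor_id)
            self.loss_reasons[executor_id] = DECOMMISSIONED
            return DECOMMISSIONED
        # Fallback: only disable the executor; K8s status events drive the
        # remaining state transitions.
        self.disabled.add(executor_id)
        return None


if __name__ == "__main__":
    model = SchedulerModel()
    model.decommission("exec-1")
    print(model.on_disconnected("exec-1"))  # LossReason(message='Executor decommissioned')
    print(model.on_disconnected("exec-2"))  # None
```

In this model, the decommissioning set is consulted before falling back to the status-event path. A slow or overloaded event stream then no longer delays or obscures the reason for executors whose decommissioning the scheduler already initiated.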



