Posted to commits@spark.apache.org by do...@apache.org on 2020/12/14 10:18:01 UTC

[spark] branch branch-3.1 updated: [SPARK-33716][K8S] Fix potential race condition during pod termination

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new b44e650  [SPARK-33716][K8S] Fix potential race condition during pod termination
b44e650 is described below

commit b44e65042a8ea6cfd44796b83601d0a28beb4305
Author: Holden Karau <hk...@apple.com>
AuthorDate: Mon Dec 14 02:09:59 2020 -0800

    [SPARK-33716][K8S] Fix potential race condition during pod termination
    
    ### What changes were proposed in this pull request?
    
    Check that the pod state is not pending or running even if there is a deletion timestamp.
    
    ### Why are the changes needed?
    
    This can occur when etcd does not update the pod state and the deletion timestamp in sync, and we take a pod snapshot while the view is inconsistent.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Manual testing with a local version of Minikube on an overloaded computer, which caused out-of-sync updates.
    
    Closes #30693 from holdenk/SPARK-33716-decommissioning-race-condition-during-pod-snapshot.
    
    Authored-by: Holden Karau <hk...@apple.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit bf2c88ccaebd8e27d9fc27c55c9955129541d3e1)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
index be75311..e81d213 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala
@@ -93,7 +93,8 @@ object ExecutorPodsSnapshot extends Logging {
       (
         pod.getStatus == null ||
         pod.getStatus.getPhase == null ||
-        pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "terminating"
+          (pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "terminating" &&
+           pod.getStatus.getPhase.toLowerCase(Locale.ROOT) != "running")
       ))
   }
 }
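
For readers who want to try the changed condition in isolation, here is a minimal sketch of the predicate. It rests on assumptions: `PodView`, `PodStatus`, `PodDeletionCheck` and `looksDeleted` are hypothetical stand-ins invented for illustration, not the Kubernetes client classes or the actual Spark method, and the deletion-timestamp guard is inferred from the commit message because the hunk shows only the tail of the expression. Note that the diff itself adds only the "running" phase to the check.

```scala
import java.util.Locale

// Hypothetical stand-ins for the Kubernetes client's pod model (not the real API).
final case class PodStatus(phase: String)
final case class PodView(deletionTimestamp: String, status: PodStatus)

object PodDeletionCheck {
  // Phases that, per the diff, mean the pod is not yet gone even though
  // a deletion timestamp is already set.
  private val stillAlivePhases = Set("terminating", "running")

  // Assumed shape of the enclosing check: a pod counts as deleted only when it
  // has a deletion timestamp AND its phase is missing or not one of the
  // phases above.
  def looksDeleted(pod: PodView): Boolean = {
    pod.deletionTimestamp != null && (
      pod.status == null ||
      pod.status.phase == null ||
      !stillAlivePhases.contains(pod.status.phase.toLowerCase(Locale.ROOT))
    )
  }
}
```

Under these assumptions, a snapshot that still reports "Running" alongside a deletion timestamp is no longer classified as deleted, which is the inconsistent view the commit message describes.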

