Posted to issues@spark.apache.org by "Attila Zsolt Piros (Jira)" <ji...@apache.org> on 2021/02/12 08:37:00 UTC

[jira] [Comment Edited] (SPARK-34389) Spark job on Kubernetes scheduled For Zero or less than minimum number of executors and Wait indefinitely under resource starvation

    [ https://issues.apache.org/jira/browse/SPARK-34389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282614#comment-17282614 ] 

Attila Zsolt Piros edited comment on SPARK-34389 at 2/12/21, 8:36 AM:
----------------------------------------------------------------------

[~ranju] the description of this config is a bit misleading, because the word "pending" is overloaded. Here it means a pod request that has been newly created but not yet processed by k8s, which is different from a pod request that k8s has already accepted and that is in the PENDING state.

So for these newly created pending requests we have the timeout: 
 [https://github.com/apache/spark/blob/1fbd5764105e2c09caf4ab57a7095dd794307b02/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L126-L134]

But there is no such timeout for pod requests already in the PENDING state:
 [https://github.com/apache/spark/blob/1fbd5764105e2c09caf4ab57a7095dd794307b02/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L193-L196]
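
To make the distinction concrete, here is a rough sketch of those two code paths (paraphrased, not the exact code; I kept the allocator's names like newlyCreatedExecutors, podCreationTimeout and PodPending, but treat the details as approximate):

{code:scala}
// Sketch (paraphrased, details approximate) of the two code paths linked above.
// newlyCreatedExecutors maps executor id -> creation time for pod requests that
// were sent to the API server but have not shown up in any pod snapshot yet,
// i.e. k8s has not processed them and they are not even in the PENDING phase.
val currentTime = clock.getTimeMillis()
val timedOut = newlyCreatedExecutors.flatMap { case (execId, timeCreated) =>
  // only these not-yet-seen requests are checked against podCreationTimeout
  if (currentTime - timeCreated > podCreationTimeout) Some(execId) else None
}

// Pods that k8s has already accepted and reports as PENDING are only counted
// (to decide whether more executors must be requested); no timeout applies here.
val currentPendingExecutors = snapshot.executorPods.count {
  case (_, PodPending(_)) => true
  case _ => false
}
{code}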

 

I can correct the description in one of my existing open PRs (i.e. [https://github.com/apache/spark/pull/31513]).



> Spark job on Kubernetes scheduled For Zero or less than minimum number of executors and Wait indefinitely under resource starvation
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34389
>                 URL: https://issues.apache.org/jira/browse/SPARK-34389
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1
>            Reporter: Ranju
>            Priority: Major
>         Attachments: DriverLogs_ExecutorLaunchedLessThanMinExecutor.txt, Steps to reproduce.docx
>
>
> In case the cluster does not have sufficient resources (CPU/memory) for the minimum number of executors, the executor pods stay in the Pending state indefinitely until resources are freed.
> Suppose the cluster configuration is:
> total memory = 204Gi
> used memory = 200Gi
> free memory = 4Gi
> spark.executor.memory=10g
> spark.dynamicAllocation.minExecutors=4
> spark.dynamicAllocation.maxExecutors=8
> Rather, the job should be cancelled if the requested minimum number of executors is not available at that point in time because of resource unavailability.
> Currently it does partial scheduling or no scheduling and waits indefinitely, so the job gets stuck.
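> For illustration, such a job configuration might look like this minimal sketch (illustrative values only, matching the scenario above):
> {code:scala}
> import org.apache.spark.SparkConf
>
> // Illustrative: each executor requests 10g while only ~4Gi of cluster memory
> // is free, so not even one of the 4 minimum executors can be scheduled.
> val conf = new SparkConf()
>   .set("spark.executor.memory", "10g")
>   .set("spark.dynamicAllocation.enabled", "true")
>   // on k8s there is no external shuffle service, so shuffle tracking is needed
>   .set("spark.dynamicAllocation.shuffleTracking.enabled", "true")
>   .set("spark.dynamicAllocation.minExecutors", "4")
>   .set("spark.dynamicAllocation.maxExecutors", "8")
> {code}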



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org