Posted to issues@spark.apache.org by "manpreet singh (Jira)" <ji...@apache.org> on 2023/02/06 19:31:00 UTC

[jira] [Commented] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage

    [ https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684895#comment-17684895 ] 

manpreet singh commented on SPARK-24942:
----------------------------------------

[~gurwls223] Any updates on this?

It seems we are hitting this issue as well.

We want to use stage-level scheduling for jobs that require barrier execution. If we cannot enable DRA (dynamic resource allocation), we incur a large infrastructure cost for the part of the Spark executor pool that the current stage no longer uses.
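To make the use case concrete, here is a minimal sketch of the kind of job we have in mind (Scala; the ResourceProfile and barrier APIs are from Spark 3.1+, `sc` is an existing SparkContext, and the resource amounts are made-up placeholders, not our real sizing):

    import org.apache.spark.BarrierTaskContext
    import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

    // Build a stage-level ResourceProfile (placeholder amounts).
    val execReqs = new ExecutorResourceRequests().cores(4).memory("8g")
    val taskReqs = new TaskResourceRequests().cpus(4)
    val profile  = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build()

    // Request the profile for this stage, then run the stage in barrier mode.
    val result = sc.parallelize(1 to 1000, 8)
      .withResources(profile)   // stage-level scheduling
      .barrier()                // barrier execution mode
      .mapPartitions { iter =>
        BarrierTaskContext.get().barrier()  // all tasks in the stage sync here
        iter
      }
      .count()

As far as we can tell, stage-level scheduling in 3.1 requires dynamic allocation to be enabled, while barrier mode does not support it, which is exactly the conflict we are running into.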


> Improve cluster resource management with jobs containing barrier stage
> ----------------------------------------------------------------------
>
>                 Key: SPARK-24942
>                 URL: https://issues.apache.org/jira/browse/SPARK-24942
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Xingbo Jiang
>            Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, we may acquire some executors (but not enough to launch all the tasks in a barrier stage), release them when the executor idle timeout expires, and then re-acquire them, in a repeating cycle.
> - Two concurrent applications can end up deadlocked: each acquires some resources, but not enough to launch all the tasks in its barrier stage; after hitting the idle timeout each releases its executors and acquires new ones, and the two applications continually trade resources back and forth without either making progress.
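For anyone reproducing the churn described above: the release half of the cycle is driven by the standard dynamic allocation settings. A minimal illustrative configuration (values are placeholders, not recommendations):

    spark.dynamicAllocation.enabled=true
    # Executors idle this long are released; when a barrier stage cannot
    # get all of its slots at once, this timeout is what triggers the
    # acquire/release cycle described above.
    spark.dynamicAllocation.executorIdleTimeout=60s
    spark.dynamicAllocation.minExecutors=0
    spark.dynamicAllocation.maxExecutors=100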



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org