Posted to issues@spark.apache.org by "Craig Ingram (JIRA)" <ji...@apache.org> on 2017/10/18 19:58:00 UTC

[jira] [Commented] (SPARK-21122) Address starvation issues when dynamic allocation is enabled

    [ https://issues.apache.org/jira/browse/SPARK-21122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16209918#comment-16209918 ] 

Craig Ingram commented on SPARK-21122:
--------------------------------------

Finally getting back around to this. Thanks for the feedback [~tgraves]. I agree with pretty much everything you pointed out. Newer versions of YARN do have a [Cluster Metrics API|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Metrics_API], though I think it was only introduced in 2.8. Regardless, I believe working with YARN's PreemptionMessage is a better path forward than what I was initially proposing. I wish there were an elegant way to do this generically, but I believe I can at least make an abstraction of the PreemptionMessage that can be used by other RM clients. For now, I will focus on a YARN-specific solution.
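
To make that a bit more concrete, here is a rough sketch of the kind of RM-agnostic abstraction I have in mind (all of the names below are hypothetical; nothing here exists in Spark today). The YARN backend would translate the containers listed in a PreemptionMessage into executor IDs before handing them to the policy:

{code:scala}
// Hypothetical RM-agnostic view of a preemption notification; illustration only.
case class PreemptionRequest(
    // Executors the resource manager will reclaim no matter what (YARN's strict contract).
    strictExecutorIds: Set[String],
    // Executors we may voluntarily release instead to satisfy the request.
    negotiableExecutorIds: Set[String],
    // Hint for how long we have before the RM preempts on its own.
    deadlineMillis: Long)

trait PreemptionPolicy {
  // Decide which executors to hand back; the cluster-manager backend would call this
  // whenever it receives a preemption notification (e.g. YARN's PreemptionMessage).
  def selectExecutorsToRelease(request: PreemptionRequest): Seq[String]
}
{code}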

> Address starvation issues when dynamic allocation is enabled
> ------------------------------------------------------------
>
>                 Key: SPARK-21122
>                 URL: https://issues.apache.org/jira/browse/SPARK-21122
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Craig Ingram
>         Attachments: Preventing Starvation with Dynamic Allocation Enabled.pdf
>
>
> When dynamic resource allocation is enabled on a cluster, it’s currently possible for one application to consume all the cluster’s resources, effectively starving any other application trying to start. This is particularly painful in a notebook environment where notebooks may be idle for tens of minutes while the user is figuring out what to do next (or eating their lunch). Ideally the application should give resources back to the cluster when monitoring indicates other applications are pending.
> Before delving into the specifics of the solution, there are some workarounds to this problem that are worth mentioning:
> * Set spark.dynamicAllocation.maxExecutors to a small value, so that no single user is likely to consume the entire cluster even when many users are doing work (see the configuration sketch after this list). This approach will hurt cluster utilization.
> * If using YARN, enable preemption and have each application (or organization) run in a separate queue. The downside is that when YARN preempts, it knows nothing about which executor it is killing; it is just as likely to kill a long-running executor with cached data as one that just spun up. Moreover, given a feature like https://issues.apache.org/jira/browse/SPARK-21097 (preserving cached data on executor decommission), YARN may not wait long enough between attempting a graceful shutdown and forcefully killing the executor, so the blocks that belonged to that executor would be lost and have to be recomputed.
> * Configure YARN to use the capacity scheduler with multiple scheduler queues and put high-priority notebook users into a high-priority queue. This prevents high-priority users from being starved out by low-priority notebook users, but it does not prevent users in the same priority class from starving each other.
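> As a purely illustrative example of the first workaround above, a notebook session could be capped like this (the value 8 is arbitrary, not a recommendation):
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> // Illustrative only: cap this session at 8 executors so a single user cannot
> // claim the whole cluster; the downside is lower overall cluster utilization.
> val spark = SparkSession.builder()
>   .appName("notebook-session")
>   .config("spark.dynamicAllocation.enabled", "true")
>   .config("spark.shuffle.service.enabled", "true") // required for dynamic allocation on YARN
>   .config("spark.dynamicAllocation.maxExecutors", "8")
>   .getOrCreate()
> {code}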
> Obviously any solution to this problem that depends on YARN would leave other resource managers out in the cold. The solution proposed in this ticket will afford Spark clusters the flexibility to hook in different resource allocation policies to fulfill their users' needs regardless of resource manager choice. Initially the focus will be on users in a notebook environment, where the goal is fair resource allocation among many users. Given that all users will be using the same memory configuration, this solution will focus primarily on fair sharing of cores.
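> As a toy illustration of that idea (a hypothetical helper, not existing Spark code), an application's allowed share of cores would be computed over busy applications rather than all running ones, so idle notebooks do not pin down capacity:
> {code:scala}
> // Hypothetical sketch of fair sharing of cores; illustration only.
> def allowedCores(totalCores: Int, runningApps: Int, busyApps: Int): Int =
>   // With 3 running applications but only 1 busy, the busy application may use
>   // more than one third of the cluster until the others wake up.
>   if (busyApps == 0) totalCores / math.max(runningApps, 1)
>   else totalCores / busyApps
> {code}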
> The fair resource allocation policy should pick executors to remove based on three factors initially: idleness, presence of cached data, and uptime. The policy will favor removing executors that are idle, short-lived, and have no cached data. The policy will only preemptively remove executors if there are pending applications or cores (otherwise the default dynamic allocation timeout/removal process is followed). The policy will also allow an application's resource consumption to expand based on cluster utilization. For example, if there are 3 applications running but 2 of them are idle, the policy will allow a busy application with pending tasks to consume more than one third of the cluster's resources.
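> A minimal sketch of that ranking (hypothetical types, purely to illustrate the ordering of the three factors):
> {code:scala}
> // Illustration only; none of these types exist in Spark.
> case class ExecutorSnapshot(
>     id: String,
>     idleMillis: Long,        // time since the executor last ran a task
>     hasCachedBlocks: Boolean,
>     uptimeMillis: Long)
>
> // Pick executors to release when other applications are pending: prefer the most
> // idle first, then those without cached data, then the shortest-lived.
> def selectVictims(executors: Seq[ExecutorSnapshot], executorsToFree: Int): Seq[ExecutorSnapshot] =
>   executors
>     .sortBy(e => (-e.idleMillis, e.hasCachedBlocks, e.uptimeMillis))
>     .take(executorsToFree)
> {code}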
> More complexity could be added to take advantage of task/stage metrics, histograms, and heuristics (e.g., favor removing executors whose tasks complete quickly). The important thing here is to benchmark effectively before adding complexity so we can measure the impact of the changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org