Posted to commits@spark.apache.org by tg...@apache.org on 2020/07/29 18:47:12 UTC

[spark] branch master updated: [SPARK-30322][DOCS] Add stage level scheduling docs

This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e926d41  [SPARK-30322][DOCS] Add stage level scheduling docs
e926d41 is described below

commit e926d419d305c9400f6f2426ca3e8d04a9180005
Author: Thomas Graves <tg...@nvidia.com>
AuthorDate: Wed Jul 29 13:46:28 2020 -0500

    [SPARK-30322][DOCS] Add stage level scheduling docs
    
    ### What changes were proposed in this pull request?
    
    Document the stage level scheduling feature.
    
    ### Why are the changes needed?
    
    Document the stage level scheduling feature.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Documentation.
    
    ### How was this patch tested?
    
    n/a docs only
    
    Closes #29292 from tgravescs/SPARK-30322.
    
    Authored-by: Thomas Graves <tg...@nvidia.com>
    Signed-off-by: Thomas Graves <tg...@apache.org>
---
 docs/configuration.md   | 7 +++++++
 docs/running-on-yarn.md | 4 ++++
 2 files changed, 11 insertions(+)

diff --git a/docs/configuration.md b/docs/configuration.md
index abf7610..62799db 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -3028,3 +3028,10 @@ There are configurations available to request resources for the driver: <code>sp
 Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
 
 See your cluster manager specific page for requirements and details on each of these: [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview), [Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview) and [Standalone Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It is currently not available with Mesos or local mode. Please also note that local-cluster mode with multiple workers is not supported (see Standalon [...]
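As a hedged illustration of the configuration-driven flow described above (not part of this commit), a minimal Scala sketch requesting one GPU per executor and per task might look like the following; the discovery script path is an assumed install location and amounts are placeholders:

    import org.apache.spark.SparkConf

    // Ask the cluster manager for executors with one "gpu" resource each,
    // point Spark at a discovery script, and require one GPU per task.
    val conf = new SparkConf()
      .set("spark.executor.resource.gpu.amount", "1")
      .set("spark.executor.resource.gpu.discoveryScript",
           "/opt/spark/examples/src/main/scripts/getGpusResources.sh") // assumed path
      .set("spark.task.resource.gpu.amount", "1")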
+
+# Stage Level Scheduling Overview
+
+The stage level scheduling feature allows users to specify task and executor resource requirements at the stage level, so that different stages can run with executors that have different resources. A prime example is an ETL stage that runs with CPU-only executors, followed by an ML stage that needs GPUs. Stage level scheduling allows users to request executors that have GPUs when the ML stage runs rather than having to acquire executors with GPUs at th [...]
+This is only available for the RDD API in Scala, Java, and Python and requires dynamic allocation to be enabled.  It is only available on YARN at this time. See the [YARN](running-on-yarn.html#stage-level-scheduling-overview) page for more implementation details.
+
+See the `RDD.withResources` and `ResourceProfileBuilder` APIs for using this feature. The current implementation acquires new executors for each `ResourceProfile` created, and the profiles currently have to be an exact match: Spark does not try to fit tasks that require a different `ResourceProfile` into an executor that was created with another one. Executors that are not in use will be released by the dynamic allocation idle timeout logic. The default configuration for this feature is to only allow one Resour [...]
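The following is an illustrative Scala sketch (not part of this commit) of building a `ResourceProfile` and attaching it to an RDD with `withResources`; the input file, discovery script path, and resource amounts are placeholders, and exact method signatures should be checked against the Spark version in use:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

    val spark = SparkSession.builder().appName("stage-level-demo").getOrCreate()
    val sc = spark.sparkContext

    // Build a profile: executors with 2 cores and 1 GPU; each task needs 1 GPU.
    val execReqs = new ExecutorResourceRequests()
      .cores(2)
      .resource("gpu", 1, "/opt/spark/examples/src/main/scripts/getGpusResources.sh") // assumed path
    val taskReqs = new TaskResourceRequests().resource("gpu", 1)
    val gpuProfile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build()

    // The ETL stage runs with the default profile; the shuffle introduced by
    // repartition puts the ML work in a later stage that uses gpuProfile.
    val etl = sc.textFile("data.txt").map(_.trim)               // placeholder input
    val ml  = etl.repartition(4).withResources(gpuProfile)
      .mapPartitions { it => it /* GPU-based ML work would go here */ }
    ml.count()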
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 36d8f0b..6f7aaf2b 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -641,6 +641,10 @@ If the user has a user defined YARN resource, lets call it `acceleratorX` then t
 
 YARN does not tell Spark the addresses of the resources allocated to each container. For that reason, the user must specify a discovery script that gets run by the executor on startup to discover what resources are available to that executor. You can find an example script in `examples/src/main/scripts/getGpusResources.sh`. The script must have execute permissions set, and the user should set up permissions so that malicious users cannot modify it. The script should write to STDOUT a JSO [...]
 
+# Stage Level Scheduling Overview
+
+Stage level scheduling is supported on YARN when dynamic allocation is enabled. One YARN-specific point to note is that each ResourceProfile requires a different container priority on YARN. The mapping is simply that the ResourceProfile id becomes the priority; on YARN, lower numbers are higher priority. This means that profiles created earlier will have a higher priority in YARN. Normally this won't matter as Spark finishes one stage before starting another one; the only case this mig [...]
+
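For illustration (assuming, as in current Spark versions, that the default ResourceProfile has id 0 and ids increase as profiles are created): the default profile maps to YARN priority 0, the highest, while a profile created later with id 1 maps to priority 1, so YARN favors container requests for the earlier profile when both are outstanding.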
 # Important notes
 
 - Whether core requests are honored in scheduling decisions depends on which scheduler is in use and how it is configured.


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org