Posted to issues@spark.apache.org by "Xingbo Jiang (JIRA)" <ji...@apache.org> on 2019/02/27 16:47:00 UTC

[jira] [Commented] (SPARK-27005) Design sketch: Accelerator-aware scheduling

    [ https://issues.apache.org/jira/browse/SPARK-27005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779503#comment-16779503 ] 

Xingbo Jiang commented on SPARK-27005:
--------------------------------------

*API Changes [draft pending design discussion]*
class RDD[T] {

    /** Indicate resource requirements for computing the RDD. */
    def requireResources(numCores: Int, accelerators: Map[String, Int]): RDD[T] = ???

}

class TaskContext {

    /** Indexes of accelerators allocated to this task. */
    def accelerators(): Seq[Int] = ???

}

/** Resource requirements for each task. */
case class TaskResourceRequirements(
    numCores: Int,
    accelerators: Map[String, Int] = Map.empty)
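
A rough usage sketch of the draft API above, assuming an existing SparkContext `sc`; the accelerator key "gpu" and the DL-framework hand-off are illustrative only:

import org.apache.spark.TaskContext

// Sketch only: requireResources()/accelerators() are the draft APIs proposed above,
// they do not exist in current Spark; "gpu" as the accelerator key is an assumption.
val data = sc.parallelize(1 to 1000000)

val result = data
  .requireResources(numCores = 1, accelerators = Map("gpu" -> 2))
  .mapPartitions { iter =>
    // Which physical accelerator indexes were assigned to this task on the executor.
    val gpuIndexes: Seq[Int] = TaskContext.get().accelerators()
    // ... hand gpuIndexes to the DL framework of choice ...
    iter
  }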

*Design Sketch*

*Task Resource Requirements*
We use a case class TaskResourceRequirements to represent the resource requirements of each task: the number of cores required and a map of accelerator resource requirements. Users can set the requirements through the RDD API; the requirements are then derived from the RDD chain in DAGScheduler and finally passed to TaskScheduler.
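
As a minimal illustration (the max-merge rule and the per-RDD lookup are assumptions, not settled design), DAGScheduler could reduce the per-RDD requirements of a stage into a single TaskResourceRequirements like this, reusing the case class from the API sketch above:

// Sketch only: the merge rule (take the max per resource across the chain) is an assumption.
def mergeRequirements(
    perRddReqs: Seq[TaskResourceRequirements],
    defaultCores: Int): TaskResourceRequirements = {
  perRddReqs.foldLeft(TaskResourceRequirements(defaultCores)) { (acc, req) =>
    val keys = acc.accelerators.keySet ++ req.accelerators.keySet
    TaskResourceRequirements(
      numCores = math.max(acc.numCores, req.numCores),
      accelerators = keys.map { k =>
        k -> math.max(acc.accelerators.getOrElse(k, 0), req.accelerators.getOrElse(k, 0))
      }.toMap)
  }
}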

*spark.task.cpus and spark.task.gpus*
Add a new config spark.task.gpus to specify the default number of GPUs required per task. This config is used similarly to spark.task.cpus: if the user doesn't specify task resource requirements through the RDD/PandasUDF API, then spark.task.cpus and spark.task.gpus are used as the default values.

CPUS_PER_TASK (spark.task.cpus) is a global config with an integer value that specifies the number of cores each task shall be assigned. Since we make task resource requirements a per-stage config, to keep backward compatibility with CPUS_PER_TASK we shall use 1 core and empty accelerator resources as the default resource requirements for each RDD, unless the user overrides them.
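
For illustration, resolving the per-task defaults from the two configs could look like the following (spark.task.cpus already exists; spark.task.gpus is the proposed new key, and the "gpu" map key is an assumption):

import org.apache.spark.SparkConf

// Sketch only: spark.task.gpus is the proposed config, not an existing Spark key.
def defaultTaskRequirements(conf: SparkConf): TaskResourceRequirements = {
  val cores = conf.getInt("spark.task.cpus", 1)   // existing global config, default 1
  val gpus  = conf.getInt("spark.task.gpus", 0)   // proposed default number of GPUs per task
  val accelerators =
    if (gpus > 0) Map("gpu" -> gpus) else Map.empty[String, Int]
  TaskResourceRequirements(cores, accelerators)
}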

*Expand RDD/Stage to support GPU*
Recursively search for GPU requirements across the RDD chains within the same stage, and put the requirements into the Stage/Task.
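
A sketch of that recursion, assuming a hypothetical requirementsOf() lookup for the per-RDD requirements attached via requireResources(); stopping at ShuffleDependency mirrors how DAGScheduler delimits stages:

import org.apache.spark.ShuffleDependency
import org.apache.spark.rdd.RDD

// Sketch only: requirementsOf() is a hypothetical lookup, not an existing API.
def collectStageRequirements(
    rdd: RDD[_],
    requirementsOf: RDD[_] => Option[TaskResourceRequirements]): Seq[TaskResourceRequirements] = {
  val own = requirementsOf(rdd).toSeq
  // Recurse only through narrow dependencies: a ShuffleDependency marks a stage boundary.
  val fromParents = rdd.dependencies.collect {
    case dep if !dep.isInstanceOf[ShuffleDependency[_, _, _]] =>
      collectStageRequirements(dep.rdd, requirementsOf)
  }.flatten
  own ++ fromParents
}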

*Expand SchedulerBackend to manage resources*
Update the RegisterExecutor message to carry the accelerator resources an executor provides, so that SchedulerBackend can initialize the ExecutorData correctly. SchedulerBackend can then allocate and recycle resources according to the task status updates it receives.
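
The shapes below are illustrative only, not the actual CoarseGrainedClusterMessages or ExecutorData definitions; they show the kind of bookkeeping the extended message enables:

// Sketch only: illustrative message/state shapes under the assumptions above.
case class RegisterExecutor(
    executorId: String,
    hostname: String,
    cores: Int,
    accelerators: Map[String, Seq[Int]])   // e.g. "gpu" -> Seq(0, 1, 2, 3)

class ExecutorData(
    val totalCores: Int,
    var freeCores: Int,
    val totalAccelerators: Map[String, Seq[Int]],
    var freeAccelerators: Map[String, Seq[Int]]) {

  // Reserve devices when a task launches; give them back on task finished/failed updates.
  def allocate(kind: String, count: Int): Seq[Int] = {
    val (taken, left) = freeAccelerators.getOrElse(kind, Seq.empty).splitAt(count)
    freeAccelerators = freeAccelerators.updated(kind, left)
    taken
  }

  def release(kind: String, indexes: Seq[Int]): Unit = {
    freeAccelerators = freeAccelerators.updated(
      kind, freeAccelerators.getOrElse(kind, Seq.empty) ++ indexes)
  }
}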

*Manage accelerator resources in Worker*
Since we assume homogeneous worker resources, the accelerator resources info can be read from a global conf file. The Worker can use a map to store the available accelerator resources internally. Similar to `allocateWorkerResourceToExecutors()`, it can assign accelerator resources to executors. The accelerator resources map shall be updated on the LaunchExecutor and ExecutorStateChanged messages.
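
A sketch of that Worker-side state, assuming the device indexes were read from the global conf file at Worker start; the names below are illustrative, not the actual Worker implementation:

import scala.collection.mutable

// Sketch only: illustrative bookkeeping for accelerator devices on one Worker.
class WorkerAcceleratorState(discovered: Map[String, Seq[Int]]) {
  // Accelerator type -> device indexes still unassigned on this worker.
  private val free = mutable.Map(discovered.toSeq: _*)

  /** Called while handling LaunchExecutor: reserve `count` devices of `kind`. */
  def assignToExecutor(kind: String, count: Int): Seq[Int] = synchronized {
    val available = free.getOrElse(kind, Seq.empty)
    require(available.size >= count, s"not enough free $kind devices")
    val (taken, left) = available.splitAt(count)
    free(kind) = left
    taken
  }

  /** Called while handling ExecutorStateChanged for a finished executor. */
  def reclaim(kind: String, indexes: Seq[Int]): Unit = synchronized {
    free(kind) = free.getOrElse(kind, Seq.empty) ++ indexes
  }
}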

*Expand TaskScheduler to support GPU*
We shall keep a separate queue in TaskSetManager for pending tasks that have non-empty accelerator resource requirements. When a WorkerOffer contains accelerator resources, we match the offer against this special task queue first, so we avoid allocating tasks that only require CPUs onto nodes with accelerators. If the submitted job doesn't require accelerator resources, the scheduling behavior and efficiency shall be the same as before.
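
The matching idea, sketched with an assumed accelerators field on WorkerOffer (the real TaskSetManager internals are more involved; PendingTask is illustrative):

// Sketch only: assumes WorkerOffer gains an `accelerators` field.
case class WorkerOffer(
    executorId: String,
    host: String,
    cores: Int,
    accelerators: Map[String, Int] = Map.empty)

case class PendingTask(taskId: Long, requirements: TaskResourceRequirements)

def pickTaskForOffer(
    offer: WorkerOffer,
    acceleratorQueue: Seq[PendingTask],
    cpuOnlyQueue: Seq[PendingTask]): Option[PendingTask] = {
  def fits(t: PendingTask): Boolean =
    offer.cores >= t.requirements.numCores &&
      t.requirements.accelerators.forall { case (k, n) =>
        offer.accelerators.getOrElse(k, 0) >= n
      }

  if (offer.accelerators.nonEmpty) {
    // Prefer accelerator tasks on accelerator nodes; fall back to CPU-only tasks.
    acceleratorQueue.find(fits).orElse(cpuOnlyQueue.find(fits))
  } else {
    cpuOnlyQueue.find(fits)
  }
}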

*Return GPU index from TaskContext*
On TaskContext creation, we shall allocate free GPU index(es) to the context, so that concurrent tasks on the same executor do not collide on the same device.
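
A sketch of collision-free index hand-out at TaskContext creation; the per-executor pool below is an assumption for illustration:

import scala.collection.mutable

// Sketch only: a per-executor pool handing out GPU indexes so concurrent tasks never share a device.
class GpuIndexPool(allIndexes: Seq[Int]) {
  private val free = mutable.Queue(allIndexes: _*)

  /** Reserve `count` GPU indexes for a new TaskContext; None if not enough are free. */
  def acquire(count: Int): Option[Seq[Int]] = synchronized {
    if (free.size < count) None
    else Some(Seq.fill(count)(free.dequeue()))
  }

  /** Return the indexes when the task completes. */
  def release(indexes: Seq[Int]): Unit = synchronized {
    indexes.foreach(i => free.enqueue(i))
  }
}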

*YARN Support*
Users can request GPU resources for a Spark application via spark-submit. An application with GPU resources can be launched using YARN + Docker, so users can easily define the DL environment in the Dockerfile.

Spark needs YARN 3.1.2+ to enable GPU support; that version supports the following features:
* Auto discovery of GPU resources.
* GPU isolation at process level.
* Placement constraints.
* Heterogeneous device types via node labels.

*Kubernetes Support*
Users can specify GPU requirements for a Spark application on Kubernetes through the following possible choices:
* spark-submit w/ the same GPU configs used by standalone/YARN.
* spark-submit w/ a pod template (new feature for Spark 3.0).
* spark-submit w/ mutating webhook confs to modify pods at runtime.

Users can run Spark jobs on Kubernetes using nvidia-docker to access GPUs. Kubernetes also supports the following features:
* Auto discovery of GPU resources.
* GPU isolation at executor pod level.
* Placement constraints via node selectors.
* Heterogeneous device types via node labels.

> Design sketch: Accelerator-aware scheduling
> -------------------------------------------
>
>                 Key: SPARK-27005
>                 URL: https://issues.apache.org/jira/browse/SPARK-27005
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xingbo Jiang
>            Priority: Major
>
> This task is to outline a design sketch for the accelerator-aware scheduling SPIP discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org