Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2019/04/08 15:02:00 UTC

[jira] [Comment Edited] (SPARK-27364) User-facing APIs for GPU-aware scheduling

    [ https://issues.apache.org/jira/browse/SPARK-27364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812488#comment-16812488 ] 

Thomas Graves edited comment on SPARK-27364 at 4/8/19 3:01 PM:
---------------------------------------------------------------

There are 3 main user-facing impacts here: the TaskContext interface to fetch the resources, the user API to specify the GPU count, and how the executor discovers or is told about the GPUs. Below is more detail:

 

1) How the user gets the resources from the TaskContext and BarrierTaskContext

  For the TaskContext interface I propose we add an API like:

{{def getResources(): Map[String, ResourceInformation]}}

Where the Map key is the resource type.  So examples would be "gpu", "fpga", etc.  "gpu" would be the only one we officially support to start with.

ResourceInformation would be a class with a name, units, count, and addresses.  The name would be "gpu".  The units for gpu would be empty "", but for other resource types like memory it could be GiB or similar.  The count is the number allocated, so for GPUs it would be the number of GPUs allocated to the task.  Finally, the addresses Array of strings could be whatever we want; in the gpu case it would just be the indexes of the GPUs allocated to the task, i.e. ["0", "2", "3"]. I made this a string so it is very flexible as to what the address means for different resource types.  The user has to know how to interpret it, but depending on what you are doing, even the same tools have multiple ways to specify devices. For instance, with TensorFlow you can specify {{CUDA_VISIBLE_DEVICES=2,3}} or you can specify devices like {{for d in ['/device:GPU:2', '/device:GPU:3']:}}.

The class itself would look something like:

class ResourceInformation(
    private val name: String,
    private val units: String,
    private val count: Long,
    private val addresses: Array[String] = Array.empty) {

  def getName(): String = name
  def getUnits(): String = units
  def getCount(): Long = count
  def getAddresses(): Array[String] = addresses
}
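
As a rough illustration of how a task might consume this proposed API (getResources and ResourceInformation are only the proposal above, not something in Spark today; passing the addresses as CUDA_VISIBLE_DEVICES is just one way a user could use them):

import org.apache.spark.{SparkContext, TaskContext}

val sc = SparkContext.getOrCreate()
val rdd = sc.parallelize(1 to 10, numSlices = 2)

rdd.mapPartitions { iter =>
  // Proposed API: map from resource type (e.g. "gpu") to ResourceInformation
  val gpus = TaskContext.get().getResources()("gpu")
  // Addresses are opaque strings, e.g. Array("0", "2", "3"); the user decides
  // how to hand them to their framework, e.g. as CUDA_VISIBLE_DEVICES=0,2,3
  val visibleDevices = gpus.getAddresses().mkString(",")
  iter.map(i => s"record $i sees GPUs $visibleDevices")
}.collect()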

2) How the user specifies the gpu resources upon application submission

Here we need multiple configs:

   a) One config for the user to specify the GPUs per task. To make it extensible to other resources, I propose: *spark.task.resource.\{resource type}.count*.  This implementation would only support gpu, but the pattern gives us flexibility to add more. It allows for multiple resources as well as multiple configs per resource: the resource type here would be gpu, but you could add fpga, and you could also add more configs besides count, for example a type config if you want a certain kind of GPU.

   b) One for the user to specify how many GPUs per executor and driver.  This one is a bit more complicated since it has to work with the resource managers to actually acquire them, but I think it makes sense to have common configs like we do for cores and memory. So we can have *spark.executor.resource.\{resource type}.count* and *spark.driver.resource.\{resource type}.count*.   This implementation would only support gpu.  The tricky thing here is that some of the resource managers already have configs for requesting GPUs.  YARN has {{spark.yarn.executor.resource.\{resource-type}}}, although it was added in 3.0 and hasn't shipped yet, and we can't just remove it since you could ask YARN for other resource types Spark doesn't know about.  On Kubernetes you have to request GPUs via the pod template, so I think it would be on the user to make sure those match. Mesos has {{spark.mesos.gpus.max}}.  So we just need to make sure the new configs map into those, and having duplicate configs might make it a bit weird for the user. An example of setting the proposed configs is sketched below.
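
To make this concrete, a submission using the proposed config names might set something like the following (these keys are the proposal in this comment, not configs Spark supports today):

import org.apache.spark.SparkConf

// Request 4 GPUs per executor, 1 for the driver, and 1 GPU per task,
// following the proposed spark.{executor,driver,task}.resource.{resource type}.count pattern.
val conf = new SparkConf()
  .set("spark.executor.resource.gpu.count", "4")
  .set("spark.driver.resource.gpu.count", "1")
  .set("spark.task.resource.gpu.count", "1")

The same pattern would later allow, for example, spark.executor.resource.fpga.count without inventing a new config shape per resource.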

3) How the executor discovers or is told the GPU resources it has.

Here I think we have 2 options for the user/resource manager.  

  a) I propose we add a config *spark.\{executor, driver}.resource.gpu.discoverScript* to allow the user to specify a discovery script. This script gets run when the executor starts, if the user requested GPUs, to discover which GPUs the executor has.   A simple example would be a script that just runs "nvidia-smi --query-gpu=index --format=csv,noheader" to get the GPU indexes for NVIDIA cards.  You could make this script very simple or complicated depending on your setup.

  b) Also add an option to the executor launch, *--gpuDevices*, that allows the resource manager to specify the indexes of the GPU devices the executor has.   This allows insecure or non-containerized resource managers, like standalone mode, to allocate GPUs per executor without having containers and isolation implemented.  We could try to make this more generic, but that seems like it could get complicated and the resource managers would have to be updated to support it anyway, so I am proposing a GPU-specific option for now. A rough sketch of the discovery flow follows below.
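
For illustration only, a sketch of what executor-side discovery under option (a) could look like, assuming the proposed discoverScript config points at a script that prints one GPU index per line (the helper below is hypothetical, not existing Spark code):

import scala.sys.process._

// Hypothetical helper: run the user-supplied discovery script and parse
// one GPU index per line, e.g. the output of
//   nvidia-smi --query-gpu=index --format=csv,noheader
def discoverGpuAddresses(scriptPath: String): Array[String] = {
  Seq("bash", scriptPath).!!
    .split("\n")
    .map(_.trim)
    .filter(_.nonEmpty)
}

// Under option (b) the resource manager would instead pass something like
// --gpuDevices 0,2,3 at executor launch, and no script would be run.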



> User-facing APIs for GPU-aware scheduling
> -----------------------------------------
>
>                 Key: SPARK-27364
>                 URL: https://issues.apache.org/jira/browse/SPARK-27364
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Thomas Graves
>            Priority: Major
>
> Design and implement:
> * General guidelines for cluster managers to understand resource requests at application start. The concrete conf/param will be under the design of each cluster manager.
> * APIs to fetch assigned resources from task context.


