Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2019/05/16 14:25:00 UTC

[jira] [Commented] (SPARK-27376) Design: YARN supports Spark GPU-aware scheduling

    [ https://issues.apache.org/jira/browse/SPARK-27376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841402#comment-16841402 ] 

Thomas Graves commented on SPARK-27376:
---------------------------------------

The design is pretty straightforward; there is really only one open question, which is consistency between the YARN resource configs and the new Spark resource configs. See the last paragraph for more details.

Official GPU support requires Hadoop 3.1 or later. Hadoop can be configured to use Docker with isolation so that the containers YARN hands back have the requested GPUs and other resources. YARN does not tell you what it actually allocated for GPUs; you have to discover it. YARN has hardcoded resource types for FPGA and GPU; anything else is a user-defined type. Spark 3.0 already added support for requesting any resource from YARN via the configs spark.yarn.{executor/driver/am}.resource, so the change required for this Jira is simply to map the new Spark configs spark.{executor/driver}.resource.{fpga/gpu}.count into the corresponding YARN configs. For other resource types we can't do that mapping because we don't know what they are called on the YARN side, so for any other resource the user will have to specify both configs: spark.yarn.{executor/driver/am}.resource and spark.{executor/driver}.resource.{resourceName}.count. That isn't ideal, but the only other option would be some sort of mapping the user passes in. We can always map more YARN resource types if YARN adds them. The main two people are interested in seem to be GPU and FPGA anyway, so I think this is fine for now.
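
For illustration only, the gpu/fpga mapping described above might look something like the sketch below (the helper and config handling are my own placeholders, not the actual implementation; the only parts taken from YARN are its built-in resource names yarn.io/gpu and yarn.io/fpga):

    // Hypothetical sketch of the built-in gpu/fpga mapping; not the real code,
    // and the Spark config names may still change before release.
    object ResourceMappingSketch {
      // YARN's built-in resource type names for GPU and FPGA.
      private val sparkToYarnResource = Map(
        "gpu"  -> "yarn.io/gpu",
        "fpga" -> "yarn.io/fpga"
      )

      // Given e.g. ("gpu", 2) from spark.executor.resource.gpu.count, return the
      // YARN resource name and amount to put into the container request; None means
      // the user has to supply the spark.yarn.*.resource config themselves.
      def toYarnRequest(sparkName: String, count: Long): Option[(String, Long)] =
        sparkToYarnResource.get(sparkName).map(yarnName => (yarnName, count))
    }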

For Hadoop versions before 3.1, YARN won't allocate based on GPU. Users on Hadoop 2.7, 2.8, etc. could still land on GPU nodes with YARN node labels or other hacks, tell Spark the count, and have it auto-discover the devices: Spark will pick up whatever it sees in the container, or really whatever the discoveryScript returns, so people could write that script to match whatever hacks they already use for sharing GPU nodes.

The flow from the user's point of view would be:

For GPU and FPGA: the user specifies spark.{executor/driver}.resource.{gpu/fpga}.count and spark.{executor/driver}.resource.{gpu/fpga}.discoveryScript. The Spark YARN code maps these into the corresponding YARN resource config and asks YARN for the containers. YARN allocates the containers, and Spark runs the discovery script to figure out what it was actually given.
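
As a concrete example of that flow, a GPU request might be set up like this (a sketch only: the config names follow the proposal in this comment and may change, and the script path is made up):

    import org.apache.spark.SparkConf

    // Ask for 2 GPUs per executor and point Spark at a script that reports which
    // GPU addresses the container actually received (path is hypothetical).
    val conf = new SparkConf()
      .set("spark.executor.resource.gpu.count", "2")
      .set("spark.executor.resource.gpu.discoveryScript", "/opt/spark/scripts/getGpus.sh")
    // The YARN backend translates the gpu request into the corresponding yarn
    // resource config, so the user does not also set spark.yarn.*.resource for gpu/fpga.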

For other resource types the user will have to specify spark.yarn.{executor/driver/am}.resource plus spark.{executor/driver}.resource.{resourceName}.count and spark.{executor/driver}.resource.{resourceName}.discoveryScript.
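
For example, with a made-up user-defined resource type (all names and paths here are hypothetical; the value of the spark.yarn config is the count-plus-unit string discussed in the next paragraph):

    import org.apache.spark.SparkConf

    // "myAccel" is a hypothetical user-defined YARN resource type. Spark can't
    // guess its YARN-side name, so both the yarn request and the spark-side
    // count/discovery configs have to be spelled out.
    val conf = new SparkConf()
      .set("spark.yarn.executor.resource.myAccel", "4")   // what to request from YARN
      .set("spark.executor.resource.myAccel.count", "4")  // what Spark tracks and schedules on
      .set("spark.executor.resource.myAccel.discoveryScript", "/opt/spark/scripts/getMyAccel.sh")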

The only other thing that is inconsistent is that the spark.yarn.{executor/driver/am}.resource configs don't have a .count on the end. Right now that config takes a string as a value and splits it into an actual count and a unit. The YARN resource configs were only just added in 3.0 and haven't been released, so we could potentially change them. We could change the Spark user-facing configs (spark.{executor/driver}.resource.{gpu/fpga}.count) to be similar, making it easier for the user to specify both a count and a unit in one config instead of two, but I like the ability to separate them on the discovery side as well. We took the .unit support out in the executor pull request, so it isn't there right now anyway. We could do the opposite and change the YARN ones to have a .count and .unit as well just to make things consistent, but that makes the user specify two configs instead of one. The third option would be to have .count and .unit and eventually add a third config that lets the user specify them together, if we add resources that actually use units.

My thoughts are that for the user-facing configs we change .count to .amount and let the user specify units on it. This makes it easier for the user and allows us to extend later if we want. I think we should also change the spark.yarn configs to have a .amount postfix, because YARN has already added other things like tags and attributes, so if we want to extend the Spark support for those it makes more sense to have them as additional postfix options, e.g. spark.yarn...resource.tags=
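
Under that proposal the configs would look something like this (sketch only; the names are the proposed ones, nothing released, and the tags config is just an illustration of the extension point):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // User-facing request with the proposed .amount postfix (a unit could ride
      // along in the same value for resources that need one, e.g. "64G").
      .set("spark.executor.resource.gpu.amount", "2")
      // YARN-side configs would take the same postfix, leaving room for the other
      // concepts YARN already has (tags, attributes); "myAccel" is hypothetical.
      .set("spark.yarn.executor.resource.myAccel.amount", "4")
      // .set("spark.yarn.executor.resource.myAccel.tags", "...")   // possible future extension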

We can leave everything else that is internal as separate count and unit, and since gpu/fpga don't need units we don't need to add a unit to our ResourceInformation, since we already removed it.

 

> Design: YARN supports Spark GPU-aware scheduling
> ------------------------------------------------
>
>                 Key: SPARK-27376
>                 URL: https://issues.apache.org/jira/browse/SPARK-27376
>             Project: Spark
>          Issue Type: Sub-task
>          Components: YARN
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org