Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2019/05/21 20:14:00 UTC

[jira] [Commented] (SPARK-27495) Support Stage level resource configuration and scheduling

    [ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845219#comment-16845219 ] 

Thomas Graves commented on SPARK-27495:
---------------------------------------

I'm working through the design of this and there are definitely a lot of things to think about here. I would like to get other people's input before going much further.

I think there are a few main points we need to decide on:

1) What resources can a user specify per stage - the more I think about this, the more things I can think of that people will want to change. For instance, normally in Spark you don't specify task requirements; you specify executor requirements and then possibly the cores per task. So if someone is requesting resources per stage, I think we need a way to specify both task and executor requirements. You could derive executor requirements from task requirements (say I want 4 tasks per executor, then multiply the task requirements), but then you have things like overhead memory, and users aren't used to specifying things at the task level, so I think it's better to separate those out. Going beyond that, we know people want to limit the total number of running tasks per stage, and on the memory side there is off-heap, heap, and overhead memory. I can also envision people wanting to change retries or shuffle parameters per stage. It's definitely more than just a few resources.
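
Just to make that split concrete, a rough sketch (hypothetical names only, not an existing Spark API) of a per-stage request carrying both levels might look like:

    // Hypothetical sketch: keep task-level and executor-level requirements
    // separate instead of deriving one from the other.
    case class TaskRequirements(cpus: Int, memoryMb: Long, gpus: Int = 0)

    case class ExecutorRequirements(
        cores: Int,
        heapMemoryMb: Long,
        offHeapMemoryMb: Long = 0L,
        overheadMemoryMb: Long = 0L,
        gpus: Int = 0)

    case class StageResourceProfile(
        task: TaskRequirements,
        executor: ExecutorRequirements,
        maxConcurrentTasks: Option[Int] = None) // optional cap on running tasks for the stage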

Basically it comes down to a lot of things in SparkConf. Now whether that is the interface we show to users is another question. You could, for instance, let them pass an entire SparkConf in, set the configs they are used to setting, and deal with that. But then you have to error out on or ignore the configs we don't support changing dynamically, and you have to resolve conflicts on an unknown set of things if they have specified different confs for multiple operations that get combined into a single stage (e.g. map.filter.groupBy where they specified conflicting resources for the map and the groupBy). Or you could make an interface that only gives them specific options and keep adding to it as people request more things. The latter is cleaner in some ways, but it is also less flexible and requires a new API versus possibly reusing the configs users are already used to.
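
For comparison, the two interface styles might look roughly like the following (both purely hypothetical sketches, nothing that exists today):

    // Option A: let users pass a SparkConf with the settings they already know,
    // filtered to the keys that can actually change per stage; anything else
    // would have to be rejected or ignored.
    import org.apache.spark.SparkConf

    val perStageKeys = Set("spark.task.cpus", "spark.executor.memory") // illustrative subset
    def profileFromConf(conf: SparkConf): Map[String, String] =
      conf.getAll.toMap.filter { case (k, _) => perStageKeys.contains(k) }

    // Option B: a narrow builder that only exposes the knobs we explicitly
    // support, and grows as people request more.
    class StageProfileBuilder {
      private var taskCpus = 1
      private var executorMemoryMb = 1024L
      def withTaskCpus(n: Int): this.type = { taskCpus = n; this }
      def withExecutorMemoryMb(mb: Long): this.type = { executorMemoryMb = mb; this }
      def build(): Map[String, String] =
        Map("taskCpus" -> taskCpus.toString, "executorMemoryMb" -> executorMemoryMb.toString)
    }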

2) API. I think ideally each of the operators (RDD.*, Dataset.*, where * is map, filter, groupBy, sort, join, etc.) would have an optional parameter to specify the resources you want to use. I think this would make it clear to the user that these will be applied at least for that operation. It also helps with cases where you don't have an RDD yet, like the initial read of files. This, however, could mean a lot of API changes.
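
At a signature level that would roughly mean overloads like the following (a stand-in trait just to show the shape, not the real RDD/Dataset classes, and Profile is a made-up placeholder for whatever descriptor we end up with):

    // Simplified stand-in for the resource descriptor.
    case class Profile(taskCpus: Int, executorMemoryMb: Long)

    // Hypothetical overloads: each operator optionally takes the resources
    // that should apply to the stage(s) it ends up in.
    trait ResourceAwareOps[T] {
      def map[U](f: T => U): ResourceAwareOps[U]
      def map[U](f: T => U, resources: Profile): ResourceAwareOps[U]
      def filter(p: T => Boolean, resources: Profile): ResourceAwareOps[T]
    }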

Another way, which was originally proposed in SPARK-24615, is adding something like a .withResources API, but then you have to deal with whether you prefix it, postfix it, etc. If you postfix it, what about things like eager execution mode? Prefix seems to make more sense, but then you still don't have an option for the readers. I think this also makes the scoping less clear, although you have some of that problem with adding it to the individual operators as well.
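
The prefix style would look roughly like this toy stand-in (not the actual RDD API, and note that a reader still has nothing to call it on):

    // Toy stand-in showing the prefix shape: the profile is attached before
    // the transformations it is meant to cover.
    case class Profile(taskCpus: Int, executorMemoryMb: Long)

    class ScopedRDD[T](data: Seq[T], profile: Option[Profile] = None) {
      def withResources(p: Profile): ScopedRDD[T] = new ScopedRDD(data, Some(p))
      def map[U](f: T => U): ScopedRDD[U] = new ScopedRDD(data.map(f), profile)
    }

    // e.g. new ScopedRDD(Seq(1, 2, 3)).withResources(Profile(2, 8192)).map(_ * 2)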

3) Scoping. The scoping could be confusing to users. Ideally I want to cover the RDD/Dataset/DataFrame APIs (I realize the Jira was initially more in the scope of barrier scheduling, but if we are going to do this, it seems like we should make it generic). With the RDD API it is a bit more obvious where the stage boundaries might be, but with catalyst any sort of optimization could lead to stage boundaries the user doesn't expect. In either case you also have operations that span two stages, like groupBy, which does the partial aggregation, the shuffle, then the full aggregation; the withResources would have to apply to both stages. Then there is the question of whether you keep using that resource profile until they change it, or whether it applies only to those stages and then goes back to the default application-level configs. You could also go back to what I mentioned on the Jira where withResources would be like a function scope: withResources() { everything that should be done within that resource profile... }, but that syntax isn't like anything we have now that I'm aware of.
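
The function-scope variant at the end could be approximated with a by-name block, something like the sketch below (shape only; a real version would have to push/pop the profile in some per-job or job-group state):

    case class Profile(taskCpus: Int, executorMemoryMb: Long)

    object StageResources {
      // Everything built inside the block would run under the given profile,
      // then settings revert to the application-level defaults. This stub
      // only shows the shape of the API; it doesn't set anything.
      def withResources[T](profile: Profile)(body: => T): T = body
    }

    // StageResources.withResources(Profile(taskCpus = 2, executorMemoryMb = 8192)) {
    //   df.groupBy("key").count()   // df assumed to be some DataFrame in scope
    // }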

4) How to deal with multiple, possibly conflicting resource requirements. You can go with the max in some cases, but in others that might not make sense. For memory, for instance, you might actually want the sum: you know this operation needs x memory, and then catalyst combines it with another operation that needs y memory. You would want to sum those, or you would have to have the user realize those operations will get combined and do the math themselves. The latter isn't ideal either.
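
One possible combination rule when two operations land in the same stage, just to illustrate the trade-off rather than propose it: take the max for time-shared resources like cores but sum the memory, since both operations hold their memory within the same tasks.

    case class Request(taskCpus: Int, taskMemoryMb: Long)

    // Cores are time-shared within a task slot, so max() is arguably enough;
    // memory demands stack, so they may need to be summed.
    def merge(a: Request, b: Request): Request =
      Request(
        taskCpus = math.max(a.taskCpus, b.taskCpus),
        taskMemoryMb = a.taskMemoryMb + b.taskMemoryMb)

    // merge(Request(2, 2048), Request(4, 1024)) == Request(4, 3072)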

Really, the DataFrame/Dataset API shouldn't need any API to specify resources; ideally catalyst figures it out. For instance, if a GPU is available and catalyst knows how to use it, it would just use it. Ideally it would also be smarter about how much memory it needs, etc., but that functionality isn't there yet, and if this is going to support CPU/memory/etc. I think many times people will want to use it through the Dataset/DataFrame API.

Let me know if anyone has input on the above issues.

> Support Stage level resource configuration and scheduling
> ---------------------------------------------------------
>
>                 Key: SPARK-27495
>                 URL: https://issues.apache.org/jira/browse/SPARK-27495
>             Project: Spark
>          Issue Type: Story
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> Currently Spark supports CPU level scheduling, and we are adding accelerator-aware scheduling with https://issues.apache.org/jira/browse/SPARK-24615, but both of those schedule via application-level configurations. Meaning there is one configuration set for the entire lifetime of the application, and the user can't change it between Spark jobs/stages within that application.
> Many times users have different requirements for different stages of their application, so they want to be able to configure at the stage level what resources are required for that stage.
> For example, I might start a Spark application that first does some ETL work needing lots of cores to run many tasks in parallel, and then once that is done I want to run an ML job, at which point I want GPUs, fewer CPUs, and more memory.
> With this Jira we want to add the ability for users to specify the resources for different stages.
> Note that https://issues.apache.org/jira/browse/SPARK-24615 had some discussions on this but this part of it was removed from that.
> We should come up with a proposal on how to do this.


