Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/24 17:20:29 UTC

[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

URL: https://github.com/apache/spark/pull/27583#issuecomment-590447561
 
 
   > General question about priority, I did not find much here [1].
   > How is the value of priority interpreted ?
   > Is it simply to "tag" requests ?
   > Or are higher priority requests 'prioritized' over lower priority requests from an application (to a queue) ?
   > 
   > How does it compare with [2] ? Will that be cleaner (using tags) ?
   > 
   > [1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/Priority.html
   > 
   > [2] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/SchedulingRequest.html
   
   I don't think Priority is documented very well at all. We ran into this issue with TEZ, where you can't have different container sizes within the same Priority. A priority is what it sounds like: higher-priority requests get allocated first. For Spark I don't think this matters, since we finish a stage before proceeding to the next. If we had a slow-start feature like MapReduce's, then it would matter. It does mean that if you have 2 stages with different resourceProfiles running at the same time, one stage's containers would be prioritized over the other's, but again I don't think that is an issue. If you can think of a case where it would be, let me know. There is actually a way to avoid using different priorities, but you have to turn on an optional YARN feature to use something like tags. Since that feature is optional, I didn't want to rely on it, and I didn't see any issues with using Priority.
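
   To make the constraint concrete, here is a minimal Python sketch of a toy allocator. It is not the YARN or Spark API; `ToyAllocator` and `priority_for_profile` are hypothetical names. It encodes only the behavior described above: all requests sharing a priority must use the same container size, and higher-priority requests are served first. Treating a smaller number as more urgent is an assumption of this sketch. Giving each resource profile its own priority sidesteps the size conflict, and two concurrent profiles then simply get served in priority order.

```python
from collections import defaultdict


class ToyAllocator:
    """Hypothetical toy model of priority-based container requests (not the YARN API)."""

    def __init__(self):
        # priority -> (memory_mb, vcores); one container size per priority
        self._size_for_priority = {}
        # priority -> number of containers still outstanding
        self._pending = defaultdict(int)

    def request(self, priority, memory_mb, vcores, count):
        size = (memory_mb, vcores)
        existing = self._size_for_priority.setdefault(priority, size)
        if existing != size:
            # Models the restriction: different sizes cannot share a priority.
            raise ValueError(
                f"priority {priority} already used for size {existing}; "
                f"cannot request {size}"
            )
        self._pending[priority] += count

    def allocate(self, available):
        """Grant up to `available` containers, most urgent priority first.

        Assumption of this sketch: a smaller number means more urgent.
        """
        granted = []
        for priority in sorted(self._pending):
            while self._pending[priority] > 0 and available > 0:
                granted.append((priority, self._size_for_priority[priority]))
                self._pending[priority] -= 1
                available -= 1
        return granted


def priority_for_profile(profile_id):
    # Hypothetical mapping: one distinct priority per resource profile,
    # so differently sized containers never share a priority.
    return profile_id


if __name__ == "__main__":
    alloc = ToyAllocator()
    # Two stages with different resource profiles running at the same time.
    alloc.request(priority_for_profile(0), 4096, 2, count=2)
    alloc.request(priority_for_profile(1), 16384, 4, count=2)
    print(alloc.allocate(3))  # profile 0's containers come out first

    try:
        # Reusing priority 0 with a different size is rejected.
        alloc.request(0, 8192, 2, count=1)
    except ValueError as err:
        print(err)
```

   With only 3 containers available, both profile-0 containers are granted before any profile-1 container, which is the "one stage's containers would be prioritized over the other's" effect, not a correctness problem.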
   
   I haven't looked at SchedulingRequest in detail, but it's more about placement and gang scheduling: https://issues.apache.org/jira/browse/YARN-6592. That is definitely something interesting, but I would prefer to do it separately from this, unless you see an issue with Priority? I can look at it more to see whether it would avoid having to use Priority, but SchedulingRequest itself also has a priority, though it has separate resource sizing. I would almost bet it has the same restriction, but maybe it uses the tags to get around this.
