Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/14 16:24:19 UTC

[GitHub] [spark] tgravescs opened a new pull request #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

tgravescs opened a new pull request #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583
 
 
   
   ### What changes were proposed in this pull request?
   
   YARN-side changes for stage level scheduling. The previous PR, covering the dynamic allocation changes, was https://github.com/apache/spark/pull/27053/files.
   
   Modified the data structures to store state on a per-ResourceProfile basis.
   I tried to keep the code changes to a minimum: the main request loop now just goes through each ResourceProfile, and the logic inside for each one stays very close to what it was.
   On submission we now have to give each ResourceProfile a separate YARN Priority, because YARN doesn't support asking for containers with different resources at the same Priority. We just use the profile id as the priority level.
   Using a different Priority per profile also makes it easier, when the containers come back, to match them to the ResourceProfile they were requested for.
   The expectation is that YARN will only give you a container with the resource amounts you requested or more. It should never give you a container that doesn't satisfy your resource request.
   
   If you want to see the full feature changes, you can look at https://github.com/apache/spark/pull/27053/files for reference.
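   
   The priority scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Res`, `Allocated`, `priorityFor` and `profileFor` are hypothetical stand-ins for YARN's resource and container types and for the allocator's bookkeeping.
   
   ```scala
   object PriorityPerProfileSketch {
     // Hypothetical stand-ins for YARN's Resource and Container types.
     final case class Res(memoryMb: Int, cores: Int)
     final case class Allocated(priority: Int, res: Res)
   
     // The ResourceProfile id doubles as the priority of that profile's requests.
     def priorityFor(rpId: Int): Int = rpId
   
     // Match an allocated container back to the profile it was requested for,
     // accepting it only if it is at least as large as what was asked for.
     def profileFor(c: Allocated, requested: Map[Int, Res]): Option[Int] =
       requested.get(c.priority).collect {
         case want if c.res.memoryMb >= want.memoryMb && c.res.cores >= want.cores =>
           c.priority
       }
   }
   ```
   
   Because the priority and the profile id are the same number, no separate lookup table from container to profile is needed in this sketch.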
   
   ### Why are the changes needed?
   For stage level scheduling YARN support.
   
   ### Does this PR introduce any user-facing change?
   
   no
   
   ### How was this patch tested?
   
   Tested manually on YARN cluster and then unit tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586365349
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23196/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592297371
 
 
   **[Test build #119065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119065/testReport)** for PR 27583 at commit [`bd3509c`](https://github.com/apache/spark/commit/bd3509c044d3eca6895815a0e1347b97c1019f29).



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592225272
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] mridulm commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592418149
 
 
   Looks good to me, just had a minor query.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586519803
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592541363
 
 
   test this please



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590993121
 
 
   Updated the locking to use synchronized everywhere and removed the concurrent structures, since most of them were only being used by the metrics system; things have changed since they were originally added. I also moved some things around to put them in sections that are easier to read. If that is too confusing I can move things back.
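   
   As a rough illustration of that locking style (a sketch only; `ExecutorTracker`, `add` and `numRunning` are hypothetical names, not the ones in YarnAllocator), plain mutable collections can be guarded by `synchronized` on the owning instance, with the metrics-style read path taking the same lock:
   
   ```scala
   import scala.collection.mutable
   
   class ExecutorTracker {
     // Plain (non-concurrent) map; every access goes through `synchronized`.
     private val runningPerProfile = mutable.HashMap[Int, mutable.Set[String]]()
   
     def add(rpId: Int, executorId: String): Unit = synchronized {
       runningPerProfile.getOrElseUpdate(rpId, mutable.HashSet[String]()) += executorId
     }
   
     // Read path (for example, a metrics gauge) takes the same lock as writers.
     def numRunning: Int = synchronized {
       runningPerProfile.values.map(_.size).sum
     }
   }
   ```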



[GitHub] [spark] tgravescs edited a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs edited a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590916778
 
 
   Sorry, I forgot: the default profile is actually already the highest priority. In YARN lower numbers are higher priority, and the default profile has id 0. So my example above is wrong: the job server would favor the default profile over the custom ones. That seems fine as default behavior, and we can document it for now.
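   
   A tiny sketch of that ordering, assuming only what is stated above (lower number means higher priority, and the default profile has id 0); `ProfilePriorityOrder` and `favoredFirst` are hypothetical names:
   
   ```scala
   object ProfilePriorityOrder {
     val DefaultProfileId = 0
   
     // Lower priority number is favored first, so ascending order of profile id
     // is the order in which pending profiles would be favored.
     def favoredFirst(pendingProfileIds: Seq[Int]): Seq[Int] =
       pendingProfileIds.sorted
   }
   ```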



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592271464
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119056/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592328281
 
 
   **[Test build #119065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119065/testReport)** for PR 27583 at commit [`bd3509c`](https://github.com/apache/spark/commit/bd3509c044d3eca6895815a0e1347b97c1019f29).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586458243
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23204/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-591058619
 
 
   **[Test build #118928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118928/testReport)** for PR 27583 at commit [`f9c1a05`](https://github.com/apache/spark/commit/f9c1a05cac745197f5f791d9def2f28a9e510e00).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385168310
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -166,69 +195,188 @@ private[yarn] class YarnAllocator(
 
   private val labelExpression = sparkConf.get(EXECUTOR_NODE_LABEL_EXPRESSION)
 
-  // A map to store preferred hostname and possible task numbers running on it.
-  private var hostToLocalTaskCounts: Map[String, Int] = Map.empty
-
-  // Number of tasks that have locality preferences in active stages
-  private[yarn] var numLocalityAwareTasks: Int = 0
-
   // A container placement strategy based on pending tasks' locality preference
   private[yarn] val containerPlacementStrategy =
-    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resource, resolver)
+    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resolver)
+
+  // The default profile is always present so we need to initialize the datastructures keyed by
+  // ResourceProfile id to ensure it's present if things start running before a request for
+  // executors could add it. This approach is easier than going and special casing everywhere.
+  private def initDefaultProfile(): Unit = synchronized {
+    allocatedHostToContainersMapPerRPId(DEFAULT_RESOURCE_PROFILE_ID) =
+      new HashMap[String, mutable.Set[ContainerId]]()
+    runningExecutorsPerResourceProfileId.put(DEFAULT_RESOURCE_PROFILE_ID, mutable.HashSet[String]())
+    numExecutorsStartingPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = new AtomicInteger(0)
+    targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) =
+      SchedulerBackendUtils.getInitialTargetExecutorNumber(sparkConf)
+    rpIdToYarnResource(DEFAULT_RESOURCE_PROFILE_ID) = defaultResource
+    rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) =
+      ResourceProfile.getOrCreateDefaultProfile(sparkConf)
+  }
+
+  initDefaultProfile()
 
-  def getNumExecutorsRunning: Int = runningExecutors.size()
+  def getNumExecutorsRunning: Int = synchronized {
+    runningExecutorsPerResourceProfileId.values.map(_.size).sum
+  }
+
+  def getNumLocalityAwareTasks: Int = synchronized {
+    numLocalityAwareTasksPerResourceProfileId.values.sum
+  }
 
-  def getNumReleasedContainers: Int = releasedContainers.size()
+  def getNumExecutorsStarting: Int = {
 
 Review comment:
   I went through all the variables; they are all protected via a call higher up. We can add more synchronized blocks if we want to nest them (synchronized is re-entrant), just to make it more readable?
   For instance, this one is only called from allocateResources, which is synchronized, and that is the case with most of these.
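   
   To illustrate the re-entrancy point (a sketch with hypothetical names `ReentrantSketch`, `allocate` and `numStarting`, not the real allocator): a thread that already holds the instance monitor can enter another `synchronized` block on the same instance without blocking, so adding the inner `synchronized` changes readability rather than behavior.
   
   ```scala
   class ReentrantSketch {
     private var starting = 0
   
     // Safe to call on its own, and also from inside `allocate`.
     def numStarting: Int = synchronized { starting }
   
     def allocate(n: Int): Int = synchronized {
       starting += n
       numStarting // re-enters the monitor already held by this thread
     }
   }
   ```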



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586519806
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118447/
   Test PASSed.



[GitHub] [spark] mridulm commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r380130421
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ##########
 @@ -227,6 +227,17 @@ private object ResourceRequestHelper extends Logging {
     resourceInformation
   }
 
+  def isYarnCustomResourcesNonEmpty(resource: Resource): Boolean = {
+    try {
+      // Use reflection as this uses APIs only available in Hadoop 3
 
 Review comment:
   What is the minimum Hadoop version we are targeting for full functionality?
   Does Hadoop 3 have all the wiring required to support GPUs, accelerator cards, FPGAs, etc.? Or does it support only a subset of resources?
   
   (This is not directly related to this PR; it is for my own understanding, given you should know this well :) ).
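   
   For readers unfamiliar with the reflection approach mentioned in the diff comment, a generic sketch of probing for an API at runtime might look like the following. `ReflectionProbe` and `hasNoArgMethod` are hypothetical names, and the example deliberately uses JDK classes rather than any Hadoop class so that it runs without Hadoop on the classpath.
   
   ```scala
   import scala.util.Try
   
   object ReflectionProbe {
     // True when `className` can be loaded and exposes a public no-arg method
     // called `methodName`. A missing class or method yields false instead of
     // an exception, which is what makes it usable as a version probe.
     def hasNoArgMethod(className: String, methodName: String): Boolean =
       Try(Class.forName(className).getMethod(methodName)).isSuccess
   }
   ```
   
   The same pattern lets code compile against an older API while still using a newer method when it happens to be available.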



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590995947
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] mridulm commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385578038
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -336,7 +338,7 @@ private[yarn] class YarnAllocator(
         val resource = Resource.newInstance(totalMem, cores)
         ResourceRequestHelper.setResourceRequests(customResources.toMap, resource)
         logDebug(s"Created resource capability: $resource")
-        rpIdToYarnResource(rp.id) = resource
+        rpIdToYarnResource.putIfAbsent(rp.id, resource)
 
 Review comment:
   Can there be a race such that rp.id is already present in the map?
   And if it is, should we be overwriting it here?
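   
   The difference between the two lines in the diff can be sketched as below. This assumes a `java.util.concurrent.ConcurrentHashMap` purely as a stand-in (the actual map type is not shown in this hunk), and `PutIfAbsentSketch` and `demo` are hypothetical names.
   
   ```scala
   import java.util.concurrent.ConcurrentHashMap
   
   object PutIfAbsentSketch {
     def demo(): (String, String) = {
       val overwrite = new ConcurrentHashMap[Int, String]()
       overwrite.put(1, "first")
       overwrite.put(1, "second") // plain put: the last writer wins
   
       val keepFirst = new ConcurrentHashMap[Int, String]()
       keepFirst.putIfAbsent(1, "first")
       keepFirst.putIfAbsent(1, "second") // no-op: the key is already present
   
       (overwrite.get(1), keepFirst.get(1))
     }
   }
   ```
   
   Which behavior is wanted depends on whether a later computation of the resource for the same profile id can legitimately differ from the first one.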



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592541960
 
 
   **[Test build #119093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119093/testReport)** for PR 27583 at commit [`bd3509c`](https://github.com/apache/spark/commit/bd3509c044d3eca6895815a0e1347b97c1019f29).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586427785
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592328646
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119065/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586427785
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592224835
 
 
   **[Test build #119056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119056/testReport)** for PR 27583 at commit [`14b6251`](https://github.com/apache/spark/commit/14b625150fa3d90e5bdafafc4775e8bf6abe4a7f).



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-591059437
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118928/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586365349
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23196/
   Test PASSed.



[GitHub] [spark] mridulm edited a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586784460
 
 
   General question about priority; I did not find much here [1].
   How is the value of priority interpreted?
   Is it simply used to "tag" requests?
   Or are higher-priority requests 'prioritized' over lower-priority requests from an application (to a queue)?
   
   How does it compare with [2]? Would that be cleaner (using tags)?
   
   
   [1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/Priority.html
   
   [2] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/SchedulingRequest.html



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586365329
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592031910
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23788/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592111429
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r383923052
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ##########
 @@ -227,6 +227,17 @@ private object ResourceRequestHelper extends Logging {
     resourceInformation
   }
 
+  def isYarnCustomResourcesNonEmpty(resource: Resource): Boolean = {
+    try {
+      // Use reflection as this uses APIs only available in Hadoop 3
 
 Review comment:
   You are correct on the behavior. Many companies asked for this to work with their existing Hadoop installs (2.x where it's < 2.10 or 3.1.1) and to use the methods they already use with Hadoop 2. I'm not trying to create a solution for everyone, just to allow their existing solutions to work.
     In most cases I've heard of, they have something like a GPU queue or node labels, so they know they run on nodes with GPUs. After that, different companies have different ways of handling multi-tenancy. I've heard of some using file locking, for instance. Or you could put the GPUs in process-exclusive mode and then just iterate over them to acquire a free one. The idea here is that they can use whatever solution they already have. They can write a custom discovery script, and I also added the ability to plug in a class if it's easier to write Java code to do this: https://issues.apache.org/jira/browse/SPARK-30689?filter=-2
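
The file-locking approach mentioned above could look roughly like the sketch below. It is not part of this PR or of Spark; `GpuFileLock`, `GpuLease`, `acquireFreeGpu`, the lock directory, and the `gpu-N.lock` file names are all hypothetical, and it assumes every process on the node agrees on the same lock directory and GPU count.

```scala
import java.nio.channels.{FileChannel, FileLock, OverlappingFileLockException}
import java.nio.file.{Files, Path, StandardOpenOption}

object GpuFileLock {
  /** A held lock on one GPU index; release() frees it for other processes. */
  final case class GpuLease(index: Int, channel: FileChannel, lock: FileLock) {
    def release(): Unit = {
      lock.release()
      channel.close()
    }
  }

  /** Try to lock the (hypothetical) lock file for a single GPU index. */
  private def tryAcquire(lockDir: Path, idx: Int): Option[GpuLease] = {
    val channel = FileChannel.open(
      lockDir.resolve(s"gpu-$idx.lock"),
      StandardOpenOption.CREATE, StandardOpenOption.WRITE)
    val lock =
      try Option(channel.tryLock()) // null when another process holds the lock
      catch { case _: OverlappingFileLockException => None } // already held in this JVM
    if (lock.isEmpty) channel.close()
    lock.map(l => GpuLease(idx, channel, l))
  }

  /** Walk the GPU indices in order and return the first one that can be locked. */
  def acquireFreeGpu(lockDir: Path, gpuCount: Int): Option[GpuLease] = {
    Files.createDirectories(lockDir)
    (0 until gpuCount).iterator.map(tryAcquire(lockDir, _)).collectFirst { case Some(l) => l }
  }
}
```

The sketch is meant to be called once per executor process. On some platforms, closing any channel to a file releases all locks the JVM holds on that file, so repeated calls inside one JVM are not a reliable way to hold several GPUs.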
   



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r383411304
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -142,22 +150,31 @@ private[yarn] class YarnAllocator(
   } else {
     0
   }
-  // Number of cores per executor.
-  protected val executorCores = sparkConf.get(EXECUTOR_CORES)
+  // Number of cores per executor for the default profile
+  protected val defaultExecutorCores = sparkConf.get(EXECUTOR_CORES)
 
 
 Review comment:
   Sure, let me take a look and see if I can simplify it.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590995399
 
 
   **[Test build #118928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118928/testReport)** for PR 27583 at commit [`f9c1a05`](https://github.com/apache/spark/commit/f9c1a05cac745197f5f791d9def2f28a9e510e00).



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592270743
 
 
   **[Test build #119056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119056/testReport)** for PR 27583 at commit [`14b6251`](https://github.com/apache/spark/commit/14b625150fa3d90e5bdafafc4775e8bf6abe4a7f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592328637
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590995961
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23676/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592297673
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23810/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592609070
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119093/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-591059427
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592225278
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23801/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-591059437
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118928/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592542685
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23838/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592609059
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592224835
 
 
   **[Test build #119056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119056/testReport)** for PR 27583 at commit [`14b6251`](https://github.com/apache/spark/commit/14b625150fa3d90e5bdafafc4775e8bf6abe4a7f).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592542678
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586363779
 
 
   **[Test build #118439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118439/testReport)** for PR 27583 at commit [`6f2ace0`](https://github.com/apache/spark/commit/6f2ace032227a98df0ec1573e9a11c5aab0de449).



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586427796
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118439/
   Test PASSed.



[GitHub] [spark] asfgit closed pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583
 
 
   



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590498676
 
 
   Note that YARN-6592 only went into Hadoop 3.1.0, so it wouldn't work for older versions, which might go back to your version question.
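
A version constraint like this is commonly handled by probing for the newer API with reflection, so the same build still runs on older Hadoop releases. The sketch below is illustrative only: `ApiProbe`, `classAvailable`, and `schedulingRequestsSupported` are hypothetical names, and the class name is taken from the `SchedulingRequest` API docs linked earlier in this thread.

```scala
object ApiProbe {
  /** True when the named class can be loaded from the current classpath. */
  def classAvailable(className: String): Boolean =
    try {
      // initialize = false: only check presence, do not run static initializers
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  /** Assumption: the presence of this class signals a Hadoop build with scheduling requests. */
  def schedulingRequestsSupported: Boolean =
    classAvailable("org.apache.hadoop.yarn.api.records.SchedulingRequest")
}
```

Callers would branch on `ApiProbe.schedulingRequestsSupported` and fall back to the priority-based path when it returns false.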



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586454870
 
 
   @mridulm @felixcheung @kiszk 



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592542685
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23838/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592542678
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590916034
 
 
   Sure, we can make the default profile the highest priority. I also put a note in the documentation JIRA to make sure this behavior gets documented.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586519806
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118447/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592111440
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119042/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592271459
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592031887
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r383405902
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ##########
 @@ -227,6 +227,17 @@ private object ResourceRequestHelper extends Logging {
     resourceInformation
   }
 
+  def isYarnCustomResourcesNonEmpty(resource: Resource): Boolean = {
+    try {
+      // Use reflection as this uses APIs only available in Hadoop 3
 
 Review comment:
   
   Hadoop 3.1.1 has full GPU support, and some of it was backported to Hadoop 2.10 as well. I've tested the normal GPU scheduling feature with both of these, as well as with the older Hadoop 2.7 release. With older versions you can still ask Spark for GPUs: if YARN doesn't support them, Spark doesn't ask YARN for them but still tries to schedule them internally. If you are running on nodes with GPUs, Spark will still use your discovery script to find them and assign them out. If the discovery script doesn't find a GPU and you asked for one, then it fails.
   
   This was actually a more recent change that I put in for GPU scheduling, as more and more people were asking for support on older versions of Hadoop because they don't plan on upgrading to Hadoop 3 for a while.
   
   I do need to test all that again with stage level scheduling.
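   
   The reflection approach in the hunk above can be illustrated without a Hadoop dependency. This is only a minimal sketch: `OldStyleResource`, `NewStyleResource` and `hasNonEmptyArray` are hypothetical stand-ins, not the code in this PR. It probes for a no-argument method by name and treats a missing method as "nothing to report", which is the behavior you want when the API only exists in newer library versions.

```scala
import scala.util.Try

object CustomResourceProbe {
  // Hypothetical stand-ins for an older and a newer version of a library class.
  // Only the newer one exposes getResources.
  class OldStyleResource
  class NewStyleResource {
    def getResources: Array[String] = Array("memory", "vcores", "gpu")
  }

  // Looks up a public no-arg method by name via reflection and reports whether
  // it returns a non-empty array. Any failure (method missing, result is not an
  // array, invocation error) is treated as "no resources".
  def hasNonEmptyArray(target: AnyRef, methodName: String): Boolean =
    Try {
      val method = target.getClass.getMethod(methodName)
      val result = method.invoke(target)
      result != null && java.lang.reflect.Array.getLength(result) > 0
    }.getOrElse(false)
}
```

   Catching every non-fatal failure is a simplification for the sketch; real code may prefer to catch only `NoSuchMethodException` so that other errors stay visible.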



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385196205
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -166,69 +195,188 @@ private[yarn] class YarnAllocator(
 
   private val labelExpression = sparkConf.get(EXECUTOR_NODE_LABEL_EXPRESSION)
 
-  // A map to store preferred hostname and possible task numbers running on it.
-  private var hostToLocalTaskCounts: Map[String, Int] = Map.empty
-
-  // Number of tasks that have locality preferences in active stages
-  private[yarn] var numLocalityAwareTasks: Int = 0
-
   // A container placement strategy based on pending tasks' locality preference
   private[yarn] val containerPlacementStrategy =
-    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resource, resolver)
+    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resolver)
+
+  // The default profile is always present so we need to initialize the datastructures keyed by
+  // ResourceProfile id to ensure its present if things start running before a request for
+  // executors could add it. This approach is easier then going and special casing everywhere.
+  private def initDefaultProfile(): Unit = synchronized {
+    allocatedHostToContainersMapPerRPId(DEFAULT_RESOURCE_PROFILE_ID) =
+      new HashMap[String, mutable.Set[ContainerId]]()
+    runningExecutorsPerResourceProfileId.put(DEFAULT_RESOURCE_PROFILE_ID, mutable.HashSet[String]())
+    numExecutorsStartingPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = new AtomicInteger(0)
+    targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) =
+      SchedulerBackendUtils.getInitialTargetExecutorNumber(sparkConf)
+    rpIdToYarnResource(DEFAULT_RESOURCE_PROFILE_ID) = defaultResource
+    rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) =
+      ResourceProfile.getOrCreateDefaultProfile(sparkConf)
+  }
+
+  initDefaultProfile()
 
-  def getNumExecutorsRunning: Int = runningExecutors.size()
+  def getNumExecutorsRunning: Int = synchronized {
+    runningExecutorsPerResourceProfileId.values.map(_.size).sum
+  }
+
+  def getNumLocalityAwareTasks: Int = synchronized {
+    numLocalityAwareTasksPerResourceProfileId.values.sum
+  }
 
-  def getNumReleasedContainers: Int = releasedContainers.size()
+  def getNumExecutorsStarting: Int = {
 
 Review comment:
   I went ahead and added more synchronized calls, one in each function where those variables are touched. I believe re-entering a synchronized block is cheap, so it shouldn't add much overhead, and it helps with readability and guards against future breakage. If this is not what you intended, let me know.
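   
   To illustrate the re-entrancy point, here is a small sketch (the class `ProfileCounters` and its methods are hypothetical, not the YarnAllocator code). Every method that touches the per-profile map is synchronized on `this`, and a synchronized method can call another synchronized method on the same object without blocking, because JVM monitors are re-entrant for the thread that already holds them.

```scala
import scala.collection.mutable

// Hypothetical per-profile bookkeeping, keyed by resource profile id.
class ProfileCounters {
  private val running = mutable.HashMap[Int, mutable.Set[String]]()

  def add(rpId: Int, executorId: String): Unit = synchronized {
    running.getOrElseUpdate(rpId, mutable.HashSet[String]()) += executorId
  }

  def total: Int = synchronized {
    running.values.map(_.size).sum
  }

  // Holds the monitor on `this` and then re-enters it through add and total.
  def addAndCount(rpId: Int, executorId: String): Int = synchronized {
    add(rpId, executorId)
    total
  }
}
```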



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592111440
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119042/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586456941
 
 
   **[Test build #118447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118447/testReport)** for PR 27583 at commit [`6f2ace0`](https://github.com/apache/spark/commit/6f2ace032227a98df0ec1573e9a11c5aab0de449).



[GitHub] [spark] mridulm commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590689441
 
 
   I was not advocating for SchedulingRequest; I just wanted to understand whether the requirement matched what SchedulingRequest supports (though it was probably designed for something else, conceptually it seemed to apply based on my cursory read).
   
   Given the lack of availability in earlier Hadoop versions, we can punt on using SchedulingRequest. It is something we can look at in the future when the minimum Hadoop version changes.
   
   About priority: given that it has scheduling semantics associated with it, I was not sure whether overloading it would be a problem. I had not thought about the jobserver use case, but that is an excellent point!
   Given this, do we want to change the priority of the default profile to a very high value? Otherwise, all other resource profiles will have a higher priority than the default.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592541960
 
 
   **[Test build #119093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119093/testReport)** for PR 27583 at commit [`bd3509c`](https://github.com/apache/spark/commit/bd3509c044d3eca6895815a0e1347b97c1019f29).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592271459
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590447561
 
 
   > General question about priority, I did not find much here [1].
   > How is the value of priority interpreted ?
   > Is it simply to "tag" requests ?
   > Or are higher priority requests 'prioritized' over lower priority requests from an application (to a queue) ?
   > 
   > How does it compare with [2] ? Will that be cleaner (using tags) ?
   > 
   > [1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/Priority.html
   > 
   > [2] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/SchedulingRequest.html
   
   I don't think Priority is documented very well at all. We ran into this issue with Tez, where you can't have different container sizes within the same Priority. A priority is what it sounds like: higher priorities get allocated first. For Spark I don't think this matters, since we finish a stage before proceeding to the next. If we had a slow-start feature like MapReduce, then it would matter. It does mean that if you have 2 stages with different ResourceProfiles running at the same time, the containers of one of those stages would be prioritized over the other's, but again I don't think that is an issue. If you can think of a case where it would be, let me know. There is actually a way to get around using different priorities, but you have to turn on an optional YARN feature, something like tags. Since that feature is optional, I didn't want to rely on it, and I didn't see any issues with using Priority.
   
   I haven't looked at SchedulingRequest in detail, but it's more about placement and gang scheduling - https://issues.apache.org/jira/browse/YARN-6592. That is definitely something interesting, but I would prefer to do it separately from this, unless you see an issue with Priority? I can look at it more to see if it would get around having to use Priority, but the SchedulingRequest itself also has a priority, though it has a separate resource sizing. I would almost bet it has the same restriction, but maybe it's using the tags to get around this.
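   
   The "one container size per Priority" restriction can be sketched as a simple check. Everything here is illustrative: `ContainerSize`, `Request` and `PriorityCheck` are hypothetical names, and deriving the priority directly from the resource profile id is an assumption made for the example, not a statement of what YARN or this PR requires.

```scala
// Hypothetical, simplified model of a container request.
final case class ContainerSize(memoryMb: Int, vcores: Int)
final case class Request(priority: Int, size: ContainerSize)

object PriorityCheck {
  // Priorities that carry more than one distinct container size, i.e. the
  // situation described above as not allowed within a single Priority.
  def conflictingPriorities(requests: Seq[Request]): Set[Int] =
    requests.groupBy(_.priority).collect {
      case (priority, reqs) if reqs.map(_.size).distinct.size > 1 => priority
    }.toSet

  // Assumption for illustration: give each resource profile its own priority
  // by reusing the profile id, so sizes never mix within a priority.
  def requestsFor(profiles: Map[Int, ContainerSize]): Seq[Request] =
    profiles.toSeq.map { case (rpId, size) => Request(rpId, size) }
}
```

   With one priority per profile, the conflict set is empty by construction; mixing two sizes under the same priority is what the check flags.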



[GitHub] [spark] mridulm edited a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586784460
 
 
   General question about priority, I did not find much here [1].
   How is the value of priority interpreted ?
   Is it simply to "tag" requests ?
   Or are higher priority requests 'prioritized' over lower priority requests from an application (to a queue) ?
   
   How does it compare with [2] ? Will that be cleaner ?
   
   
   [1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/Priority.html
   
   [2] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/SchedulingRequest.html



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586365329
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586458243
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23204/
   Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592735728
 
 
   Thanks @mridulm, I appreciate the reviews. Merged this to master.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586519803
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592271464
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119056/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592328637
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590915238
 
 
   @dongjoon-hyun Sorry to bug you again, similar question here: how do I rerun the checks? I clicked on Details but I don't see a "rerun" button. I'm logged in with my GitHub Apache account. Do I need permissions, or am I logged in wrong?



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592110825
 
 
   **[Test build #119042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119042/testReport)** for PR 27583 at commit [`9e79f1a`](https://github.com/apache/spark/commit/9e79f1a7c63ff3df4bc592a67791b27c04b720f5).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] mridulm commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r380132387
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -142,22 +150,31 @@ private[yarn] class YarnAllocator(
   } else {
     0
   }
-  // Number of cores per executor.
-  protected val executorCores = sparkConf.get(EXECUTOR_CORES)
+  // Number of cores per executor for the default profile
+  protected val defaultExecutorCores = sparkConf.get(EXECUTOR_CORES)
 
 
 Review comment:
   I am wondering if we want to make the locking semantics more formal in this class.
   Earlier, it used volatiles and concurrent hashmaps (or sets backed by a concurrent hashmap) to eliminate the need for locking, but a lot of the state changes happened with 'this' being synchronized.
   
   Do we want to make sure all changes are guarded by a lock now? Either use 'this' everywhere, or use some explicit private lock object and mark it via "\@GuardedBy".
   It is becoming slightly difficult to reason about the MT-safety of this class.
   Do you have any thoughts, Tom?
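   
   For reference, the explicit private lock option might look roughly like the sketch below. `TargetTracker` and its members are hypothetical names for illustration. The guard is only noted in a comment here so the example has no extra dependencies; with a JSR-305 style annotation on the classpath, the same fact could be recorded with a GuardedBy annotation instead.

```scala
import scala.collection.mutable

// Hypothetical holder for per-profile executor targets.
class TargetTracker {
  // Private monitor: outside code cannot synchronize on it, unlike `this`.
  private val lock = new Object

  // Guarded by `lock`: only read or written inside lock.synchronized.
  private val targetPerProfile = mutable.HashMap[Int, Int]()

  def setTarget(rpId: Int, numExecutors: Int): Unit = lock.synchronized {
    targetPerProfile(rpId) = numExecutors
  }

  def totalTarget: Int = lock.synchronized {
    targetPerProfile.values.sum
  }
}
```

   The trade-off against synchronizing on 'this' is mostly about encapsulation: a private lock keeps external callers from participating in the class's locking protocol.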



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586427796
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118439/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592607990
 
 
   **[Test build #119093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119093/testReport)** for PR 27583 at commit [`bd3509c`](https://github.com/apache/spark/commit/bd3509c044d3eca6895815a0e1347b97c1019f29).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] mridulm commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385799231
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -336,7 +338,7 @@ private[yarn] class YarnAllocator(
         val resource = Resource.newInstance(totalMem, cores)
         ResourceRequestHelper.setResourceRequests(customResources.toMap, resource)
         logDebug(s"Created resource capability: $resource")
-        rpIdToYarnResource(rp.id) = resource
+        rpIdToYarnResource.putIfAbsent(rp.id, resource)
 
 Review comment:
   We changed rpIdToYarnResource from mutable.HashMap to ConcurrentHashMap in commit e89a8b5 above. I wanted to make sure this was only for concurrent reads, and not for writes that might insert keys here in parallel.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590995947
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586363779
 
 
   **[Test build #118439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118439/testReport)** for PR 27583 at commit [`6f2ace0`](https://github.com/apache/spark/commit/6f2ace032227a98df0ec1573e9a11c5aab0de449).



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592328646
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119065/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592031030
 
 
   **[Test build #119042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119042/testReport)** for PR 27583 at commit [`9e79f1a`](https://github.com/apache/spark/commit/9e79f1a7c63ff3df4bc592a67791b27c04b720f5).



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590916778
 
 
   Sorry, actually the default profile is already the highest priority. In YARN, lower numbers are higher priority, and the default profile has id 0.
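
   A rough sketch of that ordering, assuming the ResourceProfile id is used directly as the priority value. `ProfileRequest` and `PriorityOrder` are hypothetical names for illustration; the real code works with YARN's own request and priority types:

```scala
// Hypothetical request type; not YARN's ContainerRequest.
final case class ProfileRequest(rpId: Int, description: String) {
  // Assumption for this sketch: the ResourceProfile id doubles as the priority value.
  def priority: Int = rpId
}

object PriorityOrder {
  // A lower numeric value means a higher priority, so sorting in ascending
  // order puts the default profile (id 0) first.
  def byPriority(requests: Seq[ProfileRequest]): Seq[ProfileRequest] =
    requests.sortBy(_.priority)
}
```

   Under that assumption, requests for the default profile always sort ahead of requests for any custom profile.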



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592225272
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] mridulm commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586784460
 
 
   General question about priority: I did not find much here [1].
   How is the value of priority interpreted?
   Is it simply used to "tag" requests?
   Or are higher priority requests 'prioritized' over lower priority requests from an application (to a queue)?
   
   
   [1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-api/apidocs/org/apache/hadoop/yarn/api/records/Priority.html



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592297673
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23810/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592225278
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23801/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590995961
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23676/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586456941
 
 
   **[Test build #118447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118447/testReport)** for PR 27583 at commit [`6f2ace0`](https://github.com/apache/spark/commit/6f2ace032227a98df0ec1573e9a11c5aab0de449).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-591059427
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385168310
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -166,69 +195,188 @@ private[yarn] class YarnAllocator(
 
   private val labelExpression = sparkConf.get(EXECUTOR_NODE_LABEL_EXPRESSION)
 
-  // A map to store preferred hostname and possible task numbers running on it.
-  private var hostToLocalTaskCounts: Map[String, Int] = Map.empty
-
-  // Number of tasks that have locality preferences in active stages
-  private[yarn] var numLocalityAwareTasks: Int = 0
-
   // A container placement strategy based on pending tasks' locality preference
   private[yarn] val containerPlacementStrategy =
-    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resource, resolver)
+    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resolver)
+
+  // The default profile is always present so we need to initialize the datastructures keyed by
+  // ResourceProfile id to ensure its present if things start running before a request for
+  // executors could add it. This approach is easier then going and special casing everywhere.
+  private def initDefaultProfile(): Unit = synchronized {
+    allocatedHostToContainersMapPerRPId(DEFAULT_RESOURCE_PROFILE_ID) =
+      new HashMap[String, mutable.Set[ContainerId]]()
+    runningExecutorsPerResourceProfileId.put(DEFAULT_RESOURCE_PROFILE_ID, mutable.HashSet[String]())
+    numExecutorsStartingPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = new AtomicInteger(0)
+    targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) =
+      SchedulerBackendUtils.getInitialTargetExecutorNumber(sparkConf)
+    rpIdToYarnResource(DEFAULT_RESOURCE_PROFILE_ID) = defaultResource
+    rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) =
+      ResourceProfile.getOrCreateDefaultProfile(sparkConf)
+  }
+
+  initDefaultProfile()
 
-  def getNumExecutorsRunning: Int = runningExecutors.size()
+  def getNumExecutorsRunning: Int = synchronized {
+    runningExecutorsPerResourceProfileId.values.map(_.size).sum
+  }
+
+  def getNumLocalityAwareTasks: Int = synchronized {
+    numLocalityAwareTasksPerResourceProfileId.values.sum
+  }
 
-  def getNumReleasedContainers: Int = releasedContainers.size()
+  def getNumExecutorsStarting: Int = {
 
 Review comment:
   I went through all the variables; they are all protected via a higher-up call. We can add more synchronized blocks if we want to nest them, just to make it more readable?
   For instance, this one is only called from allocateResources, which is synchronized, and that is the case with most of these.
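
   A small sketch of what nesting would look like, using a hypothetical `NestedSync` class rather than the real YarnAllocator. JVM monitors are reentrant, so a synchronized method called from another synchronized method on the same object simply re-acquires the lock the thread already holds:

```scala
import scala.collection.mutable

// Hypothetical stand-in class; names are illustrative, not taken from YarnAllocator.
class NestedSync {
  private val running = mutable.HashMap[Int, mutable.Set[String]]()

  // Public entry point that takes the monitor on `this`.
  def allocate(rpId: Int, executorId: String): Int = synchronized {
    val executors = running.getOrElseUpdate(rpId, mutable.HashSet[String]())
    executors += executorId
    // Calls another synchronized method; the monitor is reentrant, so this does not deadlock.
    numRunning
  }

  // Also synchronized, so it stays safe if a future caller does not already hold the lock.
  def numRunning: Int = synchronized {
    running.values.map(_.size).sum
  }
}
```

   The nested `synchronized` adds no new protection when every caller already holds the lock, but it keeps the helper safe on its own and makes the intent visible at the point of use.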



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592031910
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23788/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590995399
 
 
   **[Test build #118928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118928/testReport)** for PR 27583 at commit [`f9c1a05`](https://github.com/apache/spark/commit/f9c1a05cac745197f5f791d9def2f28a9e510e00).



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385729849
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -336,7 +338,7 @@ private[yarn] class YarnAllocator(
         val resource = Resource.newInstance(totalMem, cores)
         ResourceRequestHelper.setResourceRequests(customResources.toMap, resource)
         logDebug(s"Created resource capability: $resource")
-        rpIdToYarnResource(rp.id) = resource
+        rpIdToYarnResource.putIfAbsent(rp.id, resource)
 
 Review comment:
   No, not at the moment anyway. This function is synchronized and nowhere else adds to it, so only one can run at a time. I put in putIfAbsent but it doesn't really matter. ResourceProfile ids are unique and ResourceProfiles are immutable. Even if this code ran in multiple threads at the same time, the result should be exactly the same, so we would put the same thing in twice and it wouldn't matter which one got inserted first.
   Strictly speaking, that doesn't need to be a concurrent hashmap due to locking of the calling functions, but to be more strict about it and to help with future changes I made it one.
   If you think it's clearer one way or another, let me know and I can modify it.
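
   To illustrate the putIfAbsent behavior, here is a minimal sketch with a hypothetical immutable `Capability` value standing in for the YARN Resource, and a hypothetical `ResourceCache` object in place of the allocator's map:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical immutable value standing in for a YARN Resource built from a profile.
final case class Capability(memoryMb: Int, cores: Int)

object ResourceCache {
  private val rpIdToCapability = new ConcurrentHashMap[Int, Capability]()

  // Returns the capability stored for rpId, inserting `computed` only if no entry exists yet.
  def register(rpId: Int, computed: Capability): Capability = {
    // putIfAbsent returns null when the key was absent, otherwise the existing value.
    val previous = rpIdToCapability.putIfAbsent(rpId, computed)
    if (previous == null) computed else previous
  }
}
```

   When the id is unique and the value is derived deterministically from an immutable profile, a second insert for the same id carries an equal value, so which thread wins makes no observable difference.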



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592297663
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592243263
 
 
   @dongjoon-hyun can you kick the check again? Also, how do I get permissions? I don't see any rerun buttons.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592297663
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] mridulm commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r384930920
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -166,69 +195,188 @@ private[yarn] class YarnAllocator(
 
   private val labelExpression = sparkConf.get(EXECUTOR_NODE_LABEL_EXPRESSION)
 
-  // A map to store preferred hostname and possible task numbers running on it.
-  private var hostToLocalTaskCounts: Map[String, Int] = Map.empty
-
-  // Number of tasks that have locality preferences in active stages
-  private[yarn] var numLocalityAwareTasks: Int = 0
-
   // A container placement strategy based on pending tasks' locality preference
   private[yarn] val containerPlacementStrategy =
-    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resource, resolver)
+    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resolver)
+
+  // The default profile is always present so we need to initialize the datastructures keyed by
+  // ResourceProfile id to ensure its present if things start running before a request for
+  // executors could add it. This approach is easier then going and special casing everywhere.
+  private def initDefaultProfile(): Unit = synchronized {
+    allocatedHostToContainersMapPerRPId(DEFAULT_RESOURCE_PROFILE_ID) =
+      new HashMap[String, mutable.Set[ContainerId]]()
+    runningExecutorsPerResourceProfileId.put(DEFAULT_RESOURCE_PROFILE_ID, mutable.HashSet[String]())
+    numExecutorsStartingPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = new AtomicInteger(0)
+    targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) =
+      SchedulerBackendUtils.getInitialTargetExecutorNumber(sparkConf)
+    rpIdToYarnResource(DEFAULT_RESOURCE_PROFILE_ID) = defaultResource
+    rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) =
+      ResourceProfile.getOrCreateDefaultProfile(sparkConf)
+  }
+
+  initDefaultProfile()
 
-  def getNumExecutorsRunning: Int = runningExecutors.size()
+  def getNumExecutorsRunning: Int = synchronized {
+    runningExecutorsPerResourceProfileId.values.map(_.size).sum
+  }
+
+  def getNumLocalityAwareTasks: Int = synchronized {
+    numLocalityAwareTasksPerResourceProfileId.values.sum
+  }
 
-  def getNumReleasedContainers: Int = releasedContainers.size()
+  def getNumExecutorsStarting: Int = {
 
 Review comment:
   synchronized on `this`? I was expecting static analysis via @GuardedBy to catch this in the build; apparently we don't have that validation.
   Can you also check the use of some of the other variables? `targetNumExecutorsPerResourceProfileId`, etc. also seem to have similar issues.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590456310
 
 
   I guess the one case I can think of is if you are running Spark in a job server scenario: the priorities could favor certain jobs more if they used ResourceProfiles vs. using the default profile. I think we could document this for now.



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586455043
 
 
   test this please



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592031030
 
 
   **[Test build #119042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119042/testReport)** for PR 27583 at commit [`9e79f1a`](https://github.com/apache/spark/commit/9e79f1a7c63ff3df4bc592a67791b27c04b720f5).



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-590994474
 
 
   Note that most accesses are synchronized in allocateResources; the other places are separately synchronized and are called from either ApplicationMasterSource or AMEndpoint.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586426640
 
 
   **[Test build #118439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118439/testReport)** for PR 27583 at commit [`6f2ace0`](https://github.com/apache/spark/commit/6f2ace032227a98df0ec1573e9a11c5aab0de449).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592609070
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119093/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592297371
 
 
   **[Test build #119065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119065/testReport)** for PR 27583 at commit [`bd3509c`](https://github.com/apache/spark/commit/bd3509c044d3eca6895815a0e1347b97c1019f29).



[GitHub] [spark] tgravescs commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592180434
 
 
   I introduced a bug; fixing it.



[GitHub] [spark] tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r385196205
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
 ##########
 @@ -166,69 +195,188 @@ private[yarn] class YarnAllocator(
 
   private val labelExpression = sparkConf.get(EXECUTOR_NODE_LABEL_EXPRESSION)
 
-  // A map to store preferred hostname and possible task numbers running on it.
-  private var hostToLocalTaskCounts: Map[String, Int] = Map.empty
-
-  // Number of tasks that have locality preferences in active stages
-  private[yarn] var numLocalityAwareTasks: Int = 0
-
   // A container placement strategy based on pending tasks' locality preference
   private[yarn] val containerPlacementStrategy =
-    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resource, resolver)
+    new LocalityPreferredContainerPlacementStrategy(sparkConf, conf, resolver)
+
+  // The default profile is always present so we need to initialize the datastructures keyed by
+  // ResourceProfile id to ensure its present if things start running before a request for
+  // executors could add it. This approach is easier then going and special casing everywhere.
+  private def initDefaultProfile(): Unit = synchronized {
+    allocatedHostToContainersMapPerRPId(DEFAULT_RESOURCE_PROFILE_ID) =
+      new HashMap[String, mutable.Set[ContainerId]]()
+    runningExecutorsPerResourceProfileId.put(DEFAULT_RESOURCE_PROFILE_ID, mutable.HashSet[String]())
+    numExecutorsStartingPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = new AtomicInteger(0)
+    targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) =
+      SchedulerBackendUtils.getInitialTargetExecutorNumber(sparkConf)
+    rpIdToYarnResource(DEFAULT_RESOURCE_PROFILE_ID) = defaultResource
+    rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) =
+      ResourceProfile.getOrCreateDefaultProfile(sparkConf)
+  }
+
+  initDefaultProfile()
 
-  def getNumExecutorsRunning: Int = runningExecutors.size()
+  def getNumExecutorsRunning: Int = synchronized {
+    runningExecutorsPerResourceProfileId.values.map(_.size).sum
+  }
+
+  def getNumLocalityAwareTasks: Int = synchronized {
+    numLocalityAwareTasksPerResourceProfileId.values.sum
+  }
 
-  def getNumReleasedContainers: Int = releasedContainers.size()
+  def getNumExecutorsStarting: Int = {
 
 Review comment:
   I went ahead and added more synchronized calls in each function where those variables are touched. I believe re-entrant synchronized is cheap, so it shouldn't add much overhead, and it helps with readability and guards against future breakage.
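
A small sketch of the re-entrancy being relied on here, with hypothetical names (`StartingCounter` is not from YarnAllocator): a synchronized method may call other synchronized methods on the same object without blocking, because JVM monitors are re-entrant for the thread that already holds them.

```scala
// Hypothetical counter; names are illustrative and not taken from YarnAllocator.
class StartingCounter {
  private var starting = 0

  def increment(): Unit = synchronized { starting += 1 }

  def current: Int = synchronized { starting }

  // This method already holds the monitor when it calls the two synchronized
  // methods above; the nested acquisitions succeed because the calling thread
  // owns the lock.
  def incrementAndGet(): Int = synchronized {
    increment()
    current
  }
}
```

Marking each accessor synchronized therefore stays safe even when those accessors are later called from inside a larger synchronized block.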



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586458233
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586519032
 
 
   **[Test build #118447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118447/testReport)** for PR 27583 at commit [`6f2ace0`](https://github.com/apache/spark/commit/6f2ace032227a98df0ec1573e9a11c5aab0de449).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592111429
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592609059
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-592031887
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] mridulm commented on a change in pull request #27583: [SPARK-29149][YARN] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #27583: [SPARK-29149][YARN]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#discussion_r383662626
 
 

 ##########
 File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ##########
 @@ -227,6 +227,17 @@ private object ResourceRequestHelper extends Logging {
     resourceInformation
   }
 
+  def isYarnCustomResourcesNonEmpty(resource: Resource): Boolean = {
+    try {
+      // Use reflection as this uses APIs only available in Hadoop 3
 
 Review comment:
   
   Thanks for clarifying the behavior when YARN does support GPUs, etc. as a resource.
   
   I am probably missing something here; it would be great to understand the behavior when YARN does not.
   Suppose I have a Spark application that depends on some library which requires GPUs (for example) and sets the corresponding resource profile expectations on the RDDs it creates (I am trying to make a case where the app developer did not explicitly configure the resource profiles, but is implicitly leveraging them via some library).
   
   Now, if this application is run on Hadoop 2.7 (or anything before 2.10, as you mentioned), what will the behavior be?
   If I understood it right:
   1) We will make requests to YARN without GPUs in the allocation request, since YARN does not support them.
   2) On the nodes received, we will try to use the discovery script on the assumption that GPUs are available and YARN is just oblivious to them. We will probably be using a node-label constraint to ensure GPU availability?
   3) If GPUs are detected, we use them; otherwise the executor fails?
   
   Is this right?
   If yes, how do we handle multi-tenancy on the executor host, or choose which GPU(s) to use?
   Is the assumption that in workloads like this, the entire node is reserved to prevent contention? I am not sure if you have documented/detailed this somewhere and I missed it!
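
As background for the version question, here is a generic sketch of probing for an API by reflection, which is the technique the code comment in the diff above mentions for Hadoop 3-only calls. `ApiProbe` and its methods are hypothetical helpers; the actual body of `isYarnCustomResourcesNonEmpty` is not shown in this thread and may differ.

```scala
import scala.util.Try

// Hypothetical helper, not part of ResourceRequestHelper.
object ApiProbe {
  // True when the target's class exposes a public no-arg method with this name.
  def hasNoArgMethod(target: AnyRef, name: String): Boolean =
    Try(target.getClass.getMethod(name)).isSuccess

  // Invokes the method reflectively if present; None when the lookup or the
  // call fails, for example on a runtime that lacks the API.
  def invokeIfPresent(target: AnyRef, name: String): Option[AnyRef] =
    Try(target.getClass.getMethod(name).invoke(target)).toOption
}
```

With this shape, code compiled against one version can fall back to a default (for example, "no custom resources") when running against an older runtime, instead of failing with a linkage error.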



[GitHub] [spark] AmplabJenkins commented on issue #27583: [SPARK-29149] Update YARN cluster manager For Stage Level Scheduling

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27583: [SPARK-29149]  Update YARN cluster manager For Stage Level Scheduling
URL: https://github.com/apache/spark/pull/27583#issuecomment-586458233
 
 
   Merged build finished. Test PASSed.
