You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "wbo4958 (via GitHub)" <gi...@apache.org> on 2023/09/21 12:02:03 UTC

[GitHub] [spark] wbo4958 opened a new pull request, #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

wbo4958 opened a new pull request, #43030:
URL: https://github.com/apache/spark/pull/43030

   ### What changes were proposed in this pull request?
   This PR is a follow-up of https://github.com/apache/spark/pull/37268 which supports stage level task resource profile for standalone cluster when dynamic allocation disabled. This PR enables stage-level task resource profile for yarn cluster.
   
   
   ### Why are the changes needed?
   
   Users who work on spark ML/DL cases running on Yarn would expect stage-level task resource profile feature.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   The current tests of https://github.com/apache/spark/pull/37268 can also cover this PR since both yarn and standalone cluster share the same TaskSchedulerImpl class which implements this feature. Apart from that, modifying the existing test to cover yarn cluster. 
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled [spark]

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1746835329

   thanks @mridulm  any throught/objections to pulling this back into 3.5 line?
   Seems fairly low risk improvement


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1338220779


##########
core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala:
##########
@@ -67,7 +68,7 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,
    */
   private[spark] def isSupported(rp: ResourceProfile): Boolean = {
     if (rp.isInstanceOf[TaskResourceProfile] && !dynamicEnabled) {
-      if ((notRunningUnitTests || testExceptionThrown) && !isStandaloneOrLocalCluster) {
+      if ((notRunningUnitTests || testExceptionThrown) && !supportTaskResourceProfile) {
         throw new SparkException("TaskResourceProfiles are only supported for Standalone " +

Review Comment:
   Done, Thx.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1338220364


##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +
       "cluster for now when dynamic allocation is disabled.")
+
+    // Local cluster: supports task resource profile.
+    conf.setMaster("local-cluster[1, 1, 1024]")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)
+
+    // Yarn: supports task resource profile.
+    conf.setMaster("yarn")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)

Review Comment:
   Done, Thx



##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +
       "cluster for now when dynamic allocation is disabled.")
+
+    // Local cluster: supports task resource profile.
+    conf.setMaster("local-cluster[1, 1, 1024]")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)

Review Comment:
   Done, Thx



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1740915018

   @HyukjinKwon  would you have any idea on the build failure? Looks like env is just not setup properly but I've never seen that before.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled [spark]

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1749017410

   thanks, I merged back into 3.5 branch (3.5.1) as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ivoson commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "ivoson (via GitHub)" <gi...@apache.org>.
ivoson commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1736609994

   > Hi @ivoson, I believe this PR is a follow-up of your previous PR #37268 since both yarn and standalone cluster share the same TaskSchedulerImpl. But I still would like to know what's your concern about this PR. Thx.
   
   Thanks @wbo4958 for ping me and work on this. No concerns about adding the support for yarn cluster. Please feel free to go ahead. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1737982792

   You still need to update the documentation like I mentioned here: https://github.com/apache/spark/pull/43030#issuecomment-1735586052
   
   Also need to look at the build failure, doesn't look like this code so maybe something with setup in your repo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1744129146

   Merged to master.
   Thanks for working on this @wbo4958 !
   Thanks for the reviews @tgravescs and @ivoson :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1735586052

   Please update description to say what this PR does. Specifically this does introduce user facing changes because there is now a new feature available to users of yarn.
   
   Was this tested on a real cluster, what all tests were run other then unit tests?
   
   You also need to update the documentation: docs/configuration.md, docs/running-on-yarn.md


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1338219805


##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +

Review Comment:
   Done. Thx



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1729442066

   Hi @ivoson, I believe this PR is a follow-up of your previous PR https://github.com/apache/spark/pull/37268 since both yarn and standalone cluster share the same TaskSchedulerImpl. But I still would like to know what's your concern about this PR. Thx.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1337310074


##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +
       "cluster for now when dynamic allocation is disabled.")
+
+    // Local cluster: supports task resource profile.
+    conf.setMaster("local-cluster[1, 1, 1024]")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)
+
+    // Yarn: supports task resource profile.
+    conf.setMaster("yarn")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)
   }
 

Review Comment:
   it would be nice to add other tests here for yarn with TAskResourceProfile, like test:
     test("isSupported task resource profiles with dynamic allocation disabled") {
   which uses standalone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1337310074


##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +
       "cluster for now when dynamic allocation is disabled.")
+
+    // Local cluster: supports task resource profile.
+    conf.setMaster("local-cluster[1, 1, 1024]")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)
+
+    // Yarn: supports task resource profile.
+    conf.setMaster("yarn")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)
   }
 

Review Comment:
   it would be nice to add other tests here for yarn with TAskResourceProfile, like test:
     test("isSupported task resource profiles with dynamic allocation disabled") {
   which uses standalone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1736882909

   # Manual tests
   
   Due to the challenges of conducting yarn application tests within Spark unit tests, I took the initiative to manually perform several tests on our internal Yarn cluster.
   
   ## With dynamic allocation disabled.
   
   ``` bash
   spark-shell --master yarn --num-executors=1 --executor-cores=4 --conf spark.task.cpus=1 \
      --conf spark.dynamicAllocation.enabled=false
   ```
   
   The above command requires 1 executor with 4 CPU cores, and the default `task.cpus = 1`, so the default tasks parallelism is 4 at a time.
   
   1. `task.cores=1`
   
   Test code:
   
   ``` scala
   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
   
   val rdd = sc.range(0, 100, 1, 4)
   var rdd1 = rdd.repartition(3)
   
   val treqs = new TaskResourceRequests().cpus(1)
   val rp = new ResourceProfileBuilder().require(treqs).build
   
   rdd1 = rdd1.withResources(rp)
   rdd1.collect()
   ```
   
   When the required `task.cpus=1`, `executor.cores=4` (No executor resource specified, use the default one), there will be 4 tasks running for rp.
   
   The entire Spark application consists of a single Spark job that will be divided into two stages. The first shuffle stage comprises four tasks, all of which will be executed simultaneously.
   
   ![shuffle-stage](https://github.com/apache/spark/assets/1320706/19842abb-98b7-4384-8c3a-0c0d77513c70)
   
   And the second ResultStage comprises 3 tasks, and all of which will be executed simultaneously since the required `task.cpus` is  1.
   
   ![result-stage-task cores=1](https://github.com/apache/spark/assets/1320706/416ca856-90fd-4493-ab25-71acad669d6b)
   
   2. `task.cores=2`
   
   Test code,
   
   ``` scala
   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
   
   val rdd = sc.range(0, 100, 1, 4)
   var rdd1 = rdd.repartition(3)
   
   val treqs = new TaskResourceRequests().cpus(2)
   val rp = new ResourceProfileBuilder().require(treqs).build
   
   rdd1 = rdd1.withResources(rp)
   rdd1.collect()
   ```
   
   When the required `task.cpus=2`, `executor.cores=4` (No executor resource specified, use the default one), there will be 2 tasks running for rp.
   
   The first shuffle stage behaves the same as the first one. 
   
   The second ResultStage comprises 3 tasks, so the first 2 tasks will be running at a time, and then execute the last task.
   
   ![result-stage-task cores=2](https://github.com/apache/spark/assets/1320706/07fd7033-74fb-4653-9c2a-4a04892aea60)
   
   
   3. `task.cores=3`
   
   Test code,
   
   ``` scala
   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
   
   val rdd = sc.range(0, 100, 1, 4)
   var rdd1 = rdd.repartition(3)
   
   val treqs = new TaskResourceRequests().cpus(3)
   val rp = new ResourceProfileBuilder().require(treqs).build
   
   rdd1 = rdd1.withResources(rp)
   rdd1.collect()
   ```
   
   When the required `task.cpus=3`, `executor.cores=4` (No executor resource specified, use the default one), there will be 1 task running for rp.
   
   The first shuffle stage behaves the same as the first one. 
   
   The second ResultStage comprises 3 tasks, all of which will be running serially.
   
   ![result-stage-task cores=3](https://github.com/apache/spark/assets/1320706/bca6b376-4952-4bbb-bce7-e0a5fe2ca836)
   
   4. `task.cores=5`
   
   ``` scalas
   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
   
   val rdd = sc.range(0, 100, 1, 4)
   var rdd1 = rdd.repartition(3)
   val treqs = new TaskResourceRequests().cpus(5)
   val rp = new ResourceProfileBuilder().require(treqs).build
   
   rdd1 = rdd1.withResources(rp)
   ```
   
   exception happened.
   ``` console
   scala> rdd1 = rdd1.withResources(rp)
   org.apache.spark.SparkException: The number of cores per executor (=4) has to be >= the number of cpus per task = 5.
     at org.apache.spark.resource.ResourceUtils$.validateTaskCpusLargeEnough(ResourceUtils.scala:412)
     at org.apache.spark.resource.ResourceProfile.calculateTasksAndLimitingResource(ResourceProfile.scala:182)
     at org.apache.spark.resource.ResourceProfile.$anonfun$limitingResource$1(ResourceProfile.scala:152)
     at scala.Option.getOrElse(Option.scala:189)
     at org.apache.spark.resource.ResourceProfile.limitingResource(ResourceProfile.scala:151)
     at org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:141)
     at org.apache.spark.rdd.RDD.withResources(RDD.scala:1829)
     ... 50 elided
   
   scala> 
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1736912910

   ## With dynamic allocation enabled.
   
   ``` bash
   spark-shell --master yarn --num-executors=1 --executor-cores=4 --conf spark.task.cpus=1 \
     --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.maxExecutors=1\
   ```
   
   The above command enables the dynamic allocation and the max executors required is set to 1 in order to test.
   
   ### TaskResourceProfile without any specific executor request information
   
   Test code,
   
   ``` scala
   import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}
   
   val rdd = sc.range(0, 100, 1, 4)
   var rdd1 = rdd.repartition(3)
   
   val treqs = new TaskResourceRequests().cpus(3)
   val rp = new ResourceProfileBuilder().require(treqs).build
   
   rdd1 = rdd1.withResources(rp)
   rdd1.collect()
   ```
   
   The rp refers to the TaskResourceProfile without any specific executor request information, thus the executor information will utilize the default values from Default ResourceProfile (executor.cores=4).
   
   The above code will require an extra executor which will have the same `executor.cores/memory` as the default ResourceProfile.
   
   ![dynamic-no-exec-events](https://github.com/apache/spark/assets/1320706/c1591dbc-005f-4c09-8fec-86ba3bd54067)
   
   
   ![dynamc-no-exec-executors](https://github.com/apache/spark/assets/1320706/8845c1c0-eb8d-4bcb-8da2-dcb7b36e09d8)
   
   
   ![dynamic-no-exec-tasks](https://github.com/apache/spark/assets/1320706/e938df81-cecf-45f8-8533-fe4ee8f7a3cc)
   
   ### Different executor request information 
   
   ``` scala
   import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
   
   val rdd = sc.range(0, 100, 1, 4)
   var rdd1 = rdd.repartition(3)
   
   val ereqs = new ExecutorResourceRequests().cores(6);
   val treqs = new TaskResourceRequests().cpus(5)
   
   val rp = new ResourceProfileBuilder().require(ereqs).require(treqs).build
   
   rdd1 = rdd1.withResources(rp)
   rdd1.collect()
   ```
   ![dynamice-events](https://github.com/apache/spark/assets/1320706/22f4f502-0c7b-4f22-ae5e-e2d6a1fea920)
   
   ![dynamic-executors](https://github.com/apache/spark/assets/1320706/ccb764d1-b555-4a3a-9dc1-5a63125f29e6)
   
   ![dynamic-tasks](https://github.com/apache/spark/assets/1320706/463f8bd2-46dc-49cb-ad54-a4496e2cb779)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1739370515

   Thx @tgravescs, I reran the build/tests, but they still kept failing. Let me rerun them again. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1746860366

   I did not backport it given it was an improvement, but don't have objections as such - as you said, it is low risk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1736445083

   Put it in draft mode until Kubernetes support is available. @tgravescs thx for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1736921245

   Hi @tgravescs @ivoson, Would you please help to review it again? Since it's hard to do the Yarn end 2 end tests on spark unit tests. I did some manual tests, please see the above comments. 
   
   BTW, this PR only supports Yarn since I'm not familiar with k8s for now. I will put up another PR for k8s. Thx for the understanding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm closed pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled
URL: https://github.com/apache/spark/pull/43030


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1337316673


##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +

Review Comment:
   this should match comment above about changing the text in exception



##########
core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala:
##########
@@ -67,7 +68,7 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,
    */
   private[spark] def isSupported(rp: ResourceProfile): Boolean = {
     if (rp.isInstanceOf[TaskResourceProfile] && !dynamicEnabled) {
-      if ((notRunningUnitTests || testExceptionThrown) && !isStandaloneOrLocalCluster) {
+      if ((notRunningUnitTests || testExceptionThrown) && !supportTaskResourceProfile) {
         throw new SparkException("TaskResourceProfiles are only supported for Standalone " +

Review Comment:
   this text should change as it says only standalone mode which isn't true anymore.



##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +
       "cluster for now when dynamic allocation is disabled.")
+
+    // Local cluster: supports task resource profile.
+    conf.setMaster("local-cluster[1, 1, 1024]")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)

Review Comment:
   there is no assert saround this



##########
core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala:
##########
@@ -126,18 +126,29 @@ class ResourceProfileManagerSuite extends SparkFunSuite {
     val defaultProf = rpmanager.defaultResourceProfile
     assert(rpmanager.isSupported(defaultProf))
 
-    // task resource profile.
+    // Standalone: supports task resource profile.
     val gpuTaskReq = new TaskResourceRequests().resource("gpu", 1)
     val taskProf = new TaskResourceProfile(gpuTaskReq.requests)
     assert(rpmanager.isSupported(taskProf))
 
+    // Local: doesn't support task resource profile.
     conf.setMaster("local")
     rpmanager = new ResourceProfileManager(conf, listenerBus)
     val error = intercept[SparkException] {
       rpmanager.isSupported(taskProf)
     }.getMessage
     assert(error === "TaskResourceProfiles are only supported for Standalone " +
       "cluster for now when dynamic allocation is disabled.")
+
+    // Local cluster: supports task resource profile.
+    conf.setMaster("local-cluster[1, 1, 1024]")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)
+
+    // Yarn: supports task resource profile.
+    conf.setMaster("yarn")
+    rpmanager = new ResourceProfileManager(conf, listenerBus)
+    rpmanager.isSupported(taskProf)

Review Comment:
   there is no assert around this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1340748033


##########
core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala:
##########
@@ -57,6 +57,7 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,
   private val isStandaloneOrLocalCluster = master.isDefined && (
       master.get.startsWith("spark://") || master.get.startsWith("local-cluster")
     )
+  private val supportTaskResourceProfile = isStandaloneOrLocalCluster || isYarn

Review Comment:
   We should either inline it, or also check whether DRA is disabled - else the variable is inaccurate.
   
   ```suggestion
     private val supportTaskResourceProfile = isStandaloneOrLocalCluster || (isYarn && !dynamicEnabled)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on a diff in pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on code in PR #43030:
URL: https://github.com/apache/spark/pull/43030#discussion_r1341454021


##########
core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala:
##########
@@ -57,6 +57,7 @@ private[spark] class ResourceProfileManager(sparkConf: SparkConf,
   private val isStandaloneOrLocalCluster = master.isDefined && (
       master.get.startsWith("spark://") || master.get.startsWith("local-cluster")
     )
+  private val supportTaskResourceProfile = isStandaloneOrLocalCluster || isYarn

Review Comment:
   Good suggestion. Talked with Tom, seems `inline` is a better choice, avoiding checking DRA twice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wbo4958 commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "wbo4958 (via GitHub)" <gi...@apache.org>.
wbo4958 commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1742397186

   Hi @tgravescs, Finally, I rebased this PR and forced-updated my PR, now the CI got passed. Thx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #43030: [SPARK-45250][core] Support stage level task resource profile for yarn cluster when dynamic allocation disabled

Posted by "tgravescs (via GitHub)" <gi...@apache.org>.
tgravescs commented on PR #43030:
URL: https://github.com/apache/spark/pull/43030#issuecomment-1739358502

   overall code looks good, we need to figure out why tests aren't running/passing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org