Posted to reviews@spark.apache.org by yanji84 <gi...@git.apache.org> on 2018/04/10 19:17:47 UTC
[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...
GitHub user yanji84 opened a pull request:
https://github.com/apache/spark/pull/21033
[SPARK-19320][MESOS][WIP]allow specifying a hard limit on number of gpus required in each spark executor when running on mesos
## What changes were proposed in this pull request?
Currently, Spark only allows specifying overall GPU resources as an upper limit. This adds a new conf parameter to allow specifying a hard limit on the number of GPU cores for each executor, while still respecting the overall GPU resource constraint.
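For illustration, the two settings discussed in this PR would be combined like this in a submission. This is a hypothetical invocation: the master URL, class, and jar are placeholders, and `spark.mesos.executor.gpus` is the new conf proposed here.

```shell
# Hypothetical example: cap the job at 8 GPUs overall while requiring
# exactly 2 GPUs per executor. spark.mesos.executor.gpus is the conf
# proposed in this PR; all values and names below are placeholders.
spark-submit \
  --master mesos://mesos-master:5050 \
  --conf spark.mesos.gpus.max=8 \
  --conf spark.mesos.executor.gpus=2 \
  --class org.example.App app.jar
```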
## How was this patch tested?
Unit Testing
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yanji84/spark ji/hard_limit_on_gpu
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21033.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21033
----
commit cec434a1eba6227814ba5a842ff8f41103217539
Author: Ji Yan <ji...@...>
Date: 2017-03-10T05:30:11Z
respect both gpu and maxgpu
commit c427e151dbf63815f25d20fe1b099a7b09e85f51
Author: Ji Yan <ji...@...>
Date: 2017-05-14T20:02:16Z
fix gpu offer
commit 1e61996c31ff3a01396738fd91adf69952fd3558
Author: Ji Yan <ji...@...>
Date: 2017-05-14T20:15:55Z
syntax fix
commit f24dbe17787acecd4c032e25d820cb59d8b6d491
Author: Ji Yan <ji...@...>
Date: 2017-05-15T00:30:50Z
pass all tests
commit f89e5ccae02667d4f55e7aeb1f805a9cfaee1558
Author: Ji Yan <ji...@...>
Date: 2018-04-10T18:37:14Z
remove redundant
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21033: [SPARK-19320][MESOS]allow specifying a hard limit...
Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on a diff in the pull request:
https://github.com/apache/spark/pull/21033#discussion_r182283035
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -495,9 +500,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
launchTasks = true
val taskId = newMesosTaskId()
val offerCPUs = getResource(resources, "cpus").toInt
- val taskGPUs = Math.min(
- Math.max(0, maxGpus - totalGpusAcquired), getResource(resources, "gpus").toInt)
-
+ val offerGPUs = getResource(resources, "gpus").toInt
+ var taskGPUs = executorGpus
--- End diff --
Ah, good point, I missed that earlier. @yanji84 Why are we changing the default behavior when `spark.mesos.executor.gpus` is not specified? Previously, if `spark.mesos.gpus.max` was set (without setting `spark.mesos.executor.gpus`), GPUs were allocated greedily. This aligns with the CPU behavior when `spark.executor.cores` is not specified. GPUs could be handled the same way.
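A minimal sketch of the fallback behavior being suggested. The variable names (`executorGpus`, `maxGpus`, `totalGpusAcquired`, `offerGPUs`) mirror the diff, but the helper function itself is hypothetical, not the actual backend code:

```scala
// Hypothetical helper illustrating the suggested behavior: if
// spark.mesos.executor.gpus is unset (modeled here as 0), fall back to the
// pre-patch greedy allocation capped by spark.mesos.gpus.max; otherwise
// treat it as a hard per-executor limit that an offer either satisfies or not.
def chooseTaskGpus(
    executorGpus: Int,
    maxGpus: Int,
    totalGpusAcquired: Int,
    offerGPUs: Int): Int = {
  if (executorGpus > 0) {
    // Hard limit: take exactly executorGpus, but only if the offer has
    // enough and the global budget is not exceeded; otherwise take none.
    if (offerGPUs >= executorGpus && totalGpusAcquired + executorGpus <= maxGpus) {
      executorGpus
    } else {
      0
    }
  } else {
    // Greedy default (previous behavior): take as many GPUs as the offer
    // and the remaining global budget allow.
    Math.min(Math.max(0, maxGpus - totalGpusAcquired), offerGPUs)
  }
}
```

With this shape, leaving `spark.mesos.executor.gpus` unset preserves the 2.1-era greedy allocation, matching how CPUs behave when `spark.executor.cores` is not specified.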
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21033
ping @yanji84
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:
https://github.com/apache/spark/pull/21033
Any progress here? @yanji84 @susanxhuynh
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by yanji84 <gi...@git.apache.org>.
Github user yanji84 commented on the issue:
https://github.com/apache/spark/pull/21033
Is there anything else we need to do to merge in this change?
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on the issue:
https://github.com/apache/spark/pull/21033
@yanji84 I tried to restore the previous behavior when `spark.mesos.executor.gpus` is not specified. Here's the commit in my fork: https://github.com/mesosphere/spark/pull/23/commits/5adb830bc630f4995470b08157d30016e3b4567e I restored the previous unit test as well. Seems to work okay. WDYT?
---
[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21033#discussion_r181199842
--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
@@ -165,18 +165,47 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
}
- test("mesos does not acquire more than spark.mesos.gpus.max") {
- val maxGpus = 5
- setBackend(Map("spark.mesos.gpus.max" -> maxGpus.toString))
+ test("mesos acquires spark.mesos.executor.gpus number of gpus per executor") {
+ setBackend(Map("spark.mesos.gpus.max" -> "5",
+ "spark.mesos.executor.gpus" -> "2"))
val executorMemory = backend.executorMemory(sc)
- offerResources(List(Resources(executorMemory, 1, maxGpus + 1)))
+ offerResources(List(Resources(executorMemory, 1, 5)))
val taskInfos = verifyTaskLaunched(driver, "o1")
assert(taskInfos.length == 1)
val gpus = backend.getResource(taskInfos.head.getResourcesList, "gpus")
- assert(gpus == maxGpus)
+ assert(gpus == 2)
+ }
+
+
+ test("mesos declines offers that cannot satisfy spark.mesos.executor.gpus") {
+ setBackend(Map("spark.mesos.gpus.max" -> "5",
--- End diff --
I think it's worth testing setting max less than the number of executor gpus as well.
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on the issue:
https://github.com/apache/spark/pull/21033
LGTM
---
[GitHub] spark pull request #21033: [SPARK-19320][MESOS]allow specifying a hard limit...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/21033#discussion_r181627739
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
@@ -495,9 +500,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
launchTasks = true
val taskId = newMesosTaskId()
val offerCPUs = getResource(resources, "cpus").toInt
- val taskGPUs = Math.min(
- Math.max(0, maxGpus - totalGpusAcquired), getResource(resources, "gpus").toInt)
-
+ val offerGPUs = getResource(resources, "gpus").toInt
+ var taskGPUs = executorGpus
--- End diff --
So it looks like we are changing the behavior for the value set in `spark.mesos.gpus.max` (which has been around since 2.1)? Are we OK with that? It might break existing deployments. Is there a migration guide for something like this?
In addition, are there other changes to the defaults? Specifically, `taskGPUs` now defaults to 0?
Also, should we warn if `spark.mesos.executor.gpus` is > `spark.mesos.gpus.max`?
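A sketch of the check that last question suggests. The config names come from the PR, but the helper itself is illustrative, not actual Spark code:

```scala
// Illustrative validation, not actual Spark code: return a warning message
// when the per-executor GPU request can never fit under the global cap,
// since in that case no executor would ever be able to launch with GPUs.
def gpuConfWarning(executorGpus: Int, maxGpus: Int): Option[String] =
  if (executorGpus > maxGpus) {
    Some(s"spark.mesos.executor.gpus ($executorGpus) is greater than " +
      s"spark.mesos.gpus.max ($maxGpus); no executor will ever acquire GPUs.")
  } else {
    None
  }
```

In the real backend this would presumably be logged via `logWarning` at initialization time.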
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21033
Jenkins, ok to test
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21033
Can one of the admins verify this patch?
---
[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...
Posted by yanji84 <gi...@git.apache.org>.
Github user yanji84 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21033#discussion_r181231118
--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
@@ -165,18 +165,47 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
}
- test("mesos does not acquire more than spark.mesos.gpus.max") {
- val maxGpus = 5
- setBackend(Map("spark.mesos.gpus.max" -> maxGpus.toString))
+ test("mesos acquires spark.mesos.executor.gpus number of gpus per executor") {
+ setBackend(Map("spark.mesos.gpus.max" -> "5",
+ "spark.mesos.executor.gpus" -> "2"))
val executorMemory = backend.executorMemory(sc)
- offerResources(List(Resources(executorMemory, 1, maxGpus + 1)))
+ offerResources(List(Resources(executorMemory, 1, 5)))
val taskInfos = verifyTaskLaunched(driver, "o1")
assert(taskInfos.length == 1)
val gpus = backend.getResource(taskInfos.head.getResourcesList, "gpus")
- assert(gpus == maxGpus)
+ assert(gpus == 2)
+ }
+
+
+ test("mesos declines offers that cannot satisfy spark.mesos.executor.gpus") {
+ setBackend(Map("spark.mesos.gpus.max" -> "5",
--- End diff --
Sounds good. Added the test
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on the issue:
https://github.com/apache/spark/pull/21033
@yanji84 Thanks for the patch. I tested your previous PR on GPUs running on DC/OS and everything worked fine. Would you mind updating the documentation as well - https://github.com/apache/spark/blob/master/docs/running-on-mesos.md?
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21033
Can one of the admins verify this patch?
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...
Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on the issue:
https://github.com/apache/spark/pull/21033
LGTM. @yanji84 You may want to remove the "WIP" in the PR title.
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:
https://github.com/apache/spark/pull/21033
@yanji84 How do you identify the GPU if you have multiple GPUs on the machine? It would be nice to have some docs for it.
---
[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:
https://github.com/apache/spark/pull/21033
ping @yanji84
---