Posted to reviews@spark.apache.org by yanji84 <gi...@git.apache.org> on 2018/04/10 19:17:47 UTC

[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...

GitHub user yanji84 opened a pull request:

    https://github.com/apache/spark/pull/21033

    [SPARK-19320][MESOS][WIP]allow specifying a hard limit on number of gpus required in each spark executor when running on mesos

    ## What changes were proposed in this pull request?
    
    Currently, Spark only allows specifying overall GPU resources as an upper limit. This change adds a new conf parameter that specifies a hard limit on the number of GPUs required by each executor, while still respecting the overall GPU resource constraint.
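
    A minimal usage sketch of the two settings together (the per-executor key is the one proposed in this PR; the numeric values are arbitrary examples):

```scala
// Illustrative settings only: spark.mesos.executor.gpus is the key proposed
// in this PR, and the numeric values are arbitrary examples.
val mesosGpuConf = Map(
  "spark.mesos.gpus.max"      -> "8", // existing upper limit across the application
  "spark.mesos.executor.gpus" -> "2"  // proposed hard per-executor GPU count
)
```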
    
    ## How was this patch tested?
    
    Unit Testing
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanji84/spark ji/hard_limit_on_gpu

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21033.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21033
    
----
commit cec434a1eba6227814ba5a842ff8f41103217539
Author: Ji Yan <ji...@...>
Date:   2017-03-10T05:30:11Z

    respect both gpu and maxgpu

commit c427e151dbf63815f25d20fe1b099a7b09e85f51
Author: Ji Yan <ji...@...>
Date:   2017-05-14T20:02:16Z

    fix gpu offer

commit 1e61996c31ff3a01396738fd91adf69952fd3558
Author: Ji Yan <ji...@...>
Date:   2017-05-14T20:15:55Z

    syntax fix

commit f24dbe17787acecd4c032e25d820cb59d8b6d491
Author: Ji Yan <ji...@...>
Date:   2017-05-15T00:30:50Z

    pass all tests

commit f89e5ccae02667d4f55e7aeb1f805a9cfaee1558
Author: Ji Yan <ji...@...>
Date:   2018-04-10T18:37:14Z

    remove redundant

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21033: [SPARK-19320][MESOS]allow specifying a hard limit...

Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21033#discussion_r182283035
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -495,9 +500,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
               launchTasks = true
               val taskId = newMesosTaskId()
               val offerCPUs = getResource(resources, "cpus").toInt
    -          val taskGPUs = Math.min(
    -            Math.max(0, maxGpus - totalGpusAcquired), getResource(resources, "gpus").toInt)
    -
    +          val offerGPUs = getResource(resources, "gpus").toInt
    +          var taskGPUs = executorGpus
    --- End diff --
    
    Ah, good point, I missed that earlier. @yanji84 Why are we changing the default behavior when `spark.mesos.executor.gpus` is not specified? Previously, if `spark.mesos.gpus.max` was set (without setting `spark.mesos.executor.gpus`), GPUs were allocated greedily. This aligns with the CPU behavior when `spark.executor.cores` is not specified. GPUs could be handled the same way.
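
    The fallback described above can be sketched as a pure function. This is an illustrative model of the allocation rule under discussion, not the actual `MesosCoarseGrainedSchedulerBackend` code; the function name and parameters are hypothetical:

```scala
// Illustrative model only (not the actual backend code).
// executorGpus models spark.mesos.executor.gpus (0 when unset);
// maxGpus models spark.mesos.gpus.max.
def taskGpus(executorGpus: Int, maxGpus: Int, offerGpus: Int, acquired: Int): Int =
  if (executorGpus > 0) {
    // Hard limit: take exactly executorGpus from the offer, or decline it.
    if (offerGpus >= executorGpus && acquired + executorGpus <= maxGpus) executorGpus
    else 0
  } else {
    // Previous greedy behavior: take all offered GPUs, capped by the
    // remaining budget under spark.mesos.gpus.max.
    math.min(math.max(0, maxGpus - acquired), offerGpus)
  }
```

    Keeping the `else` branch is what would preserve the pre-2.1-compatible greedy behavior when the new conf is unset.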


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    ping @yanji84 


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    Any progress here? @yanji84 @susanxhuynh


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by yanji84 <gi...@git.apache.org>.
Github user yanji84 commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    Is there anything else we need to do to merge this change?


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    @yanji84 I tried to restore the previous behavior when `spark.mesos.executor.gpus` is not specified. Here's the commit in my fork: https://github.com/mesosphere/spark/pull/23/commits/5adb830bc630f4995470b08157d30016e3b4567e I restored the previous unit test as well. Seems to work okay. WDYT?


---



[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21033#discussion_r181199842
  
    --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
    @@ -165,18 +165,47 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
       }
     
     
    -  test("mesos does not acquire more than spark.mesos.gpus.max") {
    -    val maxGpus = 5
    -    setBackend(Map("spark.mesos.gpus.max" -> maxGpus.toString))
    +  test("mesos acquires spark.mesos.executor.gpus number of gpus per executor") {
    +    setBackend(Map("spark.mesos.gpus.max" -> "5",
    +                   "spark.mesos.executor.gpus" -> "2"))
     
         val executorMemory = backend.executorMemory(sc)
    -    offerResources(List(Resources(executorMemory, 1, maxGpus + 1)))
    +    offerResources(List(Resources(executorMemory, 1, 5)))
     
         val taskInfos = verifyTaskLaunched(driver, "o1")
         assert(taskInfos.length == 1)
     
         val gpus = backend.getResource(taskInfos.head.getResourcesList, "gpus")
    -    assert(gpus == maxGpus)
    +    assert(gpus == 2)
    +  }
    +
    +
    +  test("mesos declines offers that cannot satisfy spark.mesos.executor.gpus") {
    +    setBackend(Map("spark.mesos.gpus.max" -> "5",
    --- End diff --
    
    I think it's worth testing setting max less than the number of executor gpus as well.


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    LGTM


---



[GitHub] spark pull request #21033: [SPARK-19320][MESOS]allow specifying a hard limit...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21033#discussion_r181627739
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala ---
    @@ -495,9 +500,8 @@ private[spark] class MesosCoarseGrainedSchedulerBackend(
               launchTasks = true
               val taskId = newMesosTaskId()
               val offerCPUs = getResource(resources, "cpus").toInt
    -          val taskGPUs = Math.min(
    -            Math.max(0, maxGpus - totalGpusAcquired), getResource(resources, "gpus").toInt)
    -
    +          val offerGPUs = getResource(resources, "gpus").toInt
    +          var taskGPUs = executorGpus
    --- End diff --
    
    So it looks like we are changing the behavior for the value set in `spark.mesos.gpus.max` (present since 2.1)? Are we OK with that? It might break existing deployments - is there a migration guide for something like this?
    
    In addition, are there other changes to the defaults - specifically, does taskGPUs now default to 0?
    
    Also, should we warn if `spark.mesos.executor.gpus` is > `spark.mesos.gpus.max`?
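
    A minimal sketch of such a warning check, assuming a hypothetical validation helper (the name and message are assumptions, not code from this PR):

```scala
// Hypothetical validation sketch: returns a warning message when the
// per-executor requirement can never be satisfied under the app-wide max.
def gpuConfWarning(executorGpus: Int, maxGpus: Int): Option[String] =
  if (executorGpus > maxGpus)
    Some(s"spark.mesos.executor.gpus ($executorGpus) is greater than " +
         s"spark.mesos.gpus.max ($maxGpus); no executor will ever launch")
  else None
```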


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    Jenkins, ok to test



---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard ...

Posted by yanji84 <gi...@git.apache.org>.
Github user yanji84 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21033#discussion_r181231118
  
    --- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala ---
    @@ -165,18 +165,47 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
       }
     
     
    -  test("mesos does not acquire more than spark.mesos.gpus.max") {
    -    val maxGpus = 5
    -    setBackend(Map("spark.mesos.gpus.max" -> maxGpus.toString))
    +  test("mesos acquires spark.mesos.executor.gpus number of gpus per executor") {
    +    setBackend(Map("spark.mesos.gpus.max" -> "5",
    +                   "spark.mesos.executor.gpus" -> "2"))
     
         val executorMemory = backend.executorMemory(sc)
    -    offerResources(List(Resources(executorMemory, 1, maxGpus + 1)))
    +    offerResources(List(Resources(executorMemory, 1, 5)))
     
         val taskInfos = verifyTaskLaunched(driver, "o1")
         assert(taskInfos.length == 1)
     
         val gpus = backend.getResource(taskInfos.head.getResourcesList, "gpus")
    -    assert(gpus == maxGpus)
    +    assert(gpus == 2)
    +  }
    +
    +
    +  test("mesos declines offers that cannot satisfy spark.mesos.executor.gpus") {
    +    setBackend(Map("spark.mesos.gpus.max" -> "5",
    --- End diff --
    
    Sounds good. Added the test


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    @yanji84 Thanks for the patch. I tested your previous PR on GPUs running on DC/OS and everything worked fine. Would you mind updating the documentation as well - https://github.com/apache/spark/blob/master/docs/running-on-mesos.md?


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    Can one of the admins verify this patch?


---





[GitHub] spark issue #21033: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

Posted by susanxhuynh <gi...@git.apache.org>.
Github user susanxhuynh commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    LGTM. @yanji84 You may want to remove the "WIP" in the PR title.


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by jomach <gi...@git.apache.org>.
Github user jomach commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    @yanji84 How do you identify which GPU to use if there are multiple GPUs on the machine? It would be nice to have some docs for this.


---



[GitHub] spark issue #21033: [SPARK-19320][MESOS]allow specifying a hard limit on num...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21033
  
    ping @yanji84 


---
