Posted to reviews@spark.apache.org by sebastienrainville <gi...@git.apache.org> on 2016/01/26 15:59:09 UTC

[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

GitHub user sebastienrainville opened a pull request:

    https://github.com/apache/spark/pull/10924

    [SPARK-13001] [CORE] [MESOS] Prevent getting offers when reached max cores

    Similar to https://github.com/apache/spark/pull/8639
    
    This change rejects offers for 120s once `spark.cores.max` has been reached in coarse-grained mode, to mitigate offer starvation. It prevents Mesos from sending us the same offers again and again, which starves other frameworks. This is especially problematic when running many small frameworks on the same Mesos cluster, e.g. many small Spark Streaming jobs, because it causes the bigger Spark jobs to stop receiving offers. By rejecting the offers for a long period of time, we make them available to those other frameworks.
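A minimal, Mesos-free sketch of the behaviour described above. It assumes offers arrive one at a time and ignores the memory, executor-limit and per-slave checks that the real backend also performs. `OfferDecision`, `Launch`, `Decline`, `DeclineFor` and `MaxCoresSketch` are hypothetical names. In the actual patch, the long decline is expressed by passing a Mesos `Filters` built with `setRefuseSeconds` to `declineOffer`.

```scala
// Hypothetical model of the per-offer outcome; not Spark or Mesos types.
sealed trait OfferDecision
final case class Launch(cpus: Int) extends OfferDecision
case object Decline extends OfferDecision                          // plain decline, no refuse filter
final case class DeclineFor(seconds: Double) extends OfferDecision // decline with a refuse filter

object MaxCoresSketch {
  // Returns one decision per offered CPU count, updating the acquired-core total as executors launch.
  def decide(offeredCpus: Seq[Int], maxCores: Int, refuseSeconds: Double): Seq[OfferDecision] = {
    var totalCoresAcquired = 0
    offeredCpus.map { cpus =>
      if (totalCoresAcquired >= maxCores) {
        // Cap reached: ask Mesos not to re-offer these resources for a while.
        DeclineFor(refuseSeconds)
      } else if (cpus >= 1) {
        // Take only what is still needed to reach maxCores.
        val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
        totalCoresAcquired += cpusToUse
        Launch(cpusToUse)
      } else {
        Decline
      }
    }
  }
}
```

With `maxCores = 3`, offers of 2, 2 and 4 CPUs yield `Launch(2)`, `Launch(1)` and `DeclineFor(120.0)`. Once the cap is reached, later offers are refused for the configured duration instead of being declined with no filter.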

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sebastienrainville/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10924
    
----
commit 181a6efbd8c3c2d620107b65db163015b4f35b39
Author: Sebastien Rainville <se...@hopper.com>
Date:   2016-01-23T04:11:32Z

    [SPARK-13001] [CORE] [MESOS] Prevent getting offers in coarse-grained mode when reached max cores

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176343159
  
    **[Test build #50288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50288/consoleFull)** for PR 10924 at commit [`181a6ef`](https://github.com/apache/spark/commit/181a6efbd8c3c2d620107b65db163015b4f35b39).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-178199062
  
    @tnachen @dragos


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216674404
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-217017724
  
    **[Test build #57794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57794/consoleFull)** for PR 10924 at commit [`5b55ae0`](https://github.com/apache/spark/commit/5b55ae01085913743a95fcac8223d7917db0a617).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r53116209
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -254,53 +258,65 @@ private[spark] class CoarseMesosSchedulerBackend(
             val cpus = getResource(offer.getResourcesList, "cpus").toInt
             val id = offer.getId.getValue
             if (meetsConstraints) {
    -          if (taskIdToSlaveId.size < executorLimit &&
    -              totalCoresAcquired < maxCores &&
    -              mem >= calculateTotalMemory(sc) &&
    -              cpus >= 1 &&
    -              failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    -              !slaveIdsWithExecutors.contains(slaveId)) {
    -            // Launch an executor on the slave
    -            val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    -            totalCoresAcquired += cpusToUse
    -            val taskId = newMesosTaskId()
    -            taskIdToSlaveId.put(taskId, slaveId)
    -            slaveIdsWithExecutors += slaveId
    -            coresByTaskId(taskId) = cpusToUse
    -            // Gather cpu resources from the available resources and use them in the task.
    -            val (remainingResources, cpuResourcesToUse) =
    -              partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    -            val (_, memResourcesToUse) =
    -              partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    -            val taskBuilder = MesosTaskInfo.newBuilder()
    -              .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    -              .setSlaveId(offer.getSlaveId)
    -              .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    -              .setName("Task " + taskId)
    -              .addAllResources(cpuResourcesToUse.asJava)
    -              .addAllResources(memResourcesToUse.asJava)
    -
    -            sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    -              MesosSchedulerBackendUtil
    -                .setupContainerBuilderDockerInfo(image, sc.conf, taskBuilder.getContainerBuilder())
    +          if (totalCoresAcquired < maxCores) {
    +            if (taskIdToSlaveId.size < executorLimit &&
    +                mem >= calculateTotalMemory(sc) &&
    +                cpus >= 1 &&
    +                failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    +                !slaveIdsWithExecutors.contains(slaveId)) {
    +              // Launch an executor on the slave
    +              val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    +              totalCoresAcquired += cpusToUse
    +              val taskId = newMesosTaskId()
    +              taskIdToSlaveId.put(taskId, slaveId)
    +              slaveIdsWithExecutors += slaveId
    +              coresByTaskId(taskId) = cpusToUse
    +              // Gather cpu resources from the available resources and use them in the task.
    +              val (remainingResources, cpuResourcesToUse) =
    +                partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    +              val (_, memResourcesToUse) =
    +                partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    +              val taskBuilder = MesosTaskInfo.newBuilder()
    +                .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    +                .setSlaveId(offer.getSlaveId)
    +                .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    +                .setName("Task " + taskId)
    +                .addAllResources(cpuResourcesToUse.asJava)
    +                .addAllResources(memResourcesToUse.asJava)
    +
    +              sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    +                MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, sc.conf,
    +                  taskBuilder.getContainerBuilder())
    +              }
    +
    +              // Accept the offer and launch the task
    +              logDebug(s"Accepting offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    +              d.launchTasks(
    +                Collections.singleton(offer.getId),
    +                Collections.singleton(taskBuilder.build()), filters)
    +            } else {
    +              // Decline the offer
    +              logDebug(s"Declining offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              d.declineOffer(offer.getId)
                 }
    -
    -            // Accept the offer and launch the task
    -            logDebug(s"Accepting offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    -            d.launchTasks(
    -              Collections.singleton(offer.getId),
    -              Collections.singleton(taskBuilder.build()), filters)
               } else {
    -            // Decline the offer
    -            logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            d.declineOffer(offer.getId)
    +            // We reached the maximum number of cores for this framework. We don't need to see
    +            // new offers. Decline the offer for a long period of time.
    +            logDebug(s"Declining offer (reached max cores): $id with attributes:" +
    +              s" $offerAttributes mem: $mem cpu: $cpus" +
    +              s" for $rejectOfferDurationForReachedMaxCores seconds")
    +            d.declineOffer(offer.getId, Filters.newBuilder()
    +              .setRefuseSeconds(rejectOfferDurationForReachedMaxCores).build())
    --- End diff --
    
    I agree and proposed this before, but I'm not sure we want to make this change in this patch. Perhaps we can get all the small changes we want in, in a reasonable way, and then refactor all three schedulers (or remove one). What do you think?


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-184743925
  
    - could we have only one rejection delay setting?
    - why not add the same logic in fine-grained mode as well?
    
    ...and sorry for the delay in reviewing this.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r53029744
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -254,53 +258,65 @@ private[spark] class CoarseMesosSchedulerBackend(
             val cpus = getResource(offer.getResourcesList, "cpus").toInt
             val id = offer.getId.getValue
             if (meetsConstraints) {
    -          if (taskIdToSlaveId.size < executorLimit &&
    -              totalCoresAcquired < maxCores &&
    -              mem >= calculateTotalMemory(sc) &&
    -              cpus >= 1 &&
    -              failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    -              !slaveIdsWithExecutors.contains(slaveId)) {
    -            // Launch an executor on the slave
    -            val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    -            totalCoresAcquired += cpusToUse
    -            val taskId = newMesosTaskId()
    -            taskIdToSlaveId.put(taskId, slaveId)
    -            slaveIdsWithExecutors += slaveId
    -            coresByTaskId(taskId) = cpusToUse
    -            // Gather cpu resources from the available resources and use them in the task.
    -            val (remainingResources, cpuResourcesToUse) =
    -              partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    -            val (_, memResourcesToUse) =
    -              partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    -            val taskBuilder = MesosTaskInfo.newBuilder()
    -              .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    -              .setSlaveId(offer.getSlaveId)
    -              .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    -              .setName("Task " + taskId)
    -              .addAllResources(cpuResourcesToUse.asJava)
    -              .addAllResources(memResourcesToUse.asJava)
    -
    -            sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    -              MesosSchedulerBackendUtil
    -                .setupContainerBuilderDockerInfo(image, sc.conf, taskBuilder.getContainerBuilder())
    +          if (totalCoresAcquired < maxCores) {
    +            if (taskIdToSlaveId.size < executorLimit &&
    +                mem >= calculateTotalMemory(sc) &&
    +                cpus >= 1 &&
    +                failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    +                !slaveIdsWithExecutors.contains(slaveId)) {
    +              // Launch an executor on the slave
    +              val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    +              totalCoresAcquired += cpusToUse
    +              val taskId = newMesosTaskId()
    +              taskIdToSlaveId.put(taskId, slaveId)
    +              slaveIdsWithExecutors += slaveId
    +              coresByTaskId(taskId) = cpusToUse
    +              // Gather cpu resources from the available resources and use them in the task.
    +              val (remainingResources, cpuResourcesToUse) =
    +                partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    +              val (_, memResourcesToUse) =
    +                partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    +              val taskBuilder = MesosTaskInfo.newBuilder()
    +                .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    +                .setSlaveId(offer.getSlaveId)
    +                .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    +                .setName("Task " + taskId)
    +                .addAllResources(cpuResourcesToUse.asJava)
    +                .addAllResources(memResourcesToUse.asJava)
    +
    +              sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    +                MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, sc.conf,
    +                  taskBuilder.getContainerBuilder())
    +              }
    +
    +              // Accept the offer and launch the task
    +              logDebug(s"Accepting offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    +              d.launchTasks(
    +                Collections.singleton(offer.getId),
    +                Collections.singleton(taskBuilder.build()), filters)
    +            } else {
    +              // Decline the offer
    +              logDebug(s"Declining offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              d.declineOffer(offer.getId)
                 }
    -
    -            // Accept the offer and launch the task
    -            logDebug(s"Accepting offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    -            d.launchTasks(
    -              Collections.singleton(offer.getId),
    -              Collections.singleton(taskBuilder.build()), filters)
               } else {
    -            // Decline the offer
    -            logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            d.declineOffer(offer.getId)
    +            // We reached the maximum number of cores for this framework. We don't need to see
    +            // new offers. Decline the offer for a long period of time.
    +            logDebug(s"Declining offer (reached max cores): $id with attributes:" +
    +              s" $offerAttributes mem: $mem cpu: $cpus" +
    +              s" for $rejectOfferDurationForReachedMaxCores seconds")
    +            d.declineOffer(offer.getId, Filters.newBuilder()
    +              .setRefuseSeconds(rejectOfferDurationForReachedMaxCores).build())
    --- End diff --
    
    I think the two cases where offers are rejected for a longer period should be consolidated into a simple helper function that logs the reason and declines the offer. I'd also reorganize the code to be less nested:
    
    ```
    if (!meetsConstraints) {
      declineFor("unmet constraints", rejectOfferDurationUnmetConstraints)
    } else if (totalCoresAcquired >= maxCores) {
      declineFor("reached max cores", rejectOfferDurationMaxCores)
    } else {
      // ... happy case
    }
    ```
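Below is one possible runnable rendering of the flattened structure suggested above, under stated assumptions. `Offer`, `RecordingDriver`, `FlatOfferHandler` and the `declineFor(offer, reason, seconds)` signature are hypothetical. The driver is a simple recorder rather than a real Mesos `SchedulerDriver`, and the happy case skips the memory, executor-limit and failure checks that the real backend performs.

```scala
import scala.collection.mutable

// Hypothetical simplified offer; this sketch assumes every offer carries at least one CPU.
final case class Offer(id: String, cpus: Int, meetsConstraints: Boolean) {
  require(cpus >= 1, "sketch assumes offers with at least one CPU")
}

// Hypothetical stand-in for the Mesos driver: records what happened to each offer.
class RecordingDriver {
  val log: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  def declineOffer(offerId: String, refuseSeconds: Double): Unit = {
    log += s"decline $offerId for ${refuseSeconds}s"
  }
  def launch(offerId: String, cpus: Int): Unit = {
    log += s"launch $offerId with $cpus cpus"
  }
}

class FlatOfferHandler(
    driver: RecordingDriver,
    maxCores: Int,
    rejectOfferDurationUnmetConstraints: Double,
    rejectOfferDurationMaxCores: Double) {

  var totalCoresAcquired = 0

  // Single place that logs the reason and declines the offer with a refuse filter.
  private def declineFor(offer: Offer, reason: String, seconds: Double): Unit = {
    println(s"Declining offer ${offer.id} ($reason) for $seconds seconds")
    driver.declineOffer(offer.id, seconds)
  }

  // Flat if / else-if chain instead of nested conditionals.
  def handle(offer: Offer): Unit = {
    if (!offer.meetsConstraints) {
      declineFor(offer, "unmet constraints", rejectOfferDurationUnmetConstraints)
    } else if (totalCoresAcquired >= maxCores) {
      declineFor(offer, "reached max cores", rejectOfferDurationMaxCores)
    } else {
      // Happy case: take the offered cores, up to the remaining budget.
      val cpusToUse = math.min(offer.cpus, maxCores - totalCoresAcquired)
      totalCoresAcquired += cpusToUse
      driver.launch(offer.id, cpusToUse)
    }
  }
}
```

The design benefit is that each rejection reason maps to exactly one branch and one duration. Adding a third long-decline case would be one more `else if`, not another nesting level.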


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216952194
  
    LGTM. cc @andrewor14 


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62079735
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackendSuite.scala ---
    @@ -147,6 +147,19 @@ class CoarseMesosSchedulerBackendSuite extends SparkFunSuite
         verifyDeclinedOffer(driver, createOfferId("o1"), true)
       }
     
    +  test("mesos declines offers for a long time when reached spark.cores.max") {
    +    val maxCores = 3
    +    setBackend(Map("spark.cores.max" -> maxCores.toString))
    +
    +    val executorMemory = backend.executorMemory(sc)
    +    offerResources(List(
    +      (executorMemory, maxCores + 1),
    +      (executorMemory, maxCores + 1)))
    +
    +    verifyTaskLaunched("o1")
    +    verifyDeclinedOffer(driver, createOfferId("o2"), true)
    --- End diff --
    
    Ah, then can you change the test description to "mesos declines offers with a filter"?
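The following sketch shows what the renamed test can check. It assumes a hand-written fake driver in place of the suite's mocked one. The fake records whether a decline carried a refuse filter, not how long that filter lasts, which is presumably why the description should say "with a filter" rather than "for a long time". `FakeDriver`, `FilterTestSketch` and this `verifyDeclinedOffer` signature are hypothetical analogues, not the suite's actual helpers.

```scala
import scala.collection.mutable

// Hypothetical fake driver: records launched offers and, per declined offer, whether a filter was attached.
class FakeDriver {
  val launched: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  val declinedWithFilter: mutable.Map[String, Boolean] = mutable.Map.empty[String, Boolean]

  def launchTasks(offerId: String): Unit = { launched += offerId }
  def declineOffer(offerId: String): Unit = { declinedWithFilter(offerId) = false }
  def declineOffer(offerId: String, refuseSeconds: Double): Unit = { declinedWithFilter(offerId) = true }
}

object FilterTestSketch {
  // Hypothetical analogue of verifyDeclinedOffer(driver, offerId, filter).
  def verifyDeclinedOffer(driver: FakeDriver, offerId: String, filter: Boolean): Boolean =
    driver.declinedWithFilter.get(offerId).contains(filter)

  // Replays the test scenario: every offer carries maxCores + 1 CPUs.
  def run(maxCores: Int, offerIds: Seq[String], refuseSeconds: Double): FakeDriver = {
    val driver = new FakeDriver
    var acquired = 0
    offerIds.foreach { id =>
      val cpus = maxCores + 1
      if (acquired >= maxCores) {
        driver.declineOffer(id, refuseSeconds) // cap reached: decline with a filter
      } else {
        acquired += math.min(cpus, maxCores - acquired)
        driver.launchTasks(id)
      }
    }
    driver
  }
}
```

Because the first offer already has `maxCores + 1` CPUs, the cap is reached after a single launch. The second offer therefore exercises the filtered-decline path.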


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216996458
  
    **[Test build #57786 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57786/consoleFull)** for PR 10924 at commit [`112f136`](https://github.com/apache/spark/commit/112f13651306d453cac217cec4e1365da62a03a1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176342191
  
    **[Test build #50288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50288/consoleFull)** for PR 10924 at commit [`181a6ef`](https://github.com/apache/spark/commit/181a6efbd8c3c2d620107b65db163015b4f35b39).


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10924


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by keithchambers <gi...@git.apache.org>.
Github user keithchambers commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-192957469
  
    @andrewor14 
    
    @mgummelt works on Spark full time at Mesosphere too.  :-)


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62094361
  
    --- Diff: docs/running-on-mesos.md ---
    @@ -406,6 +406,20 @@ See the [configuration page](configuration.html) for information on Spark config
         If unset it will point to Spark's internal web UI.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.mesos.rejectOfferDurationForUnmetConstraints</code></td>
    +  <td><code>120s</code></td>
    +  <td>
    +    Set the amount of time for which offers are rejected when constraints are unmet. See <code>spark.mesos.constraints</code>.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.mesos.rejectOfferDurationForReachedMaxCores</code></td>
    +  <td><code>120s</code></td>
    +  <td>
    +    Set the amount of time for which offers are rejected when the app already acquired <code>spark.cores.max</code> cores.
    --- End diff --
    
    Done. I added that comment to `spark.mesos.rejectOfferDurationForUnmetConstraints` as well since it's the same idea.
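Both settings take a time string such as `120s`, while a Mesos refuse filter is expressed in seconds. A hypothetical, simplified conversion is sketched below. Inside Spark, the same lookup would more likely go through `SparkConf.getTimeAsSeconds(key, "120s")`, which accepts more formats than this parser. `RefuseDurationSketch` and its methods are illustrative names only.

```scala
object RefuseDurationSketch {
  // Hypothetical simplified format: an integer with an optional ms / s / m / min / h suffix.
  private val DurationRe = """(\d+)\s*(ms|s|min|m|h)?""".r

  // Converts a duration string to (possibly fractional) seconds; bare numbers are treated as seconds.
  def toRefuseSeconds(value: String): Double = value.trim.toLowerCase match {
    case DurationRe(n, unit) =>
      val amount = n.toDouble
      unit match {
        case null | "s"  => amount
        case "ms"        => amount / 1000.0
        case "m" | "min" => amount * 60.0
        case "h"         => amount * 3600.0
      }
    case other =>
      throw new IllegalArgumentException(s"Unrecognized duration: $other")
  }

  // Looks up a setting, falling back to the documented 120s default.
  def refuseSecondsFor(conf: Map[String, String], key: String): Double =
    toRefuseSeconds(conf.getOrElse(key, "120s"))
}
```

For example, `refuseSecondsFor(Map.empty, "spark.mesos.rejectOfferDurationForReachedMaxCores")` falls back to the default and returns `120.0`.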


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-203396355
  
    Cool, looking forward to pushing this over the finish line!


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-199766699
  
    @sebastienrainville sorry for my confusion. Fine-grained mode does not respect `spark.cores.max`, so my comment does not apply. Can you just do the small refactoring and then this can go in? It's long overdue.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216987277
  
    This should be ready to go now


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216712096
  
    All the comments should be addressed now


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r61966413
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -273,25 +277,25 @@ private[spark] class CoarseMesosSchedulerBackend(
             matchesAttributeRequirements(slaveOfferConstraints, offerAttributes)
           }
     
    -      declineUnmatchedOffers(d, unmatchedOffers)
    +      unmatchedOffers.foreach { offer =>
    --- End diff --
    
    Please keep this in a separate function `declineUnmatchedOffers`


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176571496
  
    **[Test build #50334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50334/consoleFull)** for PR 10924 at commit [`bf8d870`](https://github.com/apache/spark/commit/bf8d8703dff4d43f28c011a362bfe6c15d1a1d79).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176336086
  
    ok to test


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r53344844
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -254,53 +258,65 @@ private[spark] class CoarseMesosSchedulerBackend(
             val cpus = getResource(offer.getResourcesList, "cpus").toInt
             val id = offer.getId.getValue
             if (meetsConstraints) {
    -          if (taskIdToSlaveId.size < executorLimit &&
    -              totalCoresAcquired < maxCores &&
    -              mem >= calculateTotalMemory(sc) &&
    -              cpus >= 1 &&
    -              failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    -              !slaveIdsWithExecutors.contains(slaveId)) {
    -            // Launch an executor on the slave
    -            val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    -            totalCoresAcquired += cpusToUse
    -            val taskId = newMesosTaskId()
    -            taskIdToSlaveId.put(taskId, slaveId)
    -            slaveIdsWithExecutors += slaveId
    -            coresByTaskId(taskId) = cpusToUse
    -            // Gather cpu resources from the available resources and use them in the task.
    -            val (remainingResources, cpuResourcesToUse) =
    -              partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    -            val (_, memResourcesToUse) =
    -              partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    -            val taskBuilder = MesosTaskInfo.newBuilder()
    -              .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    -              .setSlaveId(offer.getSlaveId)
    -              .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    -              .setName("Task " + taskId)
    -              .addAllResources(cpuResourcesToUse.asJava)
    -              .addAllResources(memResourcesToUse.asJava)
    -
    -            sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    -              MesosSchedulerBackendUtil
    -                .setupContainerBuilderDockerInfo(image, sc.conf, taskBuilder.getContainerBuilder())
    +          if (totalCoresAcquired < maxCores) {
    +            if (taskIdToSlaveId.size < executorLimit &&
    +                mem >= calculateTotalMemory(sc) &&
    +                cpus >= 1 &&
    +                failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    +                !slaveIdsWithExecutors.contains(slaveId)) {
    +              // Launch an executor on the slave
    +              val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    +              totalCoresAcquired += cpusToUse
    +              val taskId = newMesosTaskId()
    +              taskIdToSlaveId.put(taskId, slaveId)
    +              slaveIdsWithExecutors += slaveId
    +              coresByTaskId(taskId) = cpusToUse
    +              // Gather cpu resources from the available resources and use them in the task.
    +              val (remainingResources, cpuResourcesToUse) =
    +                partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    +              val (_, memResourcesToUse) =
    +                partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    +              val taskBuilder = MesosTaskInfo.newBuilder()
    +                .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    +                .setSlaveId(offer.getSlaveId)
    +                .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    +                .setName("Task " + taskId)
    +                .addAllResources(cpuResourcesToUse.asJava)
    +                .addAllResources(memResourcesToUse.asJava)
    +
    +              sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    +                MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, sc.conf,
    +                  taskBuilder.getContainerBuilder())
    +              }
    +
    +              // Accept the offer and launch the task
    +              logDebug(s"Accepting offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    +              d.launchTasks(
    +                Collections.singleton(offer.getId),
    +                Collections.singleton(taskBuilder.build()), filters)
    +            } else {
    +              // Decline the offer
    +              logDebug(s"Declining offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              d.declineOffer(offer.getId)
                 }
    -
    -            // Accept the offer and launch the task
    -            logDebug(s"Accepting offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    -            d.launchTasks(
    -              Collections.singleton(offer.getId),
    -              Collections.singleton(taskBuilder.build()), filters)
               } else {
    -            // Decline the offer
    -            logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            d.declineOffer(offer.getId)
    +            // We reached the maximum number of cores for this framework. We don't need to see
    +            // new offers. Decline the offer for a long period of time.
    +            logDebug(s"Declining offer (reached max cores): $id with attributes:" +
    +              s" $offerAttributes mem: $mem cpu: $cpus" +
    +              s" for $rejectOfferDurationForReachedMaxCores seconds")
    +            d.declineOffer(offer.getId, Filters.newBuilder()
    +              .setRefuseSeconds(rejectOfferDurationForReachedMaxCores).build())
    --- End diff --
    
    To me it seems the only thing it depends on is the `offerId`, so it could go in `MesosSchedulerUtils`.
    
    But if that's overkill, let's do it only for this one, and get rid of the nested if structure. It also means there's no need to use `Option` for the reason and duration.
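A minimal Scala sketch of the flattened structure suggested above. Each branch returns its own decision, including the decline reason and refuse duration, instead of nesting the launch conditions inside a max-cores check. `Offer`, `Decision`, `Launch`, `Decline` and `decide` are hypothetical stand-ins rather than Mesos or Spark types. The real backend also checks conditions that are left out here, such as the executor limit, per-slave failures and existing executors.

```scala
// Hypothetical stand-in for a Mesos offer; NOT org.apache.mesos.Protos.Offer.
final case class Offer(id: String, mem: Double, cpus: Double)

// The outcome for one offer: launch an executor, or decline with a reason and a refuse duration.
sealed trait Decision
final case class Launch(offerId: String, cpusToUse: Int) extends Decision
final case class Decline(offerId: String, reason: String, refuseSeconds: Long) extends Decision

// Flat decision: no nested if, and the max-cores case is handled first.
def decide(
    offer: Offer,
    totalCoresAcquired: Int,
    maxCores: Int,
    executorMem: Double,
    defaultRefuseSeconds: Long,
    maxCoresRefuseSeconds: Long): Decision = {
  if (totalCoresAcquired >= maxCores) {
    // Nothing more to launch, so ask Mesos not to re-offer these resources for a long time.
    Decline(offer.id, "reached max cores", maxCoresRefuseSeconds)
  } else if (offer.mem >= executorMem && offer.cpus >= 1) {
    // Use at most the number of cores still missing to reach maxCores.
    Launch(offer.id, math.min(offer.cpus.toInt, maxCores - totalCoresAcquired))
  } else {
    Decline(offer.id, "insufficient resources", defaultRefuseSeconds)
  }
}
```

Because the max-cores case is tested first, it no longer has to wrap the other conditions, and each `Decline` carries a concrete reason and duration, so neither needs to be an `Option`.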


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216645080
  
    @dragos I finally made the change. Sorry for the delay.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216726935
  
    **[Test build #57699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57699/consoleFull)** for PR 10924 at commit [`ad2f014`](https://github.com/apache/spark/commit/ad2f014dadda0f35098e82a795a1fa1319032acd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62078667
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackendSuite.scala ---
    @@ -147,6 +147,19 @@ class CoarseMesosSchedulerBackendSuite extends SparkFunSuite
         verifyDeclinedOffer(driver, createOfferId("o1"), true)
       }
     
    +  test("mesos declines offers for a long time when reached spark.cores.max") {
    +    val maxCores = 3
    +    setBackend(Map("spark.cores.max" -> maxCores.toString))
    +
    +    val executorMemory = backend.executorMemory(sc)
    +    offerResources(List(
    +      (executorMemory, maxCores + 1),
    +      (executorMemory, maxCores + 1)))
    +
    +    verifyTaskLaunched("o1")
    +    verifyDeclinedOffer(driver, createOfferId("o2"), true)
    --- End diff --
    
    Before this change, it would have failed, because the offer would have been declined without a filter. Only `verifyDeclinedOffer(driver, createOfferId("o2"), false)` would have passed.
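One way to make the test exercise the new setting, rather than just the presence of a filter, is to configure a non-default duration and assert that this exact value reaches the driver. The sketch assumes a simplified driver interface: `OfferDriver`, `RecordingDriver` and `declineAtMaxCores` are hypothetical, not the Mesos `SchedulerDriver` or the Spark test helpers. The config key name and the 120-second default are assumptions modeled on the duration given in the PR description.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical driver interface; NOT org.apache.mesos.SchedulerDriver.
trait OfferDriver {
  def declineOffer(offerId: String): Unit
  def declineOffer(offerId: String, refuseSeconds: Long): Unit
}

// Records each decline with its refuse duration (None means no filter was passed).
final class RecordingDriver extends OfferDriver {
  val declines = ArrayBuffer.empty[(String, Option[Long])]
  def declineOffer(offerId: String): Unit =
    declines += ((offerId, None))
  def declineOffer(offerId: String, refuseSeconds: Long): Unit =
    declines += ((offerId, Some(refuseSeconds)))
}

// Assumed config key and default, for illustration only.
val RefuseKey = "spark.mesos.rejectOfferDurationForReachedMaxCores"
val DefaultRefuseSeconds = 120L

// Hypothetical backend step: once max cores is reached, decline with the configured duration.
def declineAtMaxCores(conf: Map[String, String], d: OfferDriver, offerId: String): Unit = {
  val seconds = conf.get(RefuseKey).map(_.toLong).getOrElse(DefaultRefuseSeconds)
  d.declineOffer(offerId, seconds)
}

// A non-default value means the assertion only passes if the setting is actually read.
val driver = new RecordingDriver
declineAtMaxCores(Map(RefuseKey -> "77"), driver, "o2")
assert(driver.declines.toList == List(("o2", Some(77L))))
```

A plain `declineOffer(offerId)` call records `None`, which is the case the `false` variant of the assertion distinguishes. Checking the configured value on top of that ties the test to the new setting itself.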


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216987221
  
    **[Test build #57794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57794/consoleFull)** for PR 10924 at commit [`5b55ae0`](https://github.com/apache/spark/commit/5b55ae01085913743a95fcac8223d7917db0a617).


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216674406
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57661/


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216976044
  
    **[Test build #57774 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57774/consoleFull)** for PR 10924 at commit [`0ccd71c`](https://github.com/apache/spark/commit/0ccd71c268fb9e13e306bf6148200303d5d1f4c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176343174
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216944812
  
    @mgummelt I fixed the documentation and the test description. I'm not sure what the rendered version of the doc will look like; I'm a little worried about the length of the variable names.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216996710
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57786/


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216683008
  
    please add a test


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216727018
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57699/


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-215111311
  
    ping @sebastienrainville 


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216674105
  
    **[Test build #57661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57661/consoleFull)** for PR 10924 at commit [`9b314e0`](https://github.com/apache/spark/commit/9b314e025df193cf416c50b0c32535e191e97ecd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r61966910
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -326,7 +330,9 @@ private[spark] class CoarseMesosSchedulerBackend(
             d.launchTasks(
               Collections.singleton(offer.getId),
               offerTasks.asJava)
    -      } else { // decline
    +      } else if (totalCoresAcquired >= maxCores) {
    +        declineOffer(d, offer, "reached max cores", rejectOfferDurationForReachedMaxCores)
    +      } else {
             logDebug(s"Declining offer: $id with attributes: $offerAttributes " +
               s"mem: $offerMem cpu: $offerCpus")
     
    --- End diff --
    
    can we make the line below also use the new `declineOffer` method?


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62077827
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackendSuite.scala ---
    @@ -147,6 +147,19 @@ class CoarseMesosSchedulerBackendSuite extends SparkFunSuite
         verifyDeclinedOffer(driver, createOfferId("o1"), true)
       }
     
    +  test("mesos declines offers for a long time when reached spark.cores.max") {
    +    val maxCores = 3
    +    setBackend(Map("spark.cores.max" -> maxCores.toString))
    +
    +    val executorMemory = backend.executorMemory(sc)
    +    offerResources(List(
    +      (executorMemory, maxCores + 1),
    +      (executorMemory, maxCores + 1)))
    +
    +    verifyTaskLaunched("o1")
    +    verifyDeclinedOffer(driver, createOfferId("o2"), true)
    --- End diff --
    
    This doesn't test the new config var.  This would have passed before the addition of this feature.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216942935
  
    **[Test build #57774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57774/consoleFull)** for PR 10924 at commit [`0ccd71c`](https://github.com/apache/spark/commit/0ccd71c268fb9e13e306bf6148200303d5d1f4c2).


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216985035
  
    Looks good. Just minor comments.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62104725
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -279,18 +283,28 @@ private[spark] class CoarseMesosSchedulerBackend(
       }
     
       private def declineUnmatchedOffers(d: SchedulerDriver, offers: Buffer[Offer]): Unit = {
    -    for (offer <- offers) {
    -      val id = offer.getId.getValue
    -      val offerAttributes = toAttributeMap(offer.getAttributesList)
    -      val mem = getResource(offer.getResourcesList, "mem")
    -      val cpus = getResource(offer.getResourcesList, "cpus")
    -      val filters = Filters.newBuilder()
    -        .setRefuseSeconds(rejectOfferDurationForUnmetConstraints).build()
    +    offers.foreach { offer =>
    +      declineOffer(d, offer, Some("unmet constraints"),
    +        Some(rejectOfferDurationForUnmetConstraints))
    +    }
    +  }
    +
    +  private def declineOffer(d: SchedulerDriver, offer: Offer, reason: Option[String] = None,
    +      refuseSeconds: Option[Long] = None): Unit = {
     
    -      logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus"
    -        + s" for $rejectOfferDurationForUnmetConstraints seconds")
    +    val id = offer.getId.getValue
    +    val offerAttributes = toAttributeMap(offer.getAttributesList)
    +    val mem = getResource(offer.getResourcesList, "mem")
    +    val cpus = getResource(offer.getResourcesList, "cpus")
     
    -      d.declineOffer(offer.getId, filters)
    +    logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem"
    +      + s" cpu: $cpus for $refuseSeconds seconds" + reason.fold("")(r => s" (reason: $r)"))
    --- End diff --
    
    Just use `map` and `getOrElse` here. It's easier to understand than `fold`.
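For reference, here is a small Scala sketch showing that the two forms produce the same suffix. It also shows a related pitfall in the same log line: as the diff stands, `refuseSeconds` is an `Option[Long]`, so interpolating it directly prints the wrapper (for example `Some(120)`) rather than the number. The function and value names are illustrative only.

```scala
// Original form: fold supplies the empty case first, then the mapping function.
def suffixWithFold(reason: Option[String]): String =
  reason.fold("")(r => s" (reason: $r)")

// Form suggested in review: map over the value, then supply the default.
def suffixWithMapGetOrElse(reason: Option[String]): String =
  reason.map(r => s" (reason: $r)").getOrElse("")

// Pitfall: string interpolation calls toString on the Option itself.
val refuseSeconds: Option[Long] = Some(120L)
val naive = s"for $refuseSeconds seconds"                                 // "for Some(120) seconds"
val unwrapped = refuseSeconds.map(s => s"for $s seconds").getOrElse("")  // "for 120 seconds"
```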


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62104397
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -279,18 +283,28 @@ private[spark] class CoarseMesosSchedulerBackend(
       }
     
       private def declineUnmatchedOffers(d: SchedulerDriver, offers: Buffer[Offer]): Unit = {
    -    for (offer <- offers) {
    -      val id = offer.getId.getValue
    -      val offerAttributes = toAttributeMap(offer.getAttributesList)
    -      val mem = getResource(offer.getResourcesList, "mem")
    -      val cpus = getResource(offer.getResourcesList, "cpus")
    -      val filters = Filters.newBuilder()
    -        .setRefuseSeconds(rejectOfferDurationForUnmetConstraints).build()
    +    offers.foreach { offer =>
    +      declineOffer(d, offer, Some("unmet constraints"),
    +        Some(rejectOfferDurationForUnmetConstraints))
    +    }
    +  }
    +
    +  private def declineOffer(d: SchedulerDriver, offer: Offer, reason: Option[String] = None,
    +      refuseSeconds: Option[Long] = None): Unit = {
    --- End diff --
    
    style:
    ```
    private def declineOffer(
        d: SchedulerDriver,
        ...,
        refuseSeconds: Option[Long] = None): Unit = {
      ...
    }
    ```




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216976424
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62086746
  
    --- Diff: docs/running-on-mesos.md ---
    @@ -406,6 +406,20 @@ See the [configuration page](configuration.html) for information on Spark config
         If unset it will point to Spark's internal web UI.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.mesos.rejectOfferDurationForUnmetConstraints</code></td>
    +  <td><code>120s</code></td>
    +  <td>
    +    Set the amount of time for which offers are rejected when constraints are unmet. See <code>spark.mesos.constraints</code>.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.mesos.rejectOfferDurationForReachedMaxCores</code></td>
    +  <td><code>120s</code></td>
    +  <td>
    +    Set the amount of time for which offers are rejected when the app already acquired <code>spark.cores.max</code> cores.
    --- End diff --
    
    Can you add "This is used to prevent starvation of other frameworks."




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r53349625
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -254,53 +258,65 @@ private[spark] class CoarseMesosSchedulerBackend(
             val cpus = getResource(offer.getResourcesList, "cpus").toInt
             val id = offer.getId.getValue
             if (meetsConstraints) {
    -          if (taskIdToSlaveId.size < executorLimit &&
    -              totalCoresAcquired < maxCores &&
    -              mem >= calculateTotalMemory(sc) &&
    -              cpus >= 1 &&
    -              failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    -              !slaveIdsWithExecutors.contains(slaveId)) {
    -            // Launch an executor on the slave
    -            val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    -            totalCoresAcquired += cpusToUse
    -            val taskId = newMesosTaskId()
    -            taskIdToSlaveId.put(taskId, slaveId)
    -            slaveIdsWithExecutors += slaveId
    -            coresByTaskId(taskId) = cpusToUse
    -            // Gather cpu resources from the available resources and use them in the task.
    -            val (remainingResources, cpuResourcesToUse) =
    -              partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    -            val (_, memResourcesToUse) =
    -              partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    -            val taskBuilder = MesosTaskInfo.newBuilder()
    -              .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    -              .setSlaveId(offer.getSlaveId)
    -              .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    -              .setName("Task " + taskId)
    -              .addAllResources(cpuResourcesToUse.asJava)
    -              .addAllResources(memResourcesToUse.asJava)
    -
    -            sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    -              MesosSchedulerBackendUtil
    -                .setupContainerBuilderDockerInfo(image, sc.conf, taskBuilder.getContainerBuilder())
    +          if (totalCoresAcquired < maxCores) {
    +            if (taskIdToSlaveId.size < executorLimit &&
    +                mem >= calculateTotalMemory(sc) &&
    +                cpus >= 1 &&
    +                failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    +                !slaveIdsWithExecutors.contains(slaveId)) {
    +              // Launch an executor on the slave
    +              val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    +              totalCoresAcquired += cpusToUse
    +              val taskId = newMesosTaskId()
    +              taskIdToSlaveId.put(taskId, slaveId)
    +              slaveIdsWithExecutors += slaveId
    +              coresByTaskId(taskId) = cpusToUse
    +              // Gather cpu resources from the available resources and use them in the task.
    +              val (remainingResources, cpuResourcesToUse) =
    +                partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    +              val (_, memResourcesToUse) =
    +                partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    +              val taskBuilder = MesosTaskInfo.newBuilder()
    +                .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    +                .setSlaveId(offer.getSlaveId)
    +                .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    +                .setName("Task " + taskId)
    +                .addAllResources(cpuResourcesToUse.asJava)
    +                .addAllResources(memResourcesToUse.asJava)
    +
    +              sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    +                MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, sc.conf,
    +                  taskBuilder.getContainerBuilder())
    +              }
    +
    +              // Accept the offer and launch the task
    +              logDebug(s"Accepting offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    +              d.launchTasks(
    +                Collections.singleton(offer.getId),
    +                Collections.singleton(taskBuilder.build()), filters)
    +            } else {
    +              // Decline the offer
    +              logDebug(s"Declining offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              d.declineOffer(offer.getId)
                 }
    -
    -            // Accept the offer and launch the task
    -            logDebug(s"Accepting offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    -            d.launchTasks(
    -              Collections.singleton(offer.getId),
    -              Collections.singleton(taskBuilder.build()), filters)
               } else {
    -            // Decline the offer
    -            logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            d.declineOffer(offer.getId)
    +            // We reached the maximum number of cores for this framework. We don't need to see
    +            // new offers. Decline the offer for a long period of time.
    +            logDebug(s"Declining offer (reached max cores): $id with attributes:" +
    +              s" $offerAttributes mem: $mem cpu: $cpus" +
    +              s" for $rejectOfferDurationForReachedMaxCores seconds")
    +            d.declineOffer(offer.getId, Filters.newBuilder()
    +              .setRefuseSeconds(rejectOfferDurationForReachedMaxCores).build())
    --- End diff --
    
    It also depends on `id`, `offerAttributes`, `mem` and `cpus` for logging. They're all derived from `offer`, and some are cheaper to compute than others, but we shouldn't compute them twice just for logging.
    
    Okay, I'll change it to handle only the case where offers are rejected for a longer period.




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-185010055
  
    I'm not sure we want to use a single rejection delay setting for these two cases. Arguably we could reject offers for a much longer period for `unmet constraints`, since AFAIK constraints don't change dynamically and therefore hold for the lifetime of a framework. It's a bit different with `reached max cores`: if we lose an executor, we want the scheduler to launch a new one, ideally without waiting too long for it. I used the same default delay of 120s for both since it seems to be a reasonable value.
    
    As for the fine-grained mode, there's no reason not to add the same logic. I'll make the change and test it. Unfortunately, the example function `declineOffer` cannot be reused there because it relies on local variables declared inside the loop. It really feels like this code needs some refactoring.
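    One way to keep the two delays separate while sharing one decision path is a pure function of the offer state. Because it does not touch locals of the offer loop, such a function could in principle be reused by the fine-grained backend. This is a hypothetical sketch: `OfferDecision`, `OfferPolicy.decide` and `fitsExecutor` are illustrative names. `fitsExecutor` stands in for the executor-limit, memory, cpu, slave-failure and one-executor-per-slave checks in the diff.

    ```scala
    // Hypothetical model of the per-offer outcome in the coarse-grained backend.
    sealed trait OfferDecision
    case object LaunchExecutor extends OfferDecision
    final case class Decline(reason: String, refuseSeconds: Option[Long]) extends OfferDecision

    object OfferPolicy {
      // Depends only on its arguments, not on variables declared inside the offer loop.
      def decide(
          meetsConstraints: Boolean,
          totalCoresAcquired: Int,
          maxCores: Int,
          fitsExecutor: Boolean,
          unmetConstraintsDelay: Long,
          reachedMaxCoresDelay: Long): OfferDecision =
        if (!meetsConstraints) {
          // Constraints are not expected to change, so this delay can be long.
          Decline("unmet constraints", Some(unmetConstraintsDelay))
        } else if (totalCoresAcquired >= maxCores) {
          // Cores are only needed again after losing an executor, so keep this moderate.
          Decline("reached max cores", Some(reachedMaxCoresDelay))
        } else if (fitsExecutor) {
          LaunchExecutor
        } else {
          // Any other mismatch: decline without an explicit filter, as the diff does.
          Decline("insufficient resources", None)
        }
    }
    ```

    Consider `unmetConstraintsDelay = 600` (an arbitrary example, not a Spark default) and `reachedMaxCoresDelay = 120`. An app holding 8 of 8 max cores then gets `Decline("reached max cores", Some(120L))`, while an offer that fails the constraints gets the longer 600-second filter.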




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r61966798
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -273,25 +277,25 @@ private[spark] class CoarseMesosSchedulerBackend(
             matchesAttributeRequirements(slaveOfferConstraints, offerAttributes)
           }
     
    -      declineUnmatchedOffers(d, unmatchedOffers)
    +      unmatchedOffers.foreach { offer =>
    +        declineOffer(d, offer, "unmet constraints", rejectOfferDurationForUnmetConstraints)
    +      }
    +
           handleMatchedOffers(d, matchedOffers)
         }
       }
     
    -  private def declineUnmatchedOffers(d: SchedulerDriver, offers: Buffer[Offer]): Unit = {
    -    for (offer <- offers) {
    -      val id = offer.getId.getValue
    -      val offerAttributes = toAttributeMap(offer.getAttributesList)
    -      val mem = getResource(offer.getResourcesList, "mem")
    -      val cpus = getResource(offer.getResourcesList, "cpus")
    -      val filters = Filters.newBuilder()
    -        .setRefuseSeconds(rejectOfferDurationForUnmetConstraints).build()
    +  private def declineOffer(d: SchedulerDriver, offer: Offer, reason: String, refuseSeconds: Long) {
    +    val id = offer.getId.getValue
    +    val offerAttributes = toAttributeMap(offer.getAttributesList)
    +    val mem = getResource(offer.getResourcesList, "mem")
    +    val cpus = getResource(offer.getResourcesList, "cpus")
     
    -      logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus"
    -        + s" for $rejectOfferDurationForUnmetConstraints seconds")
    +    logDebug(s"Declining offer ($reason): $id with attributes: $offerAttributes mem: $mem"
    +      + s" cpu: $cpus for $rejectOfferDurationForUnmetConstraints seconds")
    --- End diff --
    
    s/rejectOfferDurationForUnmetConstraints/refuseSeconds




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-175062041
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176343179
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50288/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r61966680
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -273,25 +277,25 @@ private[spark] class CoarseMesosSchedulerBackend(
             matchesAttributeRequirements(slaveOfferConstraints, offerAttributes)
           }
     
    -      declineUnmatchedOffers(d, unmatchedOffers)
    +      unmatchedOffers.foreach { offer =>
    +        declineOffer(d, offer, "unmet constraints", rejectOfferDurationForUnmetConstraints)
    +      }
    +
           handleMatchedOffers(d, matchedOffers)
         }
       }
     
    -  private def declineUnmatchedOffers(d: SchedulerDriver, offers: Buffer[Offer]): Unit = {
    -    for (offer <- offers) {
    -      val id = offer.getId.getValue
    -      val offerAttributes = toAttributeMap(offer.getAttributesList)
    -      val mem = getResource(offer.getResourcesList, "mem")
    -      val cpus = getResource(offer.getResourcesList, "cpus")
    -      val filters = Filters.newBuilder()
    -        .setRefuseSeconds(rejectOfferDurationForUnmetConstraints).build()
    +  private def declineOffer(d: SchedulerDriver, offer: Offer, reason: String, refuseSeconds: Long) {
    +    val id = offer.getId.getValue
    +    val offerAttributes = toAttributeMap(offer.getAttributesList)
    +    val mem = getResource(offer.getResourcesList, "mem")
    +    val cpus = getResource(offer.getResourcesList, "cpus")
     
    -      logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus"
    -        + s" for $rejectOfferDurationForUnmetConstraints seconds")
    +    logDebug(s"Declining offer ($reason): $id with attributes: $offerAttributes mem: $mem"
    +      + s" cpu: $cpus for $rejectOfferDurationForUnmetConstraints seconds")
     
    -      d.declineOffer(offer.getId, filters)
    -    }
    +    val filters = Filters.newBuilder().setRefuseSeconds(refuseSeconds).build()
    --- End diff --
    
    place this above the log message
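    Here is a sketch of `declineOffer` with the review comments applied: the multi-line parameter style, the filter built before the log line, and `refuseSeconds` rather than `rejectOfferDurationForUnmetConstraints` in the message. To keep it self-contained, `Filters` and `Driver` are hypothetical stand-ins for the Mesos `Filters` protobuf and `SchedulerDriver`, and the offer is reduced to its ID string. This is not the merged code.

    ```scala
    // Hypothetical stand-ins for org.apache.mesos.Protos.Filters and SchedulerDriver,
    // reduced to what this sketch needs; offers are identified by their ID string.
    final case class Filters(refuseSeconds: Double)

    trait Driver {
      def declineOffer(offerId: String): Unit
      def declineOffer(offerId: String, filters: Filters): Unit
    }

    object DeclineSketch {
      def declineOffer(
          d: Driver,
          offerId: String,
          reason: Option[String] = None,
          refuseSeconds: Option[Long] = None,
          log: String => Unit = msg => println(msg)): Unit = {
        // Build the filter before logging, so the log line and the driver call agree.
        val filters = refuseSeconds.map(secs => Filters(secs.toDouble))
        log(s"Declining offer: $offerId" +
          refuseSeconds.map(secs => s" for $secs seconds").getOrElse("") +
          reason.map(r => s" (reason: $r)").getOrElse(""))
        filters match {
          case Some(f) => d.declineOffer(offerId, f)
          // No explicit filter: Mesos applies its default refuse_seconds (5 s).
          case None => d.declineOffer(offerId)
        }
      }
    }
    ```

    When `refuseSeconds` is `None`, the sketch falls back to the one-argument decline, which mirrors the `match` in the author's first implementation.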




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216688931
  
    @mgummelt the problem we were seeing occurs when running many Spark apps (~20), most of them small (long-lived small streaming apps): the bigger apps get allocated only a small number of cores even though the cluster still has a lot of available cores. In that scenario the big apps no longer receive offers from Mesos at all, because the small apps have a much smaller "max share" (their DRF dominant share) and therefore get the offers first. With a small number of apps this is fine, because the default `refuse_seconds` of 5 seconds gives Mesos enough time to cycle through every app and send offers to each of them. But as the number of apps increases it becomes more and more problematic, to the point where Mesos stops sending offers to the apps ranked lowest by DRF, i.e. the big apps.
    
    The solution implemented in this PR is to refuse offers for a long period of time when we know we don't need them anymore because the app has already acquired `spark.cores.max` cores. The only case where we would need to acquire more cores is if we lose an executor, so a value of `120s` for `refuse_seconds` seems like a good tradeoff.
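    The starvation effect can be reproduced with a deliberately crude toy model. This is a sketch under stated assumptions, not Mesos' actual allocator. It assumes one offer per allocation round and one round per second. It also assumes every small app is ranked ahead of the big app by DRF and already holds `spark.cores.max`, and that each decline installs a filter lasting `refuseSeconds` rounds. `OfferStarvationToy` is a hypothetical name.

    ```scala
    // Toy model only: one offer per round, small apps always ranked first by DRF.
    object OfferStarvationToy {
      /** Counts how many of `rounds` allocation rounds reach the big app. */
      def offersToBigApp(smallApps: Int, refuseSeconds: Int, rounds: Int): Int = {
        // eligibleAt(i): first round in which small app i is no longer filtering offers
        val eligibleAt = Array.fill(smallApps)(0)
        var toBigApp = 0
        for (round <- 0 until rounds) {
          // Small apps have the lowest dominant share, so they are offered first.
          (0 until smallApps).find(i => eligibleAt(i) <= round) match {
            case Some(i) =>
              // The app already holds spark.cores.max: it declines and installs a filter.
              eligibleAt(i) = round + refuseSeconds
            case None =>
              // Every small app is filtering, so the offer finally reaches the big app.
              toBigApp += 1
          }
        }
        toBigApp
      }
    }
    ```

    Under these assumptions, `offersToBigApp(3, 5, 120)` returns 48, so a few small apps leave room for the big one. `offersToBigApp(20, 5, 120)` returns 0, because the first five small apps keep cycling and the big app never sees an offer. `offersToBigApp(20, 120, 120)` returns 100.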





[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176544434
  
    **[Test build #50334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50334/consoleFull)** for PR 10924 at commit [`bf8d870`](https://github.com/apache/spark/commit/bf8d8703dff4d43f28c011a362bfe6c15d1a1d79).




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-217010132
  
    Merging into master 2.0, thanks for bringing this back to life.




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r53118429
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -254,53 +258,65 @@ private[spark] class CoarseMesosSchedulerBackend(
             val cpus = getResource(offer.getResourcesList, "cpus").toInt
             val id = offer.getId.getValue
             if (meetsConstraints) {
    -          if (taskIdToSlaveId.size < executorLimit &&
    -              totalCoresAcquired < maxCores &&
    -              mem >= calculateTotalMemory(sc) &&
    -              cpus >= 1 &&
    -              failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    -              !slaveIdsWithExecutors.contains(slaveId)) {
    -            // Launch an executor on the slave
    -            val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    -            totalCoresAcquired += cpusToUse
    -            val taskId = newMesosTaskId()
    -            taskIdToSlaveId.put(taskId, slaveId)
    -            slaveIdsWithExecutors += slaveId
    -            coresByTaskId(taskId) = cpusToUse
    -            // Gather cpu resources from the available resources and use them in the task.
    -            val (remainingResources, cpuResourcesToUse) =
    -              partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    -            val (_, memResourcesToUse) =
    -              partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    -            val taskBuilder = MesosTaskInfo.newBuilder()
    -              .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    -              .setSlaveId(offer.getSlaveId)
    -              .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    -              .setName("Task " + taskId)
    -              .addAllResources(cpuResourcesToUse.asJava)
    -              .addAllResources(memResourcesToUse.asJava)
    -
    -            sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    -              MesosSchedulerBackendUtil
    -                .setupContainerBuilderDockerInfo(image, sc.conf, taskBuilder.getContainerBuilder())
    +          if (totalCoresAcquired < maxCores) {
    +            if (taskIdToSlaveId.size < executorLimit &&
    +                mem >= calculateTotalMemory(sc) &&
    +                cpus >= 1 &&
    +                failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    +                !slaveIdsWithExecutors.contains(slaveId)) {
    +              // Launch an executor on the slave
    +              val cpusToUse = math.min(cpus, maxCores - totalCoresAcquired)
    +              totalCoresAcquired += cpusToUse
    +              val taskId = newMesosTaskId()
    +              taskIdToSlaveId.put(taskId, slaveId)
    +              slaveIdsWithExecutors += slaveId
    +              coresByTaskId(taskId) = cpusToUse
    +              // Gather cpu resources from the available resources and use them in the task.
    +              val (remainingResources, cpuResourcesToUse) =
    +                partitionResources(offer.getResourcesList, "cpus", cpusToUse)
    +              val (_, memResourcesToUse) =
    +                partitionResources(remainingResources.asJava, "mem", calculateTotalMemory(sc))
    +              val taskBuilder = MesosTaskInfo.newBuilder()
    +                .setTaskId(TaskID.newBuilder().setValue(taskId.toString).build())
    +                .setSlaveId(offer.getSlaveId)
    +                .setCommand(createCommand(offer, cpusToUse + extraCoresPerSlave, taskId))
    +                .setName("Task " + taskId)
    +                .addAllResources(cpuResourcesToUse.asJava)
    +                .addAllResources(memResourcesToUse.asJava)
    +
    +              sc.conf.getOption("spark.mesos.executor.docker.image").foreach { image =>
    +                MesosSchedulerBackendUtil.setupContainerBuilderDockerInfo(image, sc.conf,
    +                  taskBuilder.getContainerBuilder())
    +              }
    +
    +              // Accept the offer and launch the task
    +              logDebug(s"Accepting offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    +              d.launchTasks(
    +                Collections.singleton(offer.getId),
    +                Collections.singleton(taskBuilder.build()), filters)
    +            } else {
    +              // Decline the offer
    +              logDebug(s"Declining offer: $id with attributes: $offerAttributes" +
    +                s" mem: $mem cpu: $cpus")
    +              d.declineOffer(offer.getId)
                 }
    -
    -            // Accept the offer and launch the task
    -            logDebug(s"Accepting offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            slaveIdToHost(offer.getSlaveId.getValue) = offer.getHostname
    -            d.launchTasks(
    -              Collections.singleton(offer.getId),
    -              Collections.singleton(taskBuilder.build()), filters)
               } else {
    -            // Decline the offer
    -            logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus")
    -            d.declineOffer(offer.getId)
    +            // We reached the maximum number of cores for this framework. We don't need to see
    +            // new offers. Decline the offer for a long period of time.
    +            logDebug(s"Declining offer (reached max cores): $id with attributes:" +
    +              s" $offerAttributes mem: $mem cpu: $cpus" +
    +              s" for $rejectOfferDurationForReachedMaxCores seconds")
    +            d.declineOffer(offer.getId, Filters.newBuilder()
    +              .setRefuseSeconds(rejectOfferDurationForReachedMaxCores).build())
    --- End diff --
    
    My first implementation actually removed all the code duplication in the decline-offer path, but it seemed like overkill:
    ```scala
            def declineOffer(reason: Option[String] = None, refuseSeconds: Option[Long] = None): Unit = {
              logDebug("Declining offer" +
                reason.fold("")(r => s" ($r)") +
                s": $id with attributes: $offerAttributes mem: $mem cpu: $cpus" +
                refuseSeconds.fold("")(r => s" for $r seconds"))
    
              refuseSeconds match {
                case Some(seconds) =>
                  val filter = Filters.newBuilder().setRefuseSeconds(seconds).build()
                  d.declineOffer(offer.getId, filter)
                case None => d.declineOffer(offer.getId)
              }
            }
    ```
    
    Also, this cannot easily be reused in the fine-grained mode since it relies on attributes computed locally in the loop. I opted for simplicity, thinking that this whole function would be refactored at some point. I'm happy to use the implementation above for `refuseOffer` if you think it's better. It can be simplified quite a bit if it only handles the two cases where offers are rejected for a longer period, but then we still have similar code for the default rejection.




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-217017952
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r61966616
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -273,25 +277,25 @@ private[spark] class CoarseMesosSchedulerBackend(
             matchesAttributeRequirements(slaveOfferConstraints, offerAttributes)
           }
     
    -      declineUnmatchedOffers(d, unmatchedOffers)
    +      unmatchedOffers.foreach { offer =>
    +        declineOffer(d, offer, "unmet constraints", rejectOfferDurationForUnmetConstraints)
    +      }
    +
           handleMatchedOffers(d, matchedOffers)
         }
       }
     
    -  private def declineUnmatchedOffers(d: SchedulerDriver, offers: Buffer[Offer]): Unit = {
    -    for (offer <- offers) {
    -      val id = offer.getId.getValue
    -      val offerAttributes = toAttributeMap(offer.getAttributesList)
    -      val mem = getResource(offer.getResourcesList, "mem")
    -      val cpus = getResource(offer.getResourcesList, "cpus")
    -      val filters = Filters.newBuilder()
    -        .setRefuseSeconds(rejectOfferDurationForUnmetConstraints).build()
    +  private def declineOffer(d: SchedulerDriver, offer: Offer, reason: String, refuseSeconds: Long) {
    +    val id = offer.getId.getValue
    +    val offerAttributes = toAttributeMap(offer.getAttributesList)
    +    val mem = getResource(offer.getResourcesList, "mem")
    +    val cpus = getResource(offer.getResourcesList, "cpus")
     
    -      logDebug(s"Declining offer: $id with attributes: $offerAttributes mem: $mem cpu: $cpus"
    -        + s" for $rejectOfferDurationForUnmetConstraints seconds")
    +    logDebug(s"Declining offer ($reason): $id with attributes: $offerAttributes mem: $mem"
    --- End diff --
    
    Place the "Reason: " field somewhere more readable.  Right now, it breaks up the log message at a confusing location.
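One way to address this is to emit the fixed fields first and append the free-form reason at the end, so it no longer splits the offer description. This sketch assumes the log line keeps the fields shown in the diff (offer id, attributes, mem, cpu, refuse duration). `DeclineLog.declineLogMessage` is a hypothetical helper, not an existing Spark method.

```scala
// Hypothetical helper: formats the debug line for a declined offer so the
// reason comes last instead of interrupting the fixed fields.
object DeclineLog {
  def declineLogMessage(
      offerId: String,
      attributes: Map[String, String],
      mem: Double,
      cpus: Double,
      refuseSeconds: Option[Long],
      reason: Option[String]): String = {
    val base = s"Declining offer: $offerId with attributes: $attributes mem: $mem cpu: $cpus"
    val duration = refuseSeconds.fold("")(secs => s" for $secs seconds")
    val why = reason.fold("")(r => s". Reason: $r")
    base + duration + why
  }
}
```

For example, an offer declined after reaching `spark.cores.max` would log `... cpu: 2.0 for 120 seconds. Reason: reached spark.cores.max`, while the structure of the original message stays unchanged.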


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216964273
  
    **[Test build #57786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57786/consoleFull)** for PR 10924 at commit [`112f136`](https://github.com/apache/spark/commit/112f13651306d453cac217cec4e1365da62a03a1).


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62077076
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala ---
    @@ -326,11 +340,25 @@ private[spark] class CoarseMesosSchedulerBackend(
             d.launchTasks(
               Collections.singleton(offer.getId),
               offerTasks.asJava)
    -      } else { // decline
    -        logDebug(s"Declining offer: $id with attributes: $offerAttributes " +
    -          s"mem: $offerMem cpu: $offerCpus")
    -
    -        d.declineOffer(offer.getId)
    +      } else if (totalCoresAcquired >= maxCores) {
    +        // We already acquired the maximum number of cores so we don't need to get new offers
    +        // unless an executor goes down. Setting a high "refuse seconds" filter is especially
    +        // important when running a lot of frameworks in the same Mesos cluster to avoid resource
    +        // starvation. One such case of starvation happens when running many small Spark apps
    +        // (e.g. small Spark streaming jobs) then a new big Spark app would get offered only a
    +        // fraction of the cores available in the cluster and Mesos would then stop sending it
    +        // offers. That's because the small apps have a much smaller "max share" so they get the
    +        // offers first. With a low number of apps it's okay because with the default
    +        // refuse_seconds value of 5 seconds it's enough time for Mesos to cycle through every
    +        // app and send offers to each of them. But as the number of apps increases it becomes
    +        // more and more problematic, to the point where Mesos stops sending offers to the apps
    +        // ranked the lowest by DRF, i.e. the big apps. We mitigate this problem by declining
    +        // the offers for a long period of time when we know that we don't need offers anymore
    +        // because the app already acquired all the cores it needs.
    --- End diff --
    
    This is a bit verbose.  I think something like "Reject an offer for a configurable amount of time to avoid starving other frameworks" is sufficient.
    
    Also, thanks for the code docs, but I was thinking we should add this config var to the user docs as well.
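The rule that the long comment documents comes down to a few lines. The following simplified sketch ignores the memory and constraint checks the real backend performs and uses hypothetical names (`CoarseOffers`, `OfferAction`, `decide`). It launches when the offer has CPUs and cores are still needed, declines with a long filter once `spark.cores.max` is reached, and otherwise declines with the Mesos default filter.

```scala
// Simplified, hypothetical model of the per-offer decision in coarse-grained mode.
object CoarseOffers {
  sealed trait OfferAction
  final case class Launch(cores: Int) extends OfferAction
  final case class DeclineFor(seconds: Long) extends OfferAction
  case object DeclineDefault extends OfferAction

  def decide(offerCpus: Int, totalCoresAcquired: Int, maxCores: Int,
      refuseSecondsWhenMaxed: Long): OfferAction =
    if (totalCoresAcquired >= maxCores) {
      // spark.cores.max reached: nothing to launch until an executor is lost,
      // so ask Mesos not to re-offer these resources for a while.
      DeclineFor(refuseSecondsWhenMaxed)
    } else if (offerCpus > 0) {
      // Use only the cores still allowed under spark.cores.max.
      Launch(math.min(offerCpus, maxCores - totalCoresAcquired))
    } else {
      // Nothing usable in this offer: decline with the default filter.
      DeclineDefault
    }
}
```

Keeping the refuse duration as a parameter rather than a constant is what makes it easy to expose as a configuration value.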


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216712031
  
    **[Test build #57699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57699/consoleFull)** for PR 10924 at commit [`ad2f014`](https://github.com/apache/spark/commit/ad2f014dadda0f35098e82a795a1fa1319032acd).


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216996707
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-217017954
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57794/


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216645437
  
    **[Test build #57661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57661/consoleFull)** for PR 10924 at commit [`9b314e0`](https://github.com/apache/spark/commit/9b314e025df193cf416c50b0c32535e191e97ecd).


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216698159
  
    Sounds good.  Also seems like something we should document, right?


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216727015
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-187770729
  
    You are right about having two different settings. It makes sense. Let's go with that for the moment.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216976431
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57774/


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176571611
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50334/


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-176571609
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by mgummelt <gi...@git.apache.org>.
Github user mgummelt commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216681961
  
    What starvation behavior were you seeing?  With the DRF allocator, Mesos should offer rejected resources to other frameworks before re-offering to the Spark job.
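For readers unfamiliar with DRF: the allocator offers resources first to the framework whose dominant share is smallest. The dominant share is the framework's largest fraction of any single cluster resource. The toy sketch below ignores roles, weights, and the offer filters that the real Mesos allocator also takes into account. The framework names and numbers are made up. It shows why many small apps with tiny dominant shares are served ahead of one large app.

```scala
// Toy model of Dominant Resource Fairness ordering; not the Mesos allocator.
object ToyDrf {
  final case class Framework(name: String, cpus: Double, memGb: Double)

  // Dominant share = the largest fraction of any one resource the framework holds.
  def dominantShare(f: Framework, totalCpus: Double, totalMemGb: Double): Double =
    math.max(f.cpus / totalCpus, f.memGb / totalMemGb)

  // Frameworks in the order a DRF allocator would offer to them (lowest share first).
  def offerOrder(fs: Seq[Framework], totalCpus: Double, totalMemGb: Double): Seq[String] =
    fs.sortBy(f => dominantShare(f, totalCpus, totalMemGb)).map(_.name)
}
```

On a hypothetical 100-CPU / 400 GB cluster, two small streaming apps (dominant shares 0.02 and 0.03) are ordered ahead of a batch app holding 40 CPUs (share 0.4). The large app therefore sees declined resources only after every app ahead of it has passed on them, which is the situation the long refuse filter in this PR is meant to mitigate.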


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216994289
  
    LGTM


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by sebastienrainville <gi...@git.apache.org>.
Github user sebastienrainville commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-199911278
  
    @dragos sorry for the delay. I had also forgotten that `spark.cores.max` wasn't respected in fine-grained mode. Quite a few things have changed in this class since I last looked at it. I will rebase on master and make the appropriate changes.


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10924#discussion_r62104291
  
    --- Diff: docs/running-on-mesos.md ---
    @@ -406,6 +406,22 @@ See the [configuration page](configuration.html) for information on Spark config
         If unset it will point to Spark's internal web UI.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.mesos.rejectOfferDurationForUnmetConstraints</code></td>
    +  <td><code>120s</code></td>
    +  <td>
    +    Set the amount of time for which offers are rejected when constraints are unmet. See <code>spark.mesos.constraints</code>.
    +    This is used to prevent starvation of other frameworks.
    +  </td>
    +</tr>
    +<tr>
    +  <td><code>spark.mesos.rejectOfferDurationForReachedMaxCores</code></td>
    --- End diff --
    
    I would actually not document these configs. Doing so would require us to maintain backward compatibility. I can't think of any strong use case where someone would want to change these values so I don't think it's worth the maintenance burden.
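Whether or not they are documented, the two settings are read as durations with a `120s` default. Spark's `SparkConf.getTimeAsSeconds(key, defaultValue)` handles this kind of parsing. The sketch below is a dependency-free stand-in that assumes a plain `Map[String, String]` for the configuration and supports only the `ms`, `s`, `m`/`min`, and `h` suffixes, treating a bare number as seconds. `DurationConf` and its methods are hypothetical.

```scala
// Hypothetical, dependency-free stand-in for reading a duration setting.
object DurationConf {
  private val Pattern = """(\d+)\s*(ms|s|min|m|h)?""".r

  // Parses strings like "120s", "2min" or "1500ms" into whole seconds
  // (sub-second remainders are truncated).
  def parseSeconds(value: String): Long = value.trim.toLowerCase match {
    case Pattern(n, unit) =>
      val amount = n.toLong
      Option(unit).getOrElse("s") match {
        case "ms"        => amount / 1000
        case "s"         => amount
        case "m" | "min" => amount * 60
        case "h"         => amount * 3600
      }
    case other => throw new IllegalArgumentException(s"Invalid duration: $other")
  }

  // Reads a key from the configuration, falling back to a default duration string.
  def readSeconds(conf: Map[String, String], key: String, default: String): Long =
    parseSeconds(conf.getOrElse(key, default))
}
```

If the keys stay undocumented, the default remains the only value most users ever see. The parsing still accepts an override for the rare deployment that needs one.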


[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...

Posted by dragos <gi...@git.apache.org>.
Github user dragos commented on the pull request:

    https://github.com/apache/spark/pull/10924#issuecomment-216759444
  
    
    > On 4 May 2016, at 02:22, Sebastien Rainville <no...@github.com> wrote:
    > 
    > @dragos I finally did the change. Sorry for the delay
    > 
    Excellent, thanks! I won't be able to review it, but it looks like Michael took over.
    
    


