Posted to reviews@spark.apache.org by jcuquemelle <gi...@git.apache.org> on 2017/12/04 18:19:29 UTC

[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

GitHub user jcuquemelle opened a pull request:

    https://github.com/apache/spark/pull/19881

    [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

    ## What changes were proposed in this pull request?
    
    Let's say an executor has spark.executor.cores / spark.task.cpus task slots.
    The current dynamic allocation policy allocates enough executors
    for each task slot to execute a single task, which wastes resources when
    tasks are small relative to the executor allocation overhead. Adding
    tasksPerExecutorSlot makes it possible to specify how many tasks
    a single slot should ideally execute, to mitigate the overhead of executor
    allocation.
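    
    For illustration, a minimal sketch of the proposed computation (paraphrasing
    the description above, not the exact patch; `conf` is a SparkConf and the
    method name mirrors the existing maxNumExecutorsNeeded):
    
        val taskSlotsPerExecutor =
          conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
        val tasksPerExecutorSlot =
          conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
    
        // tasksPerExecutorSlot = 1 reproduces the current behavior (one task per
        // slot); larger values request proportionally fewer executors.
        def maxNumExecutorsNeeded(runningOrPendingTasks: Int): Int =
          math.ceil(runningOrPendingTasks.toDouble /
            (tasksPerExecutorSlot * taskSlotsPerExecutor)).toInt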
    
    ## How was this patch tested?
    Unit tests, plus runs on various actual workloads on a YARN cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jcuquemelle/spark AddTaskPerExecutorSlot

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19881
    
----
commit 895882feebc53f44a70278e0b475b2fb937d331a
Author: Julien Cuquemelle <j....@criteo.com>
Date:   2017-11-30T16:28:06Z

    [SPARK-22683][CORE] Allow tuning the number of dynamically allocated executors
    
    Let's say an executor has spark.executor.cores / spark.task.cpus task slots.
    
    The current dynamic allocation policy allocates enough executors
    for each task slot to execute a single task, which wastes resources when
    tasks are small relative to the executor allocation overhead. Adding
    tasksPerExecutorSlot makes it possible to specify how many tasks
    a single slot should ideally execute, to mitigate the overhead of executor
    allocation.

commit fce3b976d0b22c4d01ef4fdd5339835bc6d6fcb1
Author: Julien Cuquemelle <j....@criteo.com>
Date:   2017-11-30T16:28:06Z

    [SPARK-22683][DOC] document tasksPerExecutorSlot parameter

----


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a executorAllocationRatio...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r183337314
  
    --- Diff: docs/configuration.md ---
    @@ -1751,6 +1751,7 @@ Apart from these, the following properties are also available, and may be useful
         <code>spark.dynamicAllocation.minExecutors</code>,
         <code>spark.dynamicAllocation.maxExecutors</code>, and
         <code>spark.dynamicAllocation.initialExecutors</code>
    +    <code>spark.dynamicAllocation.fullExecutorAllocationDivisor</code>
    --- End diff --
    
    Done, missed that one, sorry :-)


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    No, we don't strictly need it in the name; the reasoning behind it was to indicate that this was a divisor based on having fully allocated executors for all the tasks, i.e. running at full parallelism. 
    Are you suggesting we just use spark.dynamicAllocation.executorAllocationDivisor?  Other names thrown around were along the lines of maxExecutorAllocationDivisor.  One thing we were trying to avoid is confusing it with the maxExecutors config as well.  Opinions?


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r174126562
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    +  <td>1</td>
    +  <td>
    +    By default, the dynamic allocation will request enough executors to maximize the 
    +    parallelism according to the number of tasks to process. While this minimizes the 
    +    latency of the job, with small tasks this setting wastes a lot of resources due to
    +    executor allocation overhead, as some executor might not even do any work.
    +    This setting allows to set a divisor that will be used to reduce the number of
    +    executors w.r.t. full parallelism
    +    Defaults to 1.0
    --- End diff --
    
    I think we should define that maxExecutors trumps this setting.  
    
    If I have 10000 tasks and a divisor of 2, I would expect 5000 executors, but if maxExecutors is 1000, that is all I get. 
    
    We should add a test for this interaction as well.
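    
    A quick illustration of that interaction (hypothetical numbers, assuming 1
    task slot per executor):
    
        val tasks = 10000
        val divisor = 2.0
        val maxExecutors = 1000
        val fromDivisor = math.ceil(tasks / divisor).toInt // 5000 from the divisor alone
        val target = math.min(fromDivisor, maxExecutors)   // 1000: maxExecutors trumps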


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r176351053
  
    --- Diff: core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala ---
    @@ -145,6 +145,39 @@ class ExecutorAllocationManagerSuite
         assert(numExecutorsToAdd(manager) === 1)
       }
     
    +  def testParallelismDivisor(cores: Int, divisor: Double, expected: Int): Unit = {
    +    val conf = new SparkConf()
    +      .setMaster("myDummyLocalExternalClusterManager")
    +      .setAppName("test-executor-allocation-manager")
    +      .set("spark.dynamicAllocation.enabled", "true")
    +      .set("spark.dynamicAllocation.testing", "true")
    +      .set("spark.dynamicAllocation.maxExecutors", "15")
    +      .set("spark.dynamicAllocation.minExecutors", "3")
    +      .set("spark.dynamicAllocation.fullExecutorAllocationDivisor", divisor.toString)
    +      .set("spark.executor.cores", cores.toString)
    +    val sc = new SparkContext(conf)
    +    contexts += sc
    +    var manager = sc.executorAllocationManager.get
    +    post(sc.listenerBus, SparkListenerStageSubmitted(createStageInfo(0, 20)))
    +    for (i <- 0 to 5) {
    +      addExecutors(manager)
    --- End diff --
    
    If we want to check the capping by max/min executors, we need to actually try to add executors. The max/min capping does not occur during the computation of the target number of executors, but at the time they are added.
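    
    For reference, the clamping in addExecutors looks roughly like this
    (paraphrased from the existing ExecutorAllocationManager, not part of this
    patch):
    
        // Ensure the target doesn't exceed what is needed at the present moment,
        // then keep it within the configured bounds; this runs only when
        // executors are actually added.
        numExecutorsTarget = math.min(numExecutorsTarget, maxNumExecutorsNeeded)
        numExecutorsTarget =
          math.max(math.min(numExecutorsTarget, maxNumExecutors), minNumExecutors)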


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Can you wait another day? I just find the name pretty weird. Do we have
    other configs that use the “divisor” suffix?
    
    On Wed, Mar 28, 2018 at 7:23 AM Tom Graves <no...@github.com> wrote:
    
    > I'll leave this a bit longer but then I'm going to merge it later today



---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r166655479
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -114,9 +114,13 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val taskSlotPerExecutor =
         conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
     
    +  private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
    +
    +  private val tasksPerExecutor = tasksPerExecutorSlot * taskSlotPerExecutor
    --- End diff --
    
    Since we aren't using the concept of slots, I think we should leave tasksPerExecutor alone and put this functionality into maxNumExecutorsNeeded().
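    
    That would look roughly like this (a sketch only, reusing the fields from the
    diff above):
    
        private def maxNumExecutorsNeeded(): Int = {
          val numRunningOrPendingTasks = listener.totalPendingTasks + listener.totalRunningTasks
          // Fold tasksPerExecutorSlot in here, leaving tasksPerExecutor untouched.
          math.ceil(numRunningOrPendingTasks.toDouble /
            (tasksPerExecutor * tasksPerExecutorSlot)).toInt
        }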


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89467/
    Test PASSed.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r179760871
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
         conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
     
    +  private val fullExecutorAllocationDivisor =
    +    conf.getDouble("spark.dynamicAllocation.fullExecutorAllocationDivisor", 1.0)
    --- End diff --
    
    Forgot about this earlier, but this should really be a config constant similar to DYN_ALLOCATION_MIN_EXECUTORS.
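    
    i.e. a constant in org.apache.spark.internal.config along these lines (a
    sketch, using the name under discussion at this point of the review):
    
        private[spark] val DYN_ALLOCATION_FULL_EXECUTOR_ALLOCATION_DIVISOR =
          ConfigBuilder("spark.dynamicAllocation.fullExecutorAllocationDivisor")
            .doubleConf
            .createWithDefault(1.0)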


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    SGTM on divisor.
    
    Do we need "full" there in the config?



---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r175852840
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    +  <td>1</td>
    +  <td>
    +    By default, the dynamic allocation will request enough executors to maximize the 
    +    parallelism according to the number of tasks to process. While this minimizes the 
    +    latency of the job, with small tasks this setting wastes a lot of resources due to
    --- End diff --
    
    done


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Please see JIRA. I don't think this is worth doing.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    More aggressive must be forbidden, because a setting of 1.0 already gives enough executors so that, if executor provisioning were perfect (e.g. all executors became available at the same time) and the mapping of tasks to executors were optimal, each executor core (or taskSlot, as in the original naming) would process exactly one task. If you ask for more executors, you can be sure they will be wasted.
    That is why divisor semantics felt natural: it implies the parameter can only be used to reduce parallelism.
    How about fullParallelismThrottlingRate?
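    
    A worked example of that argument (illustrative numbers):
    
        // 100 tasks on executors with 4 task slots each: full parallelism needs
        // ceil(100 / 4) = 25 executors, and under perfect provisioning every slot
        // then runs exactly one task. Anything above 25 is guaranteed idle capacity.
        val fullParallelism = math.ceil(100 / 4.0).toInt // 25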


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Maybe instead of "divisor" we just have a "rate" or "factor" that can be a floating point value, and use multiplication rather than division? This way people can also make it even more aggressive.
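    
    The two formulations are equivalent for ratio = 1 / divisor; the difference
    is only which direction the knob can extend (a sketch with assumed names):
    
        val byDivisor = math.ceil(numTasks / tasksPerExecutor.toDouble / divisor).toInt // divisor >= 1 can only shrink
        val byRatio   = math.ceil(numTasks * ratio / tasksPerExecutor).toInt            // ratio > 1 could over-allocate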



---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    +1


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    just minor comment about the test otherwise looks good.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a executorAllocationRatio parame...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #89712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89712/testReport)** for PR 19881 at commit [`3b1dddc`](https://github.com/apache/spark/commit/3b1dddcfa08ae63af7e43a1181f2f1e709447a07).


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a executorAllocationRatio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19881


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r175852906
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    +  <td>1</td>
    +  <td>
    +    By default, the dynamic allocation will request enough executors to maximize the 
    +    parallelism according to the number of tasks to process. While this minimizes the 
    +    latency of the job, with small tasks this setting wastes a lot of resources due to
    +    executor allocation overhead, as some executor might not even do any work.
    +    This setting allows to set a divisor that will be used to reduce the number of
    +    executors w.r.t. full parallelism
    --- End diff --
    
    done


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    jenkins, ok to test


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    ping @jcuquemelle  can you update this?


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #88180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88180/testReport)** for PR 19881 at commit [`56c3f43`](https://github.com/apache/spark/commit/56c3f43bff4f0993b134497f0792d4493ea22afc).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    @rxin I assume you are just trying not to use "divisor" since it's not used anywhere else?  As @jcuquemelle stated, I don't see a use case for making this more aggressive; if you have one please let us know, but otherwise it just wastes resources. 
    
    Personally I still like divisor, because that is what you are doing.  The fact that it's not in any other config is not a good reason to avoid it.  Looking around, I don't see any public configs with "factor" in the name either.  I am not fond of "rate" because it's not a rate (i.e. how quickly/slowly you are allocating); it's a limit on the max number of executors.  
    
    I also think it's more natural for people to think of this as a divisor vs. a multiplier: if I want 1/2 of the executors, I divide by 2.  I think we should name it based on what is most likely understood by the end user.



---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #88472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88472/testReport)** for PR 19881 at commit [`a40d160`](https://github.com/apache/spark/commit/a40d16031a4618db4023a3cf432b4d8f37f70b7d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r176229298
  
    --- Diff: core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala ---
    @@ -145,6 +145,39 @@ class ExecutorAllocationManagerSuite
         assert(numExecutorsToAdd(manager) === 1)
       }
     
    +  def testParallelismDivisor(cores: Int, divisor: Double, expected: Int): Unit = {
    +    val conf = new SparkConf()
    +      .setMaster("myDummyLocalExternalClusterManager")
    +      .setAppName("test-executor-allocation-manager")
    +      .set("spark.dynamicAllocation.enabled", "true")
    +      .set("spark.dynamicAllocation.testing", "true")
    +      .set("spark.dynamicAllocation.maxExecutors", "15")
    +      .set("spark.dynamicAllocation.minExecutors", "3")
    +      .set("spark.dynamicAllocation.fullExecutorAllocationDivisor", divisor.toString)
    +      .set("spark.executor.cores", cores.toString)
    +    val sc = new SparkContext(conf)
    +    contexts += sc
    +    var manager = sc.executorAllocationManager.get
    +    post(sc.listenerBus, SparkListenerStageSubmitted(createStageInfo(0, 20)))
    +    for (i <- 0 to 5) {
    +      addExecutors(manager)
    --- End diff --
    
    this loop isn't really needed right? All we are checking is the target not the number to add?


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Thanks @jcuquemelle 


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r177452303
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
    --- End diff --
    
    ok


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #88180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88180/testReport)** for PR 19881 at commit [`56c3f43`](https://github.com/apache/spark/commit/56c3f43bff4f0993b134497f0792d4493ea22afc).


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #88472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88472/testReport)** for PR 19881 at commit [`a40d160`](https://github.com/apache/spark/commit/a40d16031a4618db4023a3cf432b4d8f37f70b7d).


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    jenkins, test this please
    
    



---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a executorAllocationRatio parame...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    +1


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r176328345
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
    --- End diff --
    
    We don't really need this variable now, can we just remove it?


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r177444324
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
    --- End diff --
    
    This is not exposed; it is merely a more precise description of the actual computation. I just wanted to state more clearly that the existing default behavior maximizes the parallelism.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r174130728
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    +  <td>1</td>
    +  <td>
    +    By default, the dynamic allocation will request enough executors to maximize the 
    +    parallelism according to the number of tasks to process. While this minimizes the 
    +    latency of the job, with small tasks this setting wastes a lot of resources due to
    +    executor allocation overhead, as some executor might not even do any work.
    +    This setting allows to set a divisor that will be used to reduce the number of
    +    executors w.r.t. full parallelism
    --- End diff --
    
    add period at end of parallelism


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    @rxin, can we merge this PR?


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r174125381
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    +  <td>1</td>
    +  <td>
    +    By default, the dynamic allocation will request enough executors to maximize the 
    +    parallelism according to the number of tasks to process. While this minimizes the 
    +    latency of the job, with small tasks this setting wastes a lot of resources due to
    --- End diff --
    
    can waste.
    



---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r175835570
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    --- End diff --
    
    sorry didn't get back to this earlier, I think fullExecutorAllocationDivisor would be fine.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r176036965
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    --- End diff --
    
    Done


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    I thought about this more, and I actually think something like this makes more sense: `executorAllocationRatio`. Basically it is just a ratio that determines how aggressively we want Spark to allocate executors. A ratio of 1.0 means fill up everything; a ratio of 0.5 means only request half of the executors.
    
    What do you think?
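    
    (This is the name the PR eventually settled on; folded into the target
    computation it looks roughly like the following sketch.)
    
        // ratio = 1.0 fills every task slot; ratio = 0.5 requests half as many executors.
        math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
          tasksPerExecutorForFullParallelism).toInt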
    



---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    jenkins, test this please


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a executorAllocationRatio parame...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Yes, we can wait another day or so if you are looking at it; this discussion has been going on for a long time now, though, so if you have a better name suggestion let us know.  No other configs have a "divisor" suffix.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r176365454
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
    --- End diff --
    
    It is used in two places: one to validate arguments and the other to actually compute the target number of executors. If I remove this variable, I will need either to store spark.executor.cores and spark.task.cpus instead, or to fetch them each time we do a validation or a computation of the target number of executors.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r177362439
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
    --- End diff --
    
    I was originally thinking we might avoid introducing the concept of `tasksPerExecutorForFullParallelism` and instead only have executorCores and taskCPUs, but I don't have a strong opinion on that.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    cc @rxin


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r166649526
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -114,9 +114,13 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val taskSlotPerExecutor =
         conf.getInt("spark.executor.cores", 1) / conf.getInt("spark.task.cpus", 1)
     
    +  private val tasksPerExecutorSlot = conf.getInt("spark.dynamicAllocation.tasksPerExecutorSlot", 1)
    --- End diff --
    
    I think we should change the name of this config, because Spark doesn't have the concept of slots and I think it could be confusing to users who might expect exactly x tasks to be processed on each executor. I am thinking more along the lines of spark.dynamicAllocation.maxExecutorsPerStageDivisor: the max # of executors based on the # of tasks required for that stage, divided by this number.   I'm open to other config names here though.
    
    I think we would also need to define its interaction with spark.dynamicAllocation.maxExecutors, as well as how it works as the # of running/to-be-run tasks changes. 


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a executorAllocationRatio...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r182521349
  
    --- Diff: docs/configuration.md ---
    @@ -1751,6 +1751,7 @@ Apart from these, the following properties are also available, and may be useful
         <code>spark.dynamicAllocation.minExecutors</code>,
         <code>spark.dynamicAllocation.maxExecutors</code>, and
         <code>spark.dynamicAllocation.initialExecutors</code>
    +    <code>spark.dynamicAllocation.fullExecutorAllocationDivisor</code>
    --- End diff --
    
    needs changed to executorAllocationRatio


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r175831139
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    --- End diff --
    
    How about something like fullAllocationDivisor ?


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a executorAllocationRatio parame...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #89712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89712/testReport)** for PR 19881 at commit [`3b1dddc`](https://github.com/apache/spark/commit/3b1dddcfa08ae63af7e43a1181f2f1e709447a07).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r174145101
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    --- End diff --
    
    Naming configs is really hard, there are lots of different opinions on it, and in the end someone is going to be confused; I need to think about this some more.  I see the reason to use "parallelism" here rather than maxExecutors (maxExecutorsDivisor could be confusing if people think it applies to the maxExecutors config), but I also think "parallelism" would be confused with the parallelism in spark.default.parallelism: it's not defining the number of tasks but the number of executors to allocate based on the parallelism.  Another one I thought of is executorAllocationDivisor.  I'll think about it some more and get back.



---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r177176666
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -116,9 +120,12 @@ private[spark] class ExecutorAllocationManager(
       // TODO: The default value of 1 for spark.executor.cores works right now because dynamic
       // allocation is only supported for YARN and the default number of cores per executor in YARN is
       // 1, but it might need to be attained differently for different cluster managers
    -  private val tasksPerExecutor =
    +  private val tasksPerExecutorForFullParallelism =
    --- End diff --
    
    @jiangxb1987, do you agree with my comment, or do you still want me to remove the variable ?


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r175852881
  
    --- Diff: docs/configuration.md ---
    @@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
         Lower bound for the number of executors if dynamic allocation is enabled.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.dynamicAllocation.fullParallelismDivisor</code></td>
    +  <td>1</td>
    +  <td>
    +    By default, the dynamic allocation will request enough executors to maximize the 
    +    parallelism according to the number of tasks to process. While this minimizes the 
    +    latency of the job, with small tasks this setting wastes a lot of resources due to
    +    executor allocation overhead, as some executor might not even do any work.
    +    This setting allows to set a divisor that will be used to reduce the number of
    +    executors w.r.t. full parallelism
    +    Defaults to 1.0
    --- End diff --
    
    Done


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #89467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89467/testReport)** for PR 19881 at commit [`15732ab`](https://github.com/apache/spark/commit/15732ab7ee22a9cc4409b36812b5b2c930854723).


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationD...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r176727530
  
    --- Diff: core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala ---
    @@ -145,6 +145,39 @@ class ExecutorAllocationManagerSuite
         assert(numExecutorsToAdd(manager) === 1)
       }
     
    +  def testParallelismDivisor(cores: Int, divisor: Double, expected: Int): Unit = {
    +    val conf = new SparkConf()
    +      .setMaster("myDummyLocalExternalClusterManager")
    +      .setAppName("test-executor-allocation-manager")
    +      .set("spark.dynamicAllocation.enabled", "true")
    +      .set("spark.dynamicAllocation.testing", "true")
    +      .set("spark.dynamicAllocation.maxExecutors", "15")
    +      .set("spark.dynamicAllocation.minExecutors", "3")
    +      .set("spark.dynamicAllocation.fullExecutorAllocationDivisor", divisor.toString)
    +      .set("spark.executor.cores", cores.toString)
    +    val sc = new SparkContext(conf)
    +    contexts += sc
    +    var manager = sc.executorAllocationManager.get
    +    post(sc.listenerBus, SparkListenerStageSubmitted(createStageInfo(0, 20)))
    +    for (i <- 0 to 5) {
    +      addExecutors(manager)
    --- End diff --
    
    ok


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    @jcuquemelle  please fix the style


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88472/
    Test PASSed.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    I'm fine with that


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Ok, I will quickly do the change.
    Thanks for the proposals.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    @felixcheung: updated PR title and description


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    **[Test build #89467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89467/testReport)** for PR 19881 at commit [`15732ab`](https://github.com/apache/spark/commit/15732ab7ee22a9cc4409b36812b5b2c930854723).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a executorAllocationRatio...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r182520622
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -26,7 +26,10 @@ import scala.util.control.{ControlThrowable, NonFatal}
     import com.codahale.metrics.{Gauge, MetricRegistry}
     
     import org.apache.spark.internal.Logging
    -import org.apache.spark.internal.config.{DYN_ALLOCATION_MAX_EXECUTORS, DYN_ALLOCATION_MIN_EXECUTORS}
    +import org.apache.spark.internal.config.{
    --- End diff --
    
    I would just make this import org.apache.spark.internal.config._


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Sorry, I didn't see the ping, I will have a look shortly.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88180/
    Test FAILed.


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a fullExecutorAllocationDivisor ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    I'll leave this a bit longer but then I'm going to merge it later today


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add a executorAllocationRatio parame...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89712/
    Test PASSed.


---



[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add a executorAllocationRatio...

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19881#discussion_r183337131
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -26,7 +26,10 @@ import scala.util.control.{ControlThrowable, NonFatal}
     import com.codahale.metrics.{Gauge, MetricRegistry}
     
     import org.apache.spark.internal.Logging
    -import org.apache.spark.internal.config.{DYN_ALLOCATION_MAX_EXECUTORS, DYN_ALLOCATION_MIN_EXECUTORS}
    +import org.apache.spark.internal.config.{
    --- End diff --
    
    Done


---



[GitHub] spark issue #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot parameter

Posted by jcuquemelle <gi...@git.apache.org>.
Github user jcuquemelle commented on the issue:

    https://github.com/apache/spark/pull/19881
  
    The new semantics (throttling w.r.t. the max possible parallelism) is actually simpler to understand. I'm proposing another name that doesn't have any ambiguity with the existing maxExecutors param, but I'm open to any other name proposal.


---
