You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2014/12/18 07:53:05 UTC

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/3731

    [SPARK-4140] Document dynamic allocation

    Once the external shuffle service is also documented, the dynamic allocation section will link to it. Let me know if the whole dynamic allocation should be moved to its separate page; I personally think the organization might be cleaner that way.
    
    This patch builds on top of @oza's work in #3689.
    
    @aarondav @pwendell

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark document-dynamic-allocation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3731.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3731
    
----
commit 53cff5840923af81e048c0333117dc48163e4408
Author: Tsuyoshi Ozawa <oz...@gmail.com>
Date:   2014-12-13T07:34:25Z

    Adding a documentation about dynamic resource allocation.
    
    Signed-off-by: Tsuyoshi Ozawa <oz...@gmail.com>

commit 6827b56e37e25d64ce52af1caad742563c2bdb40
Author: Tsuyoshi Ozawa <oz...@gmail.com>
Date:   2014-12-16T17:14:15Z

    Fixing a documentation of spark.dynamicAllocation.enabled.
    
    Signed-off-by: Tsuyoshi Ozawa <oz...@gmail.com>

commit 8c64004029ece1700fb5930ba1e9ddc19c1c3bd1
Author: Andrew Or <an...@databricks.com>
Date:   2014-12-18T06:35:38Z

    Add documentation for dynamic allocation (without configs)

commit 246fb44a6f508178a8730d4c3095237bb66cfc33
Author: Andrew Or <an...@databricks.com>
Date:   2014-12-18T06:36:30Z

    Merge branch 'SPARK-4839' of github.com:oza/spark into document-dynamic-allocation

commit b9843f2c673f30c5111f2a2a29e15dcde00042db
Author: Andrew Or <an...@databricks.com>
Date:   2014-12-18T06:50:25Z

    Document the configs as well

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22062228

--- Diff: docs/job-scheduling.md ---
@@ -56,6 +56,112 @@ the same RDDs. For example, the [Shark](http://shark.cs.berkeley.edu) JDBC serve
queries. In future releases, in-memory storage systems such as [Tachyon](http://tachyon-project.org) will
provide another approach to share RDDs.

This is just a reference to the external shuffle service. The actual documentation for the service will be in its own section elsewhere outside of this patch.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22044912

Nit: I think should be "At a high level" or "From a high level"

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67723416
  
    Ok I'm merging this into master and 1.2 thanks guys


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22044723

I would possibly rephrase or leave this paragraph out, as there are situations where different dynamicAllocation.enabled settings for different applications are reasonable. I.e. a cluster might have some production applications that need a static allocation to cache data and respond to queries as fast as possible, while others might be interactive and have highly varying resource use. YARN is meant to take care of the fairness aspect.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-68657238
  
    @pwendell @andrewor14 Can we publish updated docs to the website?  It looks like these changes aren't live yet: https://spark.apache.org/docs/latest/job-scheduling.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22044849

Should this be broken out into a separate section for users that don't care about dynamic allocation, but want to learn how to use the external shuffle service?

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22045044

+## Dynamic Resource Allocation
+
+Spark 1.2 introduces the ability to dynamically scale the set of cluster resources allocated to
+your application up and down based on the workload. This means that your application may give
+resources back to the cluster if they are no longer used and request them again later when there
+is demand. This feature is particularly useful if multiple applications share resources in your
+Spark cluster. If a subset of the resources allocated to an application becomes idle, it can be
+returned to the cluster's pool of resources and acquired by other applications. In Spark, dynamic
+resource allocation is performed on the granularity of the executor and can be enabled through
+`spark.dynamicAllocation.enabled`.
+
+This feature is currently disabled by default and available only on [YARN](running-on-yarn.html).
+A future release will extend this to [standalone mode](spark-standalone.html) and
+[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes). Note that although Spark on
+Mesos already has a similar notion of dynamic resource sharing in fine-grained mode, enabling
+dynamic allocation allows your Mesos application to take advantage of coarse-grained low-latency
+scheduling while sharing cluster resources efficiently.
+
+Lastly, it is worth noting that Spark's dynamic resource allocation mechanism is cooperative.
+This means if a Spark application enables this feature, other applications on the same cluster
+are also expected to do so. Otherwise, the cluster's resources will end up being unfairly
+distributed to the applications that do not voluntarily give up unused resources they have
+acquired.
+
+### Configuration and Setup
+
+All configurations used by this feature live under the `spark.dynamicAllocation.*` namespace.
+To enable this feature, your application must set `spark.dynamicAllocation.enabled` to `true` and
+provide lower and upper bounds for the number of executors through
+`spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors`. Other relevant
+configurations are described on the [configurations page](configuration.html#dynamic-allocation)
+and in the subsequent sections in detail.
+
+Additionally, your application must use an external shuffle service (described below). To enable
+this, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service is
+implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
+in your cluster. To start this service, follow these steps:
+
+1. Build Spark with the [YARN profile](building-spark.html). Skip this step if you are using a
+pre-packaged distribution.
+2. Locate the `spark-<version>-yarn-shuffle.jar`. This should be under
+`$SPARK_HOME/network/yarn/target/scala-<version>` if you are building Spark yourself, and under
+`lib` if you are using a distribution.
+2. Add this jar to the classpath of all `NodeManager`s in your cluster.
+3. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`,
+then set `yarn.nodemanager.aux-services.spark_shuffle.class` to
+`org.apache.spark.yarn.network.YarnShuffleService`. Additionally, set all relevant
+`spark.shuffle.service.*` [configurations](configuration.html).
+4. Restart all `NodeManager`s in your cluster.
+
+### Resource Allocation Policy
+
+On a high level, Spark should relinquish executors when they are no longer used and acquire
+executors when they are needed. Since there is no definitive way to predict whether an executor
+that is about to be removed will run a task in the near future, or whether a new executor that is
+about to be added will actually be idle, we need a set of heuristics to determine when to remove
+and request executors.
+
+#### Request Policy
+
+A Spark application with dynamic allocation enabled requests additional executors when it has
+pending tasks waiting to be scheduled. This condition necessarily implies that the existing set
+of executors is insufficient to simultaneously saturate all tasks that have been submitted but
+not yet finished.
+
+Spark requests executors in rounds. The actual request is triggered when there have been pending
+tasks for `spark.dynamicAllocation.schedulerBacklogTimeout` seconds, and then triggered again
+every `spark.dynamicAllocation.sustainedSchedulerBacklogTimeout` seconds thereafter if the queue
+of pending tasks persists. Additionally, the number of executors requested in each round increases
+exponentially from the previous round. For instance, an application will add 1 executor in the
+first round, and then 2, 4, 8 and so on executors in the subsequent rounds.
+
+The motivation for an exponential increase policy is twofold. First, an application should request
+executors cautiously in the beginning in case it turns out that only a few additional executors is
+sufficient. This echoes the justification for TCP slow start. Second, the application should be
+able to ramp up its resource usage in a timely manner in case it turns out that many executors are
+actually needed.
+
+#### Remove Policy
+
+The policy for removing executors is much simpler. A Spark application removes an executor when
+it has been idle for more than `spark.dynamicAllocation.executorIdleTimeout` seconds. Note that,
+under most circumstances, this condition is mutually exclusive with the request condition, in that
+an executor should not be idle if there are still pending tasks to be scheduled.
+
+### Graceful Decommission of Executors
--- End diff --

This section should mention issues with caching data.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67454633
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24575/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22090703

Are you saying this section will be replaced with a pointer to the external shuffle service doc once it's added? If so, looks good to me.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67448076
  
      [Test build #24575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24575/consoleFull) for   PR 3731 at commit [`b9843f2`](https://github.com/apache/spark/commit/b9843f2c673f30c5111f2a2a29e15dcde00042db).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67494432
  
    Super nice to have documentation at this level of detail.  This mostly looks good, I left a few comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67724988
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24663/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3731


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22139447

you're right

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67723167
  
      [Test build #24663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24663/consoleFull) for   PR 3731 at commit [`1281447`](https://github.com/apache/spark/commit/1281447a71f87e1a45b7a8b2ef7faabc3d300fc1).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by oza <gi...@git.apache.org>.

Github user oza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22081115

Make sense to me. How about you, @sryza?

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22139448

fair enough

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67724986
  
      [Test build #24663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24663/consoleFull) for   PR 3731 at commit [`1281447`](https://github.com/apache/spark/commit/1281447a71f87e1a45b7a8b2ef7faabc3d300fc1).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by oza <gi...@git.apache.org>.

Github user oza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22055469

+1 to add how to use external shuffle service since we need to enable external shuffle service to use dynamic allocation.

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22044330

It would be nice to add a short clause explaining why this is the case

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by oza <gi...@git.apache.org>.

Github user oza commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67725428
  
    @andrewor14 good job, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by sryza <gi...@git.apache.org>.

Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3731#discussion_r22044851

Should this be broken out into a separate section for users that don't care about dynamic allocation, but want to learn how to use the external shuffle service?

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-4140] Document dynamic allocation

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3731#issuecomment-67454627
  
      [Test build #24575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24575/consoleFull) for   PR 3731 at commit [`b9843f2`](https://github.com/apache/spark/commit/b9843f2c673f30c5111f2a2a29e15dcde00042db).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org