Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2015/11/11 22:36:06 UTC

[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/9637

    [SPARK-11667] Update dynamic allocation docs to reflect supported cluster managers

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark update-da-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9637.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9637
    
----
commit 48d741852a567aeb0806cfbc23b094c5edab3ba9
Author: Andrew Or <an...@databricks.com>
Date:   2015-11-11T21:35:10Z

    Update dynamic allocation docs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156226760
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156244521
  
    I think this is fine for now. I was thinking of leaving some comments about how it should be launched with Marathon, but I think we can later add an example JSON file in the Marathon repo and link to it from here.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155922803
  
    **[Test build #45674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45674/consoleFull)** for PR 9637 at commit [`48d7418`](https://github.com/apache/spark/commit/48d741852a567aeb0806cfbc23b094c5edab3ba9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155917580
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155923948
  
    I only have one comment; otherwise everything LGTM.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9637#discussion_r44710758
  
    --- Diff: docs/job-scheduling.md ---
    @@ -56,36 +56,31 @@ provide another approach to share RDDs.
     
     ## Dynamic Resource Allocation
     
    -Spark 1.2 introduces the ability to dynamically scale the set of cluster resources allocated to
    -your application up and down based on the workload. This means that your application may give
    -resources back to the cluster if they are no longer used and request them again later when there
    -is demand. This feature is particularly useful if multiple applications share resources in your
    -Spark cluster. If a subset of the resources allocated to an application becomes idle, it can be
    -returned to the cluster's pool of resources and acquired by other applications. In Spark, dynamic
    -resource allocation is performed on the granularity of the executor and can be enabled through
    -`spark.dynamicAllocation.enabled`.
    -
    -This feature is currently disabled by default and available only on [YARN](running-on-yarn.html).
    -A future release will extend this to [standalone mode](spark-standalone.html) and
    -[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes). Note that although Spark on
    -Mesos already has a similar notion of dynamic resource sharing in fine-grained mode, enabling
    -dynamic allocation allows your Mesos application to take advantage of coarse-grained low-latency
    -scheduling while sharing cluster resources efficiently.
    +Spark provides a mechanism to dynamically adjust the resources your application occupies based
    +on the workload. This means that your application may give resources back to the cluster if they
    +are no longer used and request them again later when there is demand. This feature is particularly
    +useful if multiple applications share resources in your Spark cluster.
    +
    +This feature is disabled by default and available on all coarse-grained cluster managers, i.e.
    +[standalone mode](spark-standalone.html), [YARN mode](running-on-yarn.html), and
    +[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes).
     
     ### Configuration and Setup
     
    -All configurations used by this feature live under the `spark.dynamicAllocation.*` namespace.
    -To enable this feature, your application must set `spark.dynamicAllocation.enabled` to `true`.
    -Other relevant configurations are described on the
    -[configurations page](configuration.html#dynamic-allocation) and in the subsequent sections in
    -detail.
    +There are two requirements for using this feature. First, your application must set
    +`spark.dynamicAllocation.enabled` to `true`. Second, you must set up an *external shuffle service*
    +on each worker node in the same cluster and set `spark.shuffle.service.enabled` to true in your
    +application. The purpose of the external shuffle service is to allow executors to be removed
    +without deleting shuffle files written by them (more detail described
    +[below](job-scheduling.html#graceful-decommission-of-executors)). The way to set up this service
    +varies across cluster managers:
    +
    +In standalone mode, simply start your workers with `spark.shuffle.service.enabled` set to `true`.
     
    -Additionally, your application must use an external shuffle service. The purpose of the service is
    -to preserve the shuffle files written by executors so the executors can be safely removed (more
    -detail described [below](job-scheduling.html#graceful-decommission-of-executors)). To enable
    -this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service
    -is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
    -in your cluster. To start this service, follow these steps:
    +In Mesos coarse-grained mode, run `$SPARK_HOME/sbin/start-mesos-shuffle-service.sh` on all
    +slave nodes with `spark.shuffle.service.enabled` set to `true`.
    --- End diff --
    
    Not sure what you mean by the latter part. If you wish, you can submit a patch against the same issue to augment the description here.
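
The two application-side requirements described in the diff above (`spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled` both set to `true`) can be sketched as a small check. This is a minimal illustration and not part of the patch: the helper names `missing_requirements` and `to_submit_args` are hypothetical, and only the two property names and the `--conf key=value` form accepted by `spark-submit` come from Spark itself.

```python
# Property names are taken from the docs diff; the helper names are hypothetical.
REQUIRED = ("spark.dynamicAllocation.enabled", "spark.shuffle.service.enabled")


def missing_requirements(conf):
    """Return the required properties that are not set to "true" in conf."""
    return [key for key in REQUIRED if str(conf.get(key, "false")).lower() != "true"]


def to_submit_args(conf):
    """Render a dict of Spark properties as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Example application configuration that satisfies both requirements.
app_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}
```

Passing this check only covers the application settings. The external shuffle service itself still has to be started on each worker node, in the way the diff describes for each cluster manager.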




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155918259
  
    **[Test build #45674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45674/consoleFull)** for PR 9637 at commit [`48d7418`](https://github.com/apache/spark/commit/48d741852a567aeb0806cfbc23b094c5edab3ba9).




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156233387
  
    **[Test build #45767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45767/consoleFull)** for PR 9637 at commit [`8200bf5`](https://github.com/apache/spark/commit/8200bf50175e069bbe96003099a5391b6705eb69).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156273469
  
    Thanks Tim, merging into master and 1.6.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155917848
  
    @tnachen please verify whether the Mesos part is correct.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9637




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156233546
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156226729
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156229506
  
    **[Test build #45767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45767/consoleFull)** for PR 9637 at commit [`8200bf5`](https://github.com/apache/spark/commit/8200bf50175e069bbe96003099a5391b6705eb69).




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155922989
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156233547
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45767/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-156244571
  
    LGTM




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155917561
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9637#issuecomment-155922992
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45674/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11667] Update dynamic allocation docs t...

Posted by tnachen <gi...@git.apache.org>.
Github user tnachen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9637#discussion_r44593380
  
    --- Diff: docs/job-scheduling.md ---
    @@ -56,36 +56,31 @@ provide another approach to share RDDs.
     
     ## Dynamic Resource Allocation
     
    -Spark 1.2 introduces the ability to dynamically scale the set of cluster resources allocated to
    -your application up and down based on the workload. This means that your application may give
    -resources back to the cluster if they are no longer used and request them again later when there
    -is demand. This feature is particularly useful if multiple applications share resources in your
    -Spark cluster. If a subset of the resources allocated to an application becomes idle, it can be
    -returned to the cluster's pool of resources and acquired by other applications. In Spark, dynamic
    -resource allocation is performed on the granularity of the executor and can be enabled through
    -`spark.dynamicAllocation.enabled`.
    -
    -This feature is currently disabled by default and available only on [YARN](running-on-yarn.html).
    -A future release will extend this to [standalone mode](spark-standalone.html) and
    -[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes). Note that although Spark on
    -Mesos already has a similar notion of dynamic resource sharing in fine-grained mode, enabling
    -dynamic allocation allows your Mesos application to take advantage of coarse-grained low-latency
    -scheduling while sharing cluster resources efficiently.
    +Spark provides a mechanism to dynamically adjust the resources your application occupies based
    +on the workload. This means that your application may give resources back to the cluster if they
    +are no longer used and request them again later when there is demand. This feature is particularly
    +useful if multiple applications share resources in your Spark cluster.
    +
    +This feature is disabled by default and available on all coarse-grained cluster managers, i.e.
    +[standalone mode](spark-standalone.html), [YARN mode](running-on-yarn.html), and
    +[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes).
     
     ### Configuration and Setup
     
    -All configurations used by this feature live under the `spark.dynamicAllocation.*` namespace.
    -To enable this feature, your application must set `spark.dynamicAllocation.enabled` to `true`.
    -Other relevant configurations are described on the
    -[configurations page](configuration.html#dynamic-allocation) and in the subsequent sections in
    -detail.
    +There are two requirements for using this feature. First, your application must set
    +`spark.dynamicAllocation.enabled` to `true`. Second, you must set up an *external shuffle service*
    +on each worker node in the same cluster and set `spark.shuffle.service.enabled` to true in your
    +application. The purpose of the external shuffle service is to allow executors to be removed
    +without deleting shuffle files written by them (more detail described
    +[below](job-scheduling.html#graceful-decommission-of-executors)). The way to set up this service
    +varies across cluster managers:
    +
    +In standalone mode, simply start your workers with `spark.shuffle.service.enabled` set to `true`.
     
    -Additionally, your application must use an external shuffle service. The purpose of the service is
    -to preserve the shuffle files written by executors so the executors can be safely removed (more
    -detail described [below](job-scheduling.html#graceful-decommission-of-executors)). To enable
    -this service, set `spark.shuffle.service.enabled` to `true`. In YARN, this external shuffle service
    -is implemented in `org.apache.spark.yarn.network.YarnShuffleService` that runs in each `NodeManager`
    -in your cluster. To start this service, follow these steps:
    +In Mesos coarse-grained mode, run `$SPARK_HOME/sbin/start-mesos-shuffle-service.sh` on all
    +slave nodes with `spark.shuffle.service.enabled` set to `true`.
    --- End diff --
    
    I'd like to add that users can run the Mesos shuffle service with Marathon, in which case they should start the service in the foreground by running `spark-class org.apache.spark.deploy.mesos.MesosExternalShuffleService`.

