You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by aarondav <gi...@git.apache.org> on 2014/02/18 21:09:13 UTC

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

GitHub user aarondav opened a pull request:

    https://github.com/apache/incubator-spark/pull/615

    SPARK-929: Fully deprecate usage of SPARK_MEM

    This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
    SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEM, and SPARK_DRIVER_MEM
    
    The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory.
    SPARK_EXECUTOR_MEM is not actually public -- it is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.
    
    SPARK_DRIVER_MEM is a new public variable, which is needed because there is currently no public way of setting the memory of jobs launched by spark-class.
    
    Other memory considerations:
    - The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEM.
    - run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overriden in all cases by spark-class).
    
    This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aarondav/incubator-spark sparkmem

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/615.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #615
    
----
commit c6ff53251d49c6e4033d6b56268d2f3fb0166d88
Author: Aaron Davidson <aa...@databricks.com>
Date:   2014-02-17T23:09:51Z

    SPARK-929: Fully deprecate usage of SPARK_MEM
    
    This patch cements our deprecation of the SPARK_MEM environment variable
    by replacing it with case-specific variables:
    SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEM, and SPARK_DRIVER_MEM
    
    The creation of the latter two variables means that we can safely
    set driver/job memory without accidentally setting the executor memory.
    SPARK_EXECUTOR_MEM is not actually public, though -- it is only used
    by the Mesos scheduler (and set within SparkContext). The proper way of
    configuring executor memory is through the "spark.executor.memory"
    property.
    
    SPARK_DRIVER_MEM is a new public variable, which is needed because
    there is currently no public way of setting the memory of jobs
    launched by spark-class.
    
    Other memory considerations:
    - The repl's memory can be set through the "--drivermem" command-line option,
      which really just sets SPARK_DRIVER_MEM.
    - run-example doesn't use spark-class, so the only way to modify examples'
      memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally
      overriden in all cases by spark-class).
    
    This patch also fixes a lurking bug where spark-shell misused spark-class
    (the first argument is supposed to be the main class name, not java
    options).

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35820624
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12815/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35684474
  
    I think SPARK_CLIENT_MEMORY isn't so hot either because most often `spark-class` isn't used to run a client, it's most often used by users to run examples. Maybe SPARK_CLASS_MEMORY? @asfgit 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35572922
  
    Will this nomenclature make sense in the context of yarn-standalone mode, where spark-class is used, but the driver is run inside an application master on the cluster, not inside the client?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35464406
  
    Changed to be SPARK_DRIVER_MEMORY


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35684291
  
    Right. What I mean is that calling the variable SPARK_DRIVER_MEMORY might be confusing in the context of yarn-standalone because its value would apply to the client and not the driver (if that's the right terminology).  Would SPARK_CLIENT_MEMORY possibly make more sense?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35464374
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35427919
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35466551
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35581271
  
    @sryza - I don't think this is relevant to the YARN codepath. AFAIK YARN doesn't use the ./spark-class script to launch the YARN application master (which embeds the driver program). I'm not totally sure how that JVM is actually launched though... couldn't figure it out on a quick glance at that code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35581371
  
    It looks like there is a separate variable called `amMemory` that deals with this in YARN. The command for launching that JVM gets set-up in:
    
        common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35464375
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35820623
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35466553
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12773/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35818768
  
    Updated to make de-publicize SPARK_DRIVER_MEMORY. `spark-class` is currently only used in the following cases:
    - spark-shell (which has a memory command-line option that under the hood uses SPARK_DRIVER_MEMORY)
    - developer examples (e.g., StoragePerfTester and FaultToleranceTest; probably do not need memory configuration)
    - launching Mesos executors (SPARK_EXECUTOR_MEMORY is used here)
    - launching standalone workers and masters (SPARK_MASTER/WORKER_MEMORY is used here)
    - and starting YARN and Standalone "Clients" (memory probably does not need to be set here, as the Client just sends a request to start a driver within the cluster, and driver memory is set via command line options)
    
    The last point is the most arguable, if for some reason YARN or Standalone Clients may need more than 512 MB. We could always introduce a public SPARK_CLIENT_MEMORY that is used only when launching YARN and Standalone Clients, but I am not sure it's necessary. Otherwise, all of our cases seem covered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35433388
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12767/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/615#discussion_r9891770
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -165,19 +165,20 @@ class SparkContext(
         jars.foreach(addJar)
       }
     
    +  def warnSparkMem(value: String): String = {
    +    logWarning("Using SPARK_MEM to set amount of memory to use per executor process is " +
    +      "deprecated, please use instead spark.executor.memory")
    --- End diff --
    
    Small nit of the warning wording:
    
    "deprecated, please use spark.executor.memory instead."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35819593
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35684281
  
    Right. What I mean is that calling the variable SPARK_DRIVER_MEMORY might be confusing in the context of yarn-standalone because its value would apply to the client and not the driver (if that's the right terminology).  Would SPARK_CLIENT_MEMORY possibly make more sense?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35819591
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35433387
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35433301
  
    I'm amenable to switching to SPARK_DRIVER_MEMORY. I avoided it for two reasons:
    1) Following SPARK_MEM -> SPARK_DRIVER_MEM / SPARK_EXECUTOR_MEM
    2) It's shorter.
    
    However, since SPARK_MEM is supposed to never have existed, and being shorter isn't really a good argument, we can go with the version that's more consistent with the public variables.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35684716
  
    Hm actually sorry that was totally wrong. Who uses this script externally at all? Why don't we just _not_ document this...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35427920
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35432894
  
    Why use MEMORY for the daemon, but MEM for the driver?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-929: Fully deprecate usage of ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/incubator-spark/pull/615#issuecomment-35684293
  
    Right. What I mean is that calling the variable SPARK_DRIVER_MEMORY might be confusing in the context of yarn-standalone because its value would apply to the client and not the driver (if that's the right terminology).  Would SPARK_CLIENT_MEMORY possibly make more sense?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---