Posted to reviews@spark.apache.org by andrewor14 <gi...@git.apache.org> on 2017/02/17 16:53:36 UTC

[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/16975

    [SPARK-19522] Fix executor memory in local-cluster mode

    ## What changes were proposed in this pull request?
    
    ```
    bin/spark-shell --master local-cluster[2,1,2048]
    ```
    is supposed to launch 2 executors, each with 2GB of memory. However, when I ran this on master, I only got executors with 1GB of memory each. This patch fixes that.
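
    For context, the three fields in the master string are the number of workers, the cores per worker, and the memory per worker in MB. A rough sketch of how such a string parses (an approximation of Spark's `SparkMasterRegex`, not the exact source):

    ```scala
    // Approximate pattern for local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB]
    val LocalCluster = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*\]""".r

    "local-cluster[2,1,2048]" match {
      case LocalCluster(workers, cores, memMb) =>
        println(s"$workers workers, $cores core(s) each, $memMb MB of worker memory each")
      case _ =>
        println("not a local-cluster master string")
    }
    ```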
    
    ## How was this patch tested?
    
    `SparkSubmitSuite`, manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark local-cluster-executor-mem

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16975.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16975
    
----
commit db3773c8aeb44e69610c17ff5bc169d7085333ee
Author: Andrew Or <an...@gmail.com>
Date:   2017-02-08T23:27:07Z

    Fix propagation of executor memory in local-cluster mode

commit 1a5bdfeca931a624b82933772c57c605e02dc58d
Author: Andrew Or <an...@gmail.com>
Date:   2017-02-17T16:47:44Z

    Log warning if memory is explicitly set

commit b1a13dc3b1a59593945f0fb6d13683ed81acf9b7
Author: Andrew Or <an...@gmail.com>
Date:   2017-02-17T16:48:16Z

    Merge branch 'master' of github.com:apache/spark into local-cluster-executor-mem

----




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101851064
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    Is the change in `SparkContext` needed? It seems like this one should be all that's required.
    
    As far as I understand, the last value in the local-cluster master is the amount of memory the worker has available; you may, for whatever reason, want to run executors with less than that, which your change doesn't seem to allow.
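
    To make the distinction concrete (hypothetical numbers, not something from this patch), the worker's memory caps what its executors can use in total:

    ```scala
    // Hypothetical sizing: a 2048 MB worker could host two 1024 MB executors.
    val workerMemoryMb = 2048    // third field of local-cluster[2,1,2048]
    val executorMemoryMb = 1024  // e.g. spark.executor.memory=1g
    val executorsPerWorker = workerMemoryMb / executorMemoryMb  // 2
    ```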




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101887589
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    The inconsistency is already inherent in the parameters of `local-cluster[]`, so I'm not introducing it with this change. I personally think it's a really bad interface to force the user to set executor memory in two different places and require that the two values match.




[GitHub] spark issue #16975: [SPARK-19522] Fix executor memory in local-cluster mode

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16975
  
    **[Test build #73058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73058/testReport)** for PR 16975 at commit [`b1a13dc`](https://github.com/apache/spark/commit/b1a13dc3b1a59593945f0fb6d13683ed81acf9b7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16975: [SPARK-19522] Fix executor memory in local-cluster mode

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16975
  
    Hi @andrewor14, is this still active?




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r102337722
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -470,12 +470,25 @@ class SparkContext(config: SparkConf) extends Logging {
           files.foreach(addFile)
         }
     
    -    _executorMemory = _conf.getOption("spark.executor.memory")
    -      .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
    -      .orElse(Option(System.getenv("SPARK_MEM"))
    -      .map(warnSparkMem))
    -      .map(Utils.memoryStringToMb)
    -      .getOrElse(1024)
    +    _executorMemory = {
    +      val defaultMemory = 1024
    +      val configuredMemory = _conf.getOption("spark.executor.memory")
    +        .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
    +        .orElse(Option(System.getenv("SPARK_MEM")).map(warnSparkMem))
    +        .map(Utils.memoryStringToMb)
    +      // In local-cluster mode, always use the slave memory specified in the master string
    +      // In other modes, use the configured memory if it exists
    +      master match {
    +        case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, em) =>
    +          if (configuredMemory.isDefined) {
    --- End diff --
    
    Could you at least change this so that `spark.executor.memory` takes precedence if it's set? Then both use cases are possible. (Maybe someone is crazy enough to be trying dynamic allocation in local-cluster mode, or something else...)
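
    A sketch of the precedence being suggested, reusing `master`, `configuredMemory`, and `defaultMemory` from the diff above (an illustration, not the final patch):

    ```scala
    // spark.executor.memory (or the env vars) wins if set; otherwise fall back to
    // the worker memory from the local-cluster master string; otherwise 1024 MB.
    val resolvedMemoryMb = master match {
      case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, workerMemoryMb) =>
        configuredMemory.getOrElse(workerMemoryMb.toInt)
      case _ =>
        configuredMemory.getOrElse(defaultMemory)
    }
    ```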




[GitHub] spark issue #16975: [SPARK-19522] Fix executor memory in local-cluster mode

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16975
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73058/
    Test PASSed.




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101857385
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    ```
    You may, for whatever reason, want to run executors with less than that, which your change doesn't seem to allow.
    ```
    Yeah, I thought about this long and hard, but I just couldn't come up with a case where you would want the worker size to differ from the executor size in local-cluster mode. If you want to launch 5 workers (2GB each), each with 2 executors (1GB each), you might as well just launch 10 executors (1GB each) or run real standalone mode locally. I think it's better to fix the out-of-the-box case than to try to cover potentially non-existent corner cases.




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101887609
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    Also, we're talking about a net addition of 7 LOC in `SparkContext.scala`, about half of which are comments and warning logs. It's really not that much code.




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16975




[GitHub] spark issue #16975: [SPARK-19522] Fix executor memory in local-cluster mode

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16975
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101857230
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    If this were the only change, the memory field in `local-cluster[2,1,2048]` wouldn't actually do anything, because we're not setting `spark.executor.memory` anywhere. You could do `--master local-cluster[2,1,2048] --conf spark.executor.memory=2g`, but that's cumbersome, and now there are two ways to set the executor memory.
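
    For reference, a condensed model of what that `OptionAssigner` line does in `SparkSubmit` (simplified signatures, not the verbatim source): when the active cluster manager and deploy mode match the assigner's bitmasks, the command-line value is copied into a Spark config property.

    ```scala
    // Simplified model of SparkSubmit's option routing.
    case class OptionAssigner(value: String, clusterManager: Int, deployMode: Int,
        sysProp: String = null)

    def applyOptions(
        opts: Seq[OptionAssigner],
        clusterManager: Int,
        deployMode: Int,
        conf: scala.collection.mutable.Map[String, String]): Unit = {
      for {
        opt <- opts
        if opt.value != null
        if (opt.clusterManager & clusterManager) != 0
        if (opt.deployMode & deployMode) != 0
        if opt.sysProp != null
      } conf(opt.sysProp) = opt.value  // e.g. "spark.executor.memory" -> "2g"
    }
    ```

    Under this model, widening the masks from `STANDALONE | MESOS | YARN` to `ALL_CLUSTER_MGRS` is what lets `--executor-memory` reach `spark.executor.memory` for local-cluster masters too.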




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101860376
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    Well, it would make `local-cluster[]` work like any other master, where you have to explicitly set the executor memory. I understand the desire to simplify things, but it comes at the cost of being inconsistent with the other cluster managers.

    (e.g. the same command line with a different master would behave differently: you'd fall back to 1g of memory for executors instead of whatever was defined in the `local-cluster` string.)




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r101862197
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -466,7 +466,7 @@ object SparkSubmit extends CommandLineUtils {
           // Other options
           OptionAssigner(args.executorCores, STANDALONE | YARN, ALL_DEPLOY_MODES,
             sysProp = "spark.executor.cores"),
    -      OptionAssigner(args.executorMemory, STANDALONE | MESOS | YARN, ALL_DEPLOY_MODES,
    +      OptionAssigner(args.executorMemory, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
    --- End diff --
    
    (Anyway, either way is probably fine, so go with your judgement. It just seems like a lot of code in `SparkContext` to support that use case.)




[GitHub] spark pull request #16975: [SPARK-19522] Fix executor memory in local-cluste...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16975#discussion_r102569401
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -470,12 +470,25 @@ class SparkContext(config: SparkConf) extends Logging {
           files.foreach(addFile)
         }
     
    -    _executorMemory = _conf.getOption("spark.executor.memory")
    -      .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
    -      .orElse(Option(System.getenv("SPARK_MEM"))
    -      .map(warnSparkMem))
    -      .map(Utils.memoryStringToMb)
    -      .getOrElse(1024)
    +    _executorMemory = {
    +      val defaultMemory = 1024
    +      val configuredMemory = _conf.getOption("spark.executor.memory")
    +        .orElse(Option(System.getenv("SPARK_EXECUTOR_MEMORY")))
    +        .orElse(Option(System.getenv("SPARK_MEM")).map(warnSparkMem))
    +        .map(Utils.memoryStringToMb)
    +      // In local-cluster mode, always use the slave memory specified in the master string
    +      // In other modes, use the configured memory if it exists
    +      master match {
    +        case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, em) =>
    +          if (configuredMemory.isDefined) {
    --- End diff --
    
    sure




[GitHub] spark issue #16975: [SPARK-19522] Fix executor memory in local-cluster mode

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16975
  
    **[Test build #73058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73058/testReport)** for PR 16975 at commit [`b1a13dc`](https://github.com/apache/spark/commit/b1a13dc3b1a59593945f0fb6d13683ed81acf9b7).

