You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by CodingCat <gi...@git.apache.org> on 2014/02/14 03:04:46 UTC

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

GitHub user CodingCat opened a pull request:

    https://github.com/apache/incubator-spark/pull/599

    [SPARK-1090] improvement on spark_shell (help information, configure memory)

    spark-shell should print help information about parameters and should allow user to configure exe memory
    there is no document about hot to set --cores/-c in spark-shell
    
    and also
    
    users should be able to set executor memory through command line options
    
    In this PR I also check the format of the options passed by the user

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark spark_shell_improve

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599
    
----
commit 2039fd77cd1e70ca7261c5da5fcf5340f28069e0
Author: CodingCat <zh...@gmail.com>
Date:   2014-02-14T02:03:34Z

    improvement on spark_shell (help information, configure memory)

----


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9781252
  
    nit: maybe "the maximum number of cores to be used by the spark shell"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9767634
  
    This is what I am referring to: https://github.com/CodingCat/incubator-spark/blob/spark_shell_improve/bin/spark-class#L93


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780299
  
    I thought that "Created spark context" is something signalling a significant step in starting spark-shell, we'd better write it to the log file to facilitate debugging 
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35049082
  
    Can one of the admins verify this patch?


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35135807
  
    added a new parameter to set the memory used by spark-shell driver


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35331962
  
    Thanks! Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35227134
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780199
  
    en....maybe --execmem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780400
  
    fixed the above two


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9781247
  
    update for -em


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9781248
  
    change OPTIONS to SPARK_SHELL_OPTS!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35227135
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35227969
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12736/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9781246
  
    pattern should start with ^ and end with $ -- just tried with something like "4gz" and it passed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780203
  
    A slightly confusing point is that the driver is in the same memory space as the shell, and we're really just controlling the memory of the shell process itself. I think the wording "the memory used by the spark shell and driver" may be slightly clearer in this sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35139397
  
    That is ideal, but as Patrick mentioned in #602, we cannot do this because users currently depend on the behavior of SPARK_MEM. Thus we must deprecate it and encourage users not to use it, but still support it for existing users.
    
    This is why we have suggested creating new environment variables for setting the shell memory (and example memory) to avoid changing the executor memory without changing past behavior.


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780214
  
    Most of the rest of the code in this file uses echo, so I would avoid changing this unless you have a good reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9768064
  
    OK, I got it


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-spark/pull/599


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35225862
  
    @aarondav thank you for the comments, another round of fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35226986
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9781258
  
    update or remove comment at the top of this file that talks about the options -- removal is fine since we have the list in code now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780202
  
    still ugly...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35227968
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#issuecomment-35138442
  
    @aarondav  I totally agree that we should deprecate SPARK_MEM, since that PR is still in progress (seems dead), I think we should avoid reading this variable in SparkContext to set executor memory. Otherwise the user may set this variable to limit memory usage of driver but affect the executor memory unintentionally  
    
    I created a PR to remove that line https://github.com/apache/incubator-spark/pull/602


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9780175
  
    It is perhaps unfortunate that SPARK_WORKER_MEMORY actually does exist, and controls the total amount of memory that a worker can lease across all executors on a machine. Since this is not the property we want, perhaps we'll be relegated to "-em --executormem" (kinda ugly, maybe there's something nicer-looking).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9767096
  
    Right now the only way to specify the spark-shell memory is to use the now-undocumented SPARK_MEM environment variable. I have raised this issue on the JIRA ticket that deprecated SPARK_MEM (https://spark-project.atlassian.net/browse/SPARK-929), and the resolution there may affect how we want to proceed here.
    
    (As an intermediate solution, you can always just add an option which sets SPARK_MEM, and we can change that later to something like SPARK_DRIVER_MEM when we add it.)


[GitHub] incubator-spark pull request: [SPARK-1090] improvement on spark_sh...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/incubator-spark/pull/599#discussion_r9767385
  
    Hi, @aarondav, just a bit confused. from the code
    
    private[spark] val executorMemory = conf.getOption("spark.executor.memory")
        .orElse(Option(System.getenv("SPARK_MEM")))
        .map(Utils.memoryStringToMb)
        .getOrElse(512)
    
    SPARK_MEM is to set the memory used by the executor, which has been done in this PR, 
    
    what you are proposing is to control the memory used by the driver. I think it is hard to achieve since users may run spark-shell in a machine out of the control of Spark.