Posted to reviews@spark.apache.org by ivoson <gi...@git.apache.org> on 2018/08/28 08:46:08 UTC

[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

GitHub user ivoson opened a pull request:

    https://github.com/apache/spark/pull/22252

    [SPARK-25261][MINOR][DOC] correct the default unit for spark.executor|driver.memory as described in configuration.md

    ## What changes were proposed in this pull request?
    
    As described in [SPARK-25261](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-25261), the unit of spark.executor.memory and spark.driver.memory is bytes when no unit is specified, while in https://spark.apache.org/docs/latest/configuration.html#application-properties they are described as MiB, which may lead to misunderstandings.
    
    ## How was this patch tested?
    
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivoson/spark branch-correct-configuration

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22252
    
----
commit 8f360b97e9f0afff7e6627df15cc17f103a5c2b3
Author: Huang Tengfei <te...@...>
Date:   2018-08-28T07:28:06Z

    update configuration.md, correct the default unit of spark.executor.memory and spark.driver.memory.
    
    Change-Id: I2935b93559aa3e016bbab7a083c8c24bbdc6f685

commit 3eb3b66a52435366f258cbb40d01d8f3b0141bff
Author: huangtengfei02 <hu...@...>
Date:   2018-08-28T08:29:25Z

    fix style issue
    
    Change-Id: I73e82b8bd07064d874bf76df52cb661097532884

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213783925
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    I took a look at the history of this code and the only constant here is how all places are really confused about the units over time.
    
    It seems to me that things shifted back and forth a bit, and that now the behavior actually differs depending on the cluster manager: YARN and k8s default to MiB, while others default to bytes.
    
    I think it's just safer to not say what the default unit is and encourage people to explicitly identify it; I think that's what's being done for a lot of new time- and size-based configs. The "no unit" approach was always meant to be only for backwards compatibility anyway.


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by ivoson <gi...@git.apache.org>.
Github user ivoson commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213546414
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    If we start a job with conf spark.executor.memory=1024 or spark.driver.memory=1024, it currently means 1024 bytes. So I think we should update the doc to avoid confusing Spark users.
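
    To make the ambiguity above concrete, here is a minimal sketch (not Spark's actual parsing code; the `parseToBytes` helper and its behavior are illustrative assumptions) of the two interpretations a bare number like "1024" can get, depending on the default unit a code path assumes:

    ```scala
    // Sketch only: mimics how a size string may be read with a
    // bytes default vs. a MiB default. Not Spark's real parser.
    object MemorySketch {
      // Hypothetical helper: strings with a k/m/g suffix are explicit;
      // bare numbers are scaled by defaultUnitBytes.
      def parseToBytes(s: String, defaultUnitBytes: Long): Long = {
        val trimmed = s.trim.toLowerCase
        val units = Map(
          "k" -> 1024L,
          "m" -> 1024L * 1024,
          "g" -> 1024L * 1024 * 1024)
        trimmed.takeRight(1) match {
          case u if units.contains(u) => trimmed.dropRight(1).toLong * units(u)
          case _                      => trimmed.toLong * defaultUnitBytes
        }
      }

      def main(args: Array[String]): Unit = {
        // "1024" with a bytes default: far too small for a JVM heap.
        println(parseToBytes("1024", 1L))            // 1024
        // The same string with a MiB default, as the YARN/k8s paths assume.
        println(parseToBytes("1024", 1024L * 1024))  // 1073741824
        // An explicit unit is unambiguous under either default.
        println(parseToBytes("1g", 1L))              // 1073741824
      }
    }
    ```

    The same string yields a 1-million-fold difference depending on the assumed default, which is why spelling out the unit (e.g. `1g`) sidesteps the problem entirely.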


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213522085
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    why MiB is wrong btw?


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213774632
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    Oh, hm.
    
    ```
    $ ./bin/spark-shell --master local --driver-memory=1024
    Error occurred during initialization of VM
    Too small initial heap
    $ ./bin/spark-shell --master local --conf spark.driver.memory=1024
    Error occurred during initialization of VM
    Too small initial heap
    ```
    
    Although the option referenced above is used, it's used in YARN and K8S, not in this code path.
    
    Paging @vanzin for an opinion? I'd prefer to change docs rather than behavior but it looks like we might have divergent behavior here.


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213524321
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    MiB is correct in both cases, according to the code. I believe this PR should be closed.


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by ivoson <gi...@git.apache.org>.
Github user ivoson commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213544524
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    @xuanyuanking @HyukjinKwon @srowen thanks for your replies. I've also noticed the code above, but `DRIVER_MEMORY` and `EXECUTOR_MEMORY` in config/package.scala are never used; maybe they are intended for future use. The code below shows how the conf is actually used today, please take a look.
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L465
    
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1130
    
    https://github.com/ivoson/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L265


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] correct the default unit for s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] update the description ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22252


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213343952
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    Check the config code here.
    https://github.com/apache/spark/blob/99d2e4e00711cffbfaee8cb3da9b6b3feab8ff18/core/src/main/scala/org/apache/spark/internal/config/package.scala#L40-L43
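
    The linked entries declare MiB as the default unit at the config-definition level. A minimal self-contained sketch of that pattern (the `SizeEntry`/`ByteUnit` names here are illustrative stand-ins, not Spark's actual `ConfigBuilder` API):

    ```scala
    // Sketch of a config entry that carries its own default unit,
    // so a bare "1024" is read as 1024 MiB rather than 1024 bytes.
    object ConfigSketch {
      sealed trait ByteUnit { def toBytes(n: Long): Long }
      case object Bytes extends ByteUnit { def toBytes(n: Long): Long = n }
      case object MiB extends ByteUnit { def toBytes(n: Long): Long = n * 1024 * 1024 }

      // A unitless value is scaled by the entry's declared default unit.
      final case class SizeEntry(key: String, defaultUnit: ByteUnit) {
        def readBytes(raw: String): Long = defaultUnit.toBytes(raw.trim.toLong)
      }

      def main(args: Array[String]): Unit = {
        val driverMemory = SizeEntry("spark.driver.memory", MiB)
        println(driverMemory.readBytes("1024")) // 1073741824 (1 GiB) under a MiB default
      }
    }
    ```

    The bug under discussion is that other code paths read the same key without going through an entry like this, falling back to a bytes default.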


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] correct the default unit for s...

Posted by ivoson <gi...@git.apache.org>.
Github user ivoson commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    Based on the discussion above, I think updating the doc to guide users to explicitly specify the unit is necessary. The last commit updates the doc. cc @vanzin @srowen @xuanyuanking 
    
    Also I think the unit for these confs should eventually be unified, because even for yarn-cluster or K8S, spark.executor.memory is not only used to allocate resources (where it is parsed as MiB by default); in some other places it is parsed as bytes by default for validation, such as:
    https://github.com/apache/spark/blob/ff8dcc1d4c684e1b68e63d61b3f20284b9979cca/core/src/main/scala/org/apache/spark/SparkContext.scala#L465-L470
    https://github.com/apache/spark/blob/ff8dcc1d4c684e1b68e63d61b3f20284b9979cca/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala#L208-L233
    https://github.com/apache/spark/blob/ff8dcc1d4c684e1b68e63d61b3f20284b9979cca/core/src/main/scala/org/apache/spark/memory/StaticMemoryManager.scala#L121-L143
    
    



---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] correct the default unit for s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] update the description for spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    **[Test build #4306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4306/testReport)** for PR 22252 at commit [`294ee6d`](https://github.com/apache/spark/commit/294ee6d630f1758f78f6496317e744455746527f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] update the description for spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    The doc change is OK. I would leave the JIRA open to track changing the behavior of these units to be consistent, but that may need to wait until Spark 3.0.


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] correct the default unit for s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] update the description for spa...

Posted by ivoson <gi...@git.apache.org>.
Github user ivoson commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    Thanks and got it. @srowen 


---



[GitHub] spark pull request #22252: [SPARK-25261][MINOR][DOC] correct the default uni...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22252#discussion_r213708177
  
    --- Diff: docs/configuration.md ---
    @@ -152,7 +152,7 @@ of the most common options to set are:
       <td><code>spark.driver.memory</code></td>
       <td>1g</td>
       <td>
    -    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in MiB 
    +    Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in bytes 
    --- End diff --
    
    I think I got the point you want to report, @ivoson. IIUC, this is a bug in the code, not in the doc: we should also make `spark.driver.memory=1024` use MiB as the unit. That would change the original behavior, so maybe we can announce it in the migration guide? cc @srowen @HyukjinKwon.


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] update the description for spa...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    Merged to master, but leaving the JIRA open to track standardizing the parsing of these values in Spark 3.


---



[GitHub] spark issue #22252: [SPARK-25261][MINOR][DOC] update the description for spa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22252
  
    **[Test build #4306 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4306/testReport)** for PR 22252 at commit [`294ee6d`](https://github.com/apache/spark/commit/294ee6d630f1758f78f6496317e744455746527f).


---
