Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/06/09 23:25:00 UTC

[jira] [Updated] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

     [ https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26679:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Deconflict spark.executor.pyspark.memory and spark.python.worker.memory
> -----------------------------------------------------------------------
>
>                 Key: SPARK-26679
>                 URL: https://issues.apache.org/jira/browse/SPARK-26679
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Ryan Blue
>            Priority: Major
>
> In 2.4.0, spark.executor.pyspark.memory was added to limit the total memory of a Python worker. There is another, RDD-level setting, spark.python.worker.memory, that controls when Spark decides to spill data to disk. The two settings are similar but currently unrelated to one another.
> PySpark should probably use spark.executor.pyspark.memory to limit, or provide a default for, spark.python.worker.memory, because the latter property controls spilling and should therefore be set lower than the total memory limit. Renaming spark.python.worker.memory would also improve clarity: its name suggests that it controls the memory limit, when it actually behaves more like the JVM setting spark.memory.fraction.
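>
> For illustration, a minimal PySpark sketch of how the two settings relate (the values and the 0.6 defaulting fraction are hypothetical, and the defaulting helper is only a sketch of the proposal, not Spark's actual behavior):
>
>     from pyspark.sql import SparkSession
>
>     # Hard cap on a Python worker's total memory (spark.executor.pyspark.memory, added in 2.4.0).
>     PYSPARK_MEMORY_MB = 2048
>
>     # Sketch of the proposal: default the spill threshold to a fraction of the
>     # hard cap so workers start spilling to disk before they hit the limit.
>     # The 0.6 fraction is an arbitrary example, not a Spark default.
>     def default_worker_memory_mb(limit_mb, fraction=0.6):
>         return int(limit_mb * fraction)
>
>     spark = (SparkSession.builder
>              # Total per-worker memory limit.
>              .config("spark.executor.pyspark.memory", f"{PYSPARK_MEMORY_MB}m")
>              # Spill threshold, kept below the total limit.
>              .config("spark.python.worker.memory", f"{default_worker_memory_mb(PYSPARK_MEMORY_MB)}m")
>              .getOrCreate())
>
> Keeping the spill threshold strictly below the hard cap means spilling kicks in before a worker approaches the spark.executor.pyspark.memory limit.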



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org