Posted to issues@spark.apache.org by "David Yang (JIRA)" <ji...@apache.org> on 2019/03/04 09:03:00 UTC

[jira] [Created] (SPARK-27041) Large partition data causes PySpark with Python 2.x OOM

David Yang created SPARK-27041:
----------------------------------

             Summary: Large partition data causes PySpark with Python 2.x OOM
                 Key: SPARK-27041
                 URL: https://issues.apache.org/jira/browse/SPARK-27041
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0
            Reporter: David Yang


With a large partition, PySpark may exceed the executor memory limit and trigger an out-of-memory error under Python 2.7.
This happens because map() is used: unlike in Python 3.x, where map() returns a lazy iterator, Python 2.7's map() builds the entire result as a list, so the whole partition must be read into memory at once.

The proposed fix is to use itertools.imap(), which evaluates lazily, when running on Python 2.7; this has been verified.
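
For illustration, a minimal sketch of the kind of version guard described above (not the actual Spark patch; the lazy_map alias and process_partition function are hypothetical names):

    import sys

    if sys.version_info[0] < 3:
        # Python 2: built-in map() materializes a full list, so fall back
        # to the lazy itertools.imap to stream records one at a time.
        from itertools import imap as lazy_map
    else:
        # Python 3: map() already returns a lazy iterator.
        lazy_map = map

    def process_partition(iterator):
        # Hypothetical per-record transform; with lazy_map the partition
        # is consumed incrementally instead of being loaded into memory.
        return lazy_map(lambda record: record.upper(), iterator)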



