Posted to issues@spark.apache.org by "David Yang (JIRA)" <ji...@apache.org> on 2019/03/04 09:03:00 UTC
[jira] [Created] (SPARK-27041) large partition data cause pyspark with python2.x oom
David Yang created SPARK-27041:
----------------------------------
Summary: large partition data cause pyspark with python2.x oom
Key: SPARK-27041
URL: https://issues.apache.org/jira/browse/SPARK-27041
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.0
Reporter: David Yang
With large partitions, PySpark may exceed the executor memory limit and trigger an out-of-memory error under Python 2.7.
This happens because map() is used. Unlike in Python 3.x, Python 2.7's map() builds the entire result as a list, so all of the partition's data must be held in memory at once.
The proposed fix uses itertools.imap (the lazy equivalent) on Python 2.7, and it has been verified.
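To illustrate the difference described above, here is a minimal sketch (not the actual Spark patch): on Python 2, the built-in map() materializes a full list, while itertools.imap yields results one at a time; on Python 3, the built-in map() is already lazy. The `lazy_map` alias and `records` generator are hypothetical names for illustration only.

```python
import itertools
import sys

# Pick the lazy mapping function for the running interpreter.
# Python 2's built-in map() returns a list (eager, memory-hungry);
# itertools.imap returns an iterator (lazy). Python 3's map() is lazy.
if sys.version_info[0] == 2:
    lazy_map = itertools.imap  # Python 2 only
else:
    lazy_map = map             # Python 3 map() is already an iterator

def records():
    # Stand-in for a large partition: a generator, so elements are
    # produced on demand rather than held in memory all at once.
    for i in range(5):
        yield i

squared = lazy_map(lambda x: x * x, records())
# squared is an iterator; no element is computed until consumed.
print(list(squared))  # [0, 1, 4, 9, 16]
```

With an eager map() over a very large partition, the intermediate list alone can exhaust executor memory; the lazy form keeps only one element in flight at a time.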
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org