Posted to issues@spark.apache.org by "Piyush Seth (JIRA)" <ji...@apache.org> on 2018/07/18 07:01:00 UTC
[jira] [Created] (SPARK-24841) Memory leak in converting spark dataframe to pandas dataframe
Piyush Seth created SPARK-24841:
-----------------------------------
Summary: Memory leak in converting spark dataframe to pandas dataframe
Key: SPARK-24841
URL: https://issues.apache.org/jira/browse/SPARK-24841
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.3.0
Environment: Running PySpark in standalone mode
Reporter: Piyush Seth
I am running a long-running application using PySpark. In one of the operations I have to convert a PySpark data frame to a pandas data frame using the toPandas API on the PySpark driver. After the application has run for a while I get a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error.
I tried running this conversion in a loop and could see the driver's heap memory increasing continuously. When I first ran jmap, the histogram had the following top rows:
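A minimal sketch of the pattern that triggers this (the generated DataFrame and iteration count are illustrative, not from the actual job):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("toPandas-leak-repro").getOrCreate()

    # Illustrative source data; any moderately sized DataFrame shows the same growth.
    df = spark.range(0, 1_000_000).selectExpr("id", "id * 2 AS value")

    # Long-running driver loop: each toPandas() collects the result to the
    # driver, and driver JVM heap usage keeps growing across iterations.
    for i in range(1000):
        pdf = df.toPandas()
        del pdf  # freeing the pandas frame does not reclaim the JVM-side heap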
num #instances #bytes class name
----------------------------------------------
1: 1757 411477568 [J
2: 124188 266323152 [C
3: 167219 46821320 org.apache.spark.status.TaskDataWrapper
4: 69683 27159536 [B
5: 359278 8622672 java.lang.Long
6: 221808 7097856 java.util.concurrent.ConcurrentHashMap$Node
7: 283771 6810504 scala.collection.immutable.$colon$colon
After running several iterations, the [C (char array) usage had grown from roughly 266 MB to about 3.4 GB:
num #instances #bytes class name
----------------------------------------------
1: 110760 3439887928 [C
2: 698 411429088 [J
3: 238096 66666880 org.apache.spark.status.TaskDataWrapper
4: 68819 24050520 [B
5: 498308 11959392 java.lang.Long
6: 292741 9367712 java.util.concurrent.ConcurrentHashMap$Node
7: 282878 6789072 scala.collection.immutable.$colon$colon
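For reference, histograms like the two above can be captured between loop iterations with something like the sketch below. jmap -histo is a standard JDK tool; the driver PID shown is hypothetical and would come from your own process listing.

    import subprocess

    def heap_histogram(pid, top=10):
        """Return the header plus the top rows of `jmap -histo` for a JVM pid."""
        out = subprocess.run(["jmap", "-histo", str(pid)],
                             capture_output=True, text=True, check=True).stdout
        # First two lines are the column header and separator.
        return "\n".join(out.splitlines()[:top + 2])

    # Example: snapshot the driver JVM (PID 12345 is a placeholder).
    print(heap_histogram(12345))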