Posted to issues@spark.apache.org by "Kazuaki Ishizaki (JIRA)" <ji...@apache.org> on 2018/07/23 02:58:00 UTC
[jira] [Commented] (SPARK-24841) Memory leak in converting spark dataframe to pandas dataframe
[ https://issues.apache.org/jira/browse/SPARK-24841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552231#comment-16552231 ]
Kazuaki Ishizaki commented on SPARK-24841:
------------------------------------------
Thank you for reporting this issue along with the heap profiling data. Would it be possible to post a standalone program that reproduces the problem?
> Memory leak in converting spark dataframe to pandas dataframe
> -------------------------------------------------------------
>
> Key: SPARK-24841
> URL: https://issues.apache.org/jira/browse/SPARK-24841
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.3.0
> Environment: Running PySpark in standalone mode
> Reporter: Piyush Seth
> Priority: Minor
>
> I am running a long-running application using PySpark. In one of the operations I have to convert a PySpark data frame to a Pandas data frame using the toPandas API on the PySpark driver. After running for a while I get a "java.lang.OutOfMemoryError: GC overhead limit exceeded" error.
> I tried running this operation in a loop and could see that the heap memory was increasing continuously. When I ran jmap for the first time, the top rows were:
>  num     #instances          #bytes  class name
> ------------------------------------------------
>    1:          1757       411477568  [J
> *  2:        124188       266323152  [C*
>    3:        167219        46821320  org.apache.spark.status.TaskDataWrapper
>    4:         69683        27159536  [B
>    5:        359278         8622672  java.lang.Long
>    6:        221808         7097856  java.util.concurrent.ConcurrentHashMap$Node
>    7:        283771         6810504  scala.collection.immutable.$colon$colon
> After running several iterations I had the following:
>  num     #instances          #bytes  class name
> ------------------------------------------------
> *  1:        110760      3439887928  [C*
>    2:           698       411429088  [J
>    3:        238096        66666880  org.apache.spark.status.TaskDataWrapper
>    4:         68819        24050520  [B
>    5:        498308        11959392  java.lang.Long
>    6:        292741         9367712  java.util.concurrent.ConcurrentHashMap$Node
>    7:        282878         6789072  scala.collection.immutable.$colon$colon
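A quick sanity check on the two histograms above (plain Python, numbers taken directly from the report) quantifies the leak: char-array (`[C`) bytes grow by roughly 3 GiB while `TaskDataWrapper` instances also climb, which is consistent with the driver's app-status store retaining task data across iterations.

```python
# Growth between the two jmap histograms reported in SPARK-24841.
# All figures are copied verbatim from the report above.
first_c_bytes = 266_323_152        # [C bytes in the first histogram
later_c_bytes = 3_439_887_928      # [C bytes after several iterations
c_growth = later_c_bytes - first_c_bytes

first_tdw = 167_219                # TaskDataWrapper instances, first run
later_tdw = 238_096                # TaskDataWrapper instances, later run

print(f"[C grew by {c_growth:,} bytes (~{c_growth / 2**30:.1f} GiB)")
print(f"TaskDataWrapper instances grew by {later_tdw - first_tdw:,}")
```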
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)