Posted to issues@spark.apache.org by "Xeto (JIRA)" <ji...@apache.org> on 2016/09/02 14:24:20 UTC

[jira] [Comment Edited] (SPARK-17380) Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)

    [ https://issues.apache.org/jira/browse/SPARK-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458673#comment-15458673 ] 

Xeto edited comment on SPARK-17380 at 9/2/16 2:24 PM:
------------------------------------------------------

Attaching a Ganglia graph captured after the freeze: used memory is eventually reduced once Spark has frozen, but that doesn't help Spark recover.


was (Author: xeto):
Used memory after Spark has frozen is eventually reduced, but that doesn't help Spark recover.

> Spark streaming with a multi shard Kinesis freezes after several days (memory/resource leak?)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17380
>                 URL: https://issues.apache.org/jira/browse/SPARK-17380
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 2.0.0
>            Reporter: Xeto
>         Attachments: memory-after-freeze.png, memory.png
>
>
> Running Spark Streaming 2.0.0 on AWS EMR 5.0.0 consuming from Kinesis (125 shards).
> According to Ganglia, used memory grows steadily over time.
> The application runs properly for about 3.5 days, until all free memory has been used.
> Then micro-batches start queuing up, but none are served.
> Spark freezes. Ganglia shows that some memory is eventually freed, but it doesn't help the job recover.
> Is it a memory/resource leak?
> The job uses backpressure and Kryo serialization.
> The code chains mapToPair(), groupByKey(), flatMap(), persist(StorageLevel.MEMORY_AND_DISK_SER_2()) and repartition(19), then writes to S3 using foreachRDD(); a minimal sketch of this chain appears below, after the configuration.
> Cluster size: 20 machines
> Spark configuration:
> spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p' 
> spark.driver.extraJavaOptions -Dspark.driver.log.level=INFO -XX:+UseConcMarkSweepGC -XX:PermSize=256M -XX:MaxPermSize=256M -XX:OnOutOfMemoryError='kill -9 %p' 
> spark.master yarn-cluster
> spark.executor.instances 19
> spark.executor.cores 7
> spark.executor.memory 7500M
> spark.driver.memory 7500M
> spark.default.parallelism 133
> spark.yarn.executor.memoryOverhead 2950
> spark.yarn.driver.memoryOverhead 2950
> spark.eventLog.enabled false
> spark.eventLog.dir hdfs:///spark-logs/
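
A side note on the configuration above (not from the original report, just arithmetic on the listed values): each executor container requested from YARN is approximately

    spark.executor.memory + spark.yarn.executor.memoryOverhead
      = 7500 MB + 2950 MB
      = 10450 MB (about 10.2 GB)

and the driver container is sized the same way, so the 19 executors plus the driver amount to roughly one ~10.2 GB container on each of the 20 machines.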
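
Below is a minimal, self-contained sketch (not the reporter's actual code) of the transformation chain described above, written against the Spark 2.0.0 Java DStream API. The batch interval, application and stream names, endpoint, region, key extraction and S3 path are made-up placeholders; only the shape of the chain - mapToPair(), groupByKey(), flatMap(), persist(MEMORY_AND_DISK_SER_2), repartition(19), then foreachRDD() writing to S3 - follows the description in the report.

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kinesis.KinesisUtils;
import scala.Tuple2;

import java.nio.charset.StandardCharsets;

public class KinesisPipelineSketch {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("kinesis-pipeline-sketch");
    Duration batchInterval = new Duration(60 * 1000);   // batch interval is a guess
    JavaStreamingContext jssc = new JavaStreamingContext(conf, batchInterval);

    // One Kinesis receiver; a 125-shard job would typically create several
    // of these and union them. Stream name, endpoint and region are placeholders.
    JavaDStream<byte[]> records = KinesisUtils.createStream(
        jssc, "sketch-app", "example-stream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, batchInterval,
        StorageLevel.MEMORY_AND_DISK_SER_2());

    // mapToPair(): key each record; this key extraction is a placeholder.
    JavaPairDStream<String, String> keyed = records.mapToPair(bytes -> {
      String line = new String(bytes, StandardCharsets.UTF_8);
      return new Tuple2<>(line.split(",", 2)[0], line);
    });

    // groupByKey() then flatMap(), persisted and repartitioned as described.
    JavaDStream<String> regrouped = keyed
        .groupByKey()
        .flatMap(kv -> kv._2().iterator())
        .persist(StorageLevel.MEMORY_AND_DISK_SER_2())
        .repartition(19);

    // foreachRDD(): write each batch to S3; bucket and prefix are placeholders.
    regrouped.foreachRDD((rdd, time) ->
        rdd.saveAsTextFile("s3://example-bucket/output/" + time.milliseconds()));

    jssc.start();
    jssc.awaitTermination();
  }
}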


