Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2018/12/10 17:13:00 UTC

[jira] [Comment Edited] (SPARK-26302) retainedBatches configuration can cause memory leak

    [ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715124#comment-16715124 ] 

Gabor Somogyi edited comment on SPARK-26302 at 12/10/18 5:12 PM:
-----------------------------------------------------------------

{quote}can cause memory leak{quote}
Is it really a memory leak, rather than slow processing or an out-of-memory condition?



was (Author: gsomogyi):
> can cause memory leak
Is it really memory leak and not slow processing or out of memory?


> retainedBatches configuration can cause memory leak
> ---------------------------------------------------
>
>                 Key: SPARK-26302
>                 URL: https://issues.apache.org/jira/browse/SPARK-26302
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, DStreams
>    Affects Versions: 2.4.0
>            Reporter: Behroz Sikander
>            Priority: Minor
>         Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before garbage collecting"
> The default for this configuration is 1000.
> In our experience, the documentation is incomplete, and we found this out the hard way.
> The size of a single BatchUIData is around 750KB. Increasing the configuration value to something like 5000 increases the total retained size to ~4GB (5000 x 750KB).
> If your driver heap is not big enough, the job starts to slow down, has frequent GCs and has long scheduling delays. Once the heap is full, the job cannot be recovered.
> A note of caution should be added to the documentation to let users know the impact of this seemingly harmless configuration property.
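
For reference, the arithmetic in the description can be sketched as below. This is only a rough estimate: the 750KB per-batch figure is the reporter's measurement from one job and will differ between workloads, and the function names are hypothetical helpers, not part of Spark.

```python
# Rough estimate of driver heap held by retained batch UI data.
# ASSUMPTION: ~750 KB per BatchUIData, as measured by the reporter;
# the real size depends on the job (number of output ops, jobs per batch, etc.).
BYTES_PER_BATCH = 750 * 1000  # decimal KB, for a back-of-the-envelope figure


def estimate_retained_bytes(retained_batches, bytes_per_batch=BYTES_PER_BATCH):
    """Approximate bytes kept alive for spark.streaming.ui.retainedBatches batches."""
    return retained_batches * bytes_per_batch


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


if __name__ == "__main__":
    for retained in (1000, 5000):  # 1000 is the documented default
        size_gb = to_gb(estimate_retained_bytes(retained))
        print(f"retainedBatches={retained}: ~{size_gb:.2f} GB of driver heap")
```

With these assumptions the default of 1000 retains roughly 0.75 GB, and 5000 retains roughly 3.75 GB, which matches the ~4GB seen in the heap dump.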



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org