Posted to issues@spark.apache.org by "Genmao Yu (JIRA)" <ji...@apache.org> on 2019/05/14 09:28:00 UTC

[jira] [Commented] (SPARK-26302) retainedBatches configuration can eat up memory on driver

    [ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839253#comment-16839253 ] 

Genmao Yu commented on SPARK-26302:
-----------------------------------

Adding a warning to the documentation is reasonable.

> retainedBatches configuration can eat up memory on driver
> ---------------------------------------------------------
>
>                 Key: SPARK-26302
>                 URL: https://issues.apache.org/jira/browse/SPARK-26302
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, DStreams
>    Affects Versions: 2.4.0
>            Reporter: Behroz Sikander
>            Priority: Minor
>         Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before garbage collecting"
> The default for this configuration is 1000.
> In our experience, the documentation is incomplete, and we found this out the hard way.
> A single BatchUIData is around 750KB in size. Raising this configuration to something like 5000 therefore increases the total retained size to ~4GB (5000 x 750KB).
> If your driver heap is not big enough, the job starts to slow down, has frequent GCs, and has long scheduling delays. Once the heap is full, the job cannot be recovered.
> A note of caution should be added to the documentation to let users know the impact of this seemingly harmless configuration property.
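
The arithmetic in the description can be reproduced with a short sketch. It assumes the ~750KB per BatchUIData figure reported above, which was measured on the reporter's job and will likely differ for other workloads. The function names and the constant are hypothetical helpers for illustration, not part of any Spark API.

```python
# Rough estimate of driver heap held by retained streaming batches.
# ASSUMPTION: ~750 KB per BatchUIData, the figure reported in this issue.
# Measure your own job (e.g. from a heap dump) before relying on it.
BATCH_UI_DATA_BYTES = 750 * 1000


def retained_batches_bytes(retained_batches, bytes_per_batch=BATCH_UI_DATA_BYTES):
    """Approximate bytes the driver keeps for the given retainedBatches value."""
    return retained_batches * bytes_per_batch


def max_retained_batches(heap_budget_bytes, bytes_per_batch=BATCH_UI_DATA_BYTES):
    """Largest retainedBatches value that fits in the given heap budget."""
    return heap_budget_bytes // bytes_per_batch


if __name__ == "__main__":
    # Default (1000) versus the raised value from the issue (5000).
    for n in (1000, 5000):
        print(f"{n} batches ~ {retained_batches_bytes(n) / 1e9:.2f} GB")
```

Under this assumption the default of 1000 retains roughly 0.75 GB and a value of 5000 roughly 3.75 GB, which matches the ~4GB reported. Working backwards, a budget of 1 GB for UI data would allow about 1333 batches, so spark.streaming.ui.retainedBatches should be sized against the driver heap rather than raised freely.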


