Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2020/01/10 08:28:00 UTC

[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

    [ https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012562#comment-17012562 ] 

Jungtaek Lim commented on SPARK-28594:
--------------------------------------

I'm enumerating the "good to do" items here; it may be better to file separate JIRA issues for them once we decide to do them, or once all required functionality is done and we have the resources to deal with them.

For now, the items I have are below (an illustrative sketch of the first item follows the list):
 * Retain a specific number of jobs / executions so that the compact file can keep some of the finished jobs / executions
 ** [https://github.com/apache/spark/pull/27085#discussion_r363428336]
 * Separate compaction from cleaning to allow leaving some old event log files after compaction
 ** [https://github.com/apache/spark/pull/27085#issuecomment-572792067]
 * Cache the state of the compactor to avoid replaying event log files that have already been loaded
 ** [https://github.com/apache/spark/pull/26416#discussion_r358260674]
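
To make the first item above a bit more concrete, here is an illustrative-only Scala sketch of what "retain the most recent N finished jobs while compacting" could mean over a sequence of events. The event model and compact() helper are invented for this example and are not Spark's actual compactor API; the closest existing knob today is spark.history.fs.eventLog.rolling.maxFilesToRetain, which controls how many recent event log files are kept out of compaction.

{code:scala}
// Illustrative-only sketch of "retain the last N finished jobs during compaction".
// The Event types and compact() below are made up for this example; they are not
// Spark's actual event log compactor API.
object CompactionRetentionSketch {
  sealed trait Event
  case class JobStart(jobId: Int) extends Event
  case class JobEnd(jobId: Int) extends Event
  case class TaskEvent(jobId: Int, detail: String) extends Event

  /** Keep events for live jobs plus the most recent `retainFinished` finished jobs. */
  def compact(events: Seq[Event], retainFinished: Int): Seq[Event] = {
    val finished = events.collect { case JobEnd(id) => id }
    val live = events.collect { case JobStart(id) => id }.toSet -- finished.toSet
    val keep = live ++ finished.takeRight(retainFinished).toSet
    events.filter {
      case JobStart(id)     => keep(id)
      case JobEnd(id)       => keep(id)
      case TaskEvent(id, _) => keep(id)
    }
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      JobStart(1), TaskEvent(1, "t1"), JobEnd(1),
      JobStart(2), TaskEvent(2, "t2"), JobEnd(2),
      JobStart(3), TaskEvent(3, "t3") // job 3 is still running
    )
    // With retainFinished = 1, job 1 is dropped; job 2 (last finished) and job 3 (live) survive.
    compact(events, retainFinished = 1).foreach(println)
  }
}
{code}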

 

> Allow event logs for running streaming apps to be rolled over.
> --------------------------------------------------------------
>
>                 Key: SPARK-28594
>                 URL: https://issues.apache.org/jira/browse/SPARK-28594
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>         Environment: This has been reported on 2.0.2.22 but affects all currently available versions.
>            Reporter: Stephen Levett
>            Priority: Major
>
> In all current Spark releases, when event logging is enabled for Spark Streaming applications, the event logs grow massively. The files continue to grow until the application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617 addresses .inprogress files, but not the event logs of applications that are still running.
> Can we identify a mechanism to set a "max file" size so that the event log is rolled over when it reaches that size?
>  
>  
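
Regarding the "max file" size mechanism asked about in the description above: the direction for this ticket is to let the driver roll the event log into multiple files once a size threshold is reached, so the history server can later compact the older ones. Below is a minimal sketch of enabling it on the application side, assuming the spark.eventLog.rolling.* configs added for this work are available in the Spark build being used (the event log directory is just a placeholder):

{code:scala}
// Minimal sketch: enabling rolling event logs on the application side.
// Assumes the spark.eventLog.rolling.* configs added for this ticket (Spark 3.0+);
// the event log directory below is only a placeholder path.
import org.apache.spark.{SparkConf, SparkContext}

object RollingEventLogSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("rolling-event-log-sketch")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-events")   // placeholder log directory
      .set("spark.eventLog.rolling.enabled", "true")       // roll instead of one ever-growing file
      .set("spark.eventLog.rolling.maxFileSize", "128m")   // roll over once a file reaches this size
    val sc = new SparkContext(conf)

    // ... long-running / streaming workload would go here ...

    sc.stop()
  }
}
{code}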



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
