Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/16 04:26:48 UTC

[GitHub] [spark] HeartSaVioR commented on a change in pull request #27398: [SPARK-30481][DOCS][FOLLOWUP] Document event log compaction into new section of monitoring.md

URL: https://github.com/apache/spark/pull/27398#discussion_r379874393
 
 

 ##########
 File path: docs/monitoring.md
 ##########
 @@ -95,6 +95,44 @@ The history server can be configured as follows:
   </tr>
 </table>
 
+### Applying compaction of old event log files
+
+A long-running streaming application can produce a single huge event log file, which can be costly to maintain and
+requires substantial resources to replay on each update in Spark History Server.
+
+Enabling <code>spark.eventLog.rolling.enabled</code> and setting <code>spark.eventLog.rolling.maxFileSize</code>
+lets you have multiple event log files instead of a single huge event log file, which may help in some scenarios on its own,
+but it still doesn't reduce the overall size of the logs.
+
+Spark History Server can apply 'compaction' to the rolling event log files to reduce the overall size of the
+logs, by setting the configuration <code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> on the
+Spark History Server.
+
+When compaction happens, the History Server lists all the available event log files and treats the event log files older than
+the retained log files as targets for compaction. For example, if application A has 5 event log files and
+<code>spark.history.fs.eventLog.rolling.maxFilesToRetain</code> is set to 2, the first 3 log files will be selected for compaction.
+
+Once it selects the files, it analyzes them to figure out which events can be excluded, and rewrites them
+into one compact file, discarding the excluded events. Once rewriting is done, the original log files will be deleted.
 
 Review comment:
   Yeah, it wouldn't matter for the logic, since listing event log files takes the "last" compact file plus the event log files to its right (the ones after it). And I agree it's worth mentioning that the deletion is best effort.
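
   To make the two points above concrete, here is a rough Python sketch. It is not Spark's actual implementation: the `LogFile` record and the function names `files_to_read` and `select_for_compaction` are hypothetical, and it assumes a compact file carries the index of the newest file it replaced. It only illustrates why leftover originals are harmless when deletion is best effort, and how the "5 files, retain 2" example in the doc text picks the first 3 files.

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    # Hypothetical record; real event log files are identified by name.
    index: int      # rolling sequence number of the file
    compact: bool   # True if the file is the output of an earlier compaction


def files_to_read(files: List[LogFile]) -> List[LogFile]:
    """Return the last compact file (if any) plus every file after it.

    Originals that were not deleted (best-effort deletion) sort before
    the compact file that replaced them, so they are simply ignored.
    """
    # At equal index, the non-compact original sorts before the compact file.
    ordered = sorted(files, key=lambda f: (f.index, f.compact))
    start = max((i for i, f in enumerate(ordered) if f.compact), default=0)
    return ordered[start:]


def select_for_compaction(files: List[LogFile], max_files_to_retain: int) -> List[LogFile]:
    """Pick every file older than the newest `max_files_to_retain` files."""
    if max_files_to_retain <= 0:
        raise ValueError("max_files_to_retain must be positive")
    ordered = sorted(files, key=lambda f: (f.index, f.compact))
    if len(ordered) <= max_files_to_retain:
        return []  # nothing is old enough to compact
    return ordered[:-max_files_to_retain]


# Doc example: 5 files, retain 2, so the first 3 are selected.
originals = [LogFile(i, False) for i in range(1, 6)]
selected = select_for_compaction(originals, 2)

# Suppose files 1..3 were compacted but the originals were never deleted.
leftover = originals + [LogFile(3, True)]
readable = files_to_read(leftover)
```

   In this sketch `selected` holds the files with index 1, 2 and 3, and `readable` starts at the compact file, so the stale originals never reach the replay step.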
