Posted to issues@spark.apache.org by "Saisai Shao (JIRA)" <ji...@apache.org> on 2016/09/20 07:56:20 UTC

[jira] [Created] (SPARK-17604) Support purging aged file entry for FileStreamSource metadata log

Saisai Shao created SPARK-17604:
-----------------------------------

             Summary: Support purging aged file entry for FileStreamSource metadata log
                 Key: SPARK-17604
                 URL: https://issues.apache.org/jira/browse/SPARK-17604
             Project: Spark
          Issue Type: Improvement
          Components: SQL, Streaming
            Reporter: Saisai Shao
            Priority: Minor


Since SPARK-15698, the {{FileStreamSource}} metadata log is compacted periodically (every 10 batches by default). Each compacted batch file therefore contains every file entry processed so far, and over time it grows into a relatively large file.

With SPARK-17165, {{FileStreamSource}} no longer tracks aged file entries, but the metadata log still keeps the full set of records. This is unnecessary and makes recovery quite time-consuming. This issue proposes adding the ability to purge aged file entries from the {{FileStreamSource}} metadata log as well.
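
As a rough illustration of the proposal (not the actual Spark code), the sketch below drops aged entries when a compacted batch is produced. The names {{FileEntry}}, {{compact_with_purge}} and {{max_file_age_ms}} are hypothetical, and the aging rule (older than the newest timestamp seen minus a maximum age) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one record in the metadata log."""
    path: str
    timestamp_ms: int  # modification time of the file
    batch_id: int      # batch in which the file was processed


def compact_with_purge(entries: List[FileEntry], max_file_age_ms: int) -> List[FileEntry]:
    """Return the entries to write into a compacted batch file.

    Assumed rule: an entry is "aged" when its timestamp is older than
    (newest timestamp seen - max_file_age_ms). Aged entries are dropped
    instead of being carried forward into the compacted file.
    """
    if not entries:
        return []
    newest = max(e.timestamp_ms for e in entries)
    threshold = newest - max_file_age_ms
    return [e for e in entries if e.timestamp_ms >= threshold]


# Example: with a maximum age of 5000 ms, the entry for "a" is purged.
log = [
    FileEntry("a", 1000, 0),
    FileEntry("b", 5000, 1),
    FileEntry("c", 9000, 2),
]
compacted = compact_with_purge(log, max_file_age_ms=5000)
```

With purging applied at compaction time, the size of the compacted file would be bounded by the number of files inside the age window rather than by the total number of files ever processed, which is what should shorten recovery.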


