Posted to issues@spark.apache.org by "Burak Yavuz (JIRA)" <ji...@apache.org> on 2015/10/30 07:18:27 UTC

[jira] [Created] (SPARK-11419) WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled

Burak Yavuz created SPARK-11419:
-----------------------------------

             Summary: WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled
                 Key: SPARK-11419
                 URL: https://issues.apache.org/jira/browse/SPARK-11419
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
            Reporter: Burak Yavuz


Support for closing WriteAheadLog files after each write was just merged in. Closing every file after a write is a very expensive operation because it creates many small files on S3; it is not necessary to enable it on HDFS.
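
For reference, a minimal sketch of how this setting might be turned on (the two closeFileAfterWrite configuration keys below are the receiver- and driver-side write-ahead log settings; please double-check them against your Spark version):

    import org.apache.spark.SparkConf

    // Close the WAL file after every write on both the receiver side and the
    // driver side (useful on object stores such as S3 where flush is unreliable).
    val conf = new SparkConf()
      .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
      .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")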

However, when there are many small files on S3, recovery takes a very long time. We can parallelize the recovery process, as sketched below.
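
A rough sketch of what parallelized recovery could look like; readSegment, recoverAll, and the pool size are hypothetical names for illustration only, not the actual FileBasedWriteAheadLog implementation:

    import java.nio.file.{Files, Paths}
    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._

    // Stand-in reader for a single WAL segment file (real code would parse records).
    def readSegment(path: String): Seq[Array[Byte]] =
      Seq(Files.readAllBytes(Paths.get(path)))

    // Read many small WAL files concurrently instead of one after another.
    def recoverAll(paths: Seq[String], parallelism: Int = 8): Seq[Array[Byte]] = {
      val pool = Executors.newFixedThreadPool(parallelism)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
      try {
        val futures = paths.map(p => Future(readSegment(p)))
        Await.result(Future.sequence(futures), 10.minutes).flatten
      } finally {
        pool.shutdown()
      }
    }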

In addition, files stack up quickly and deletes may not be able to keep up, so we should add support for speeding up the clean-up as well (see the sketch below).
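
One way to keep deletes off the critical path would be to run the clean-up asynchronously on its own pool; deleteExpired and the pool size are again made-up names, just to illustrate the idea:

    import java.nio.file.{Files, Paths}
    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}

    // Dedicated pool so clean-up work does not block log writes.
    val cleanupPool = Executors.newFixedThreadPool(4)
    implicit val cleanupEc: ExecutionContext = ExecutionContext.fromExecutor(cleanupPool)

    // Best-effort asynchronous deletion of expired WAL segment files.
    def deleteExpired(paths: Seq[String]): Future[Unit] = Future {
      paths.foreach(p => Files.deleteIfExists(Paths.get(p)))
    }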


