Posted to issues@spark.apache.org by "Burak Yavuz (JIRA)" <ji...@apache.org> on 2015/10/30 07:18:27 UTC
[jira] [Created] (SPARK-11419) WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled
Burak Yavuz created SPARK-11419:
-----------------------------------
Summary: WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled
Key: SPARK-11419
URL: https://issues.apache.org/jira/browse/SPARK-11419
Project: Spark
Issue Type: Improvement
Components: Streaming
Reporter: Burak Yavuz
Support for closing WriteAheadLog files after every write was recently merged. Closing a file after each write is an expensive operation, and it creates many small files on S3; enabling it is not necessary on HDFS anyway.
However, when there are many small files on S3, recovery takes a very long time. We can parallelize the recovery process.
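The parallel-recovery idea can be sketched as follows. This is a minimal illustration in Python, not Spark's actual implementation: the function names and file layout are hypothetical, and the point is simply that per-segment reads are independent, so they can be fanned out across a thread pool while preserving segment order.

```python
# Hypothetical sketch of parallel WAL recovery: read many small log
# segments concurrently instead of one at a time. On S3 each open/read
# is a remote call, so a serial loop over thousands of files is slow.
from concurrent.futures import ThreadPoolExecutor


def read_segment(path):
    # Read one write-ahead log segment in full (illustrative only).
    with open(path, "rb") as f:
        return f.read()


def recover_parallel(segment_paths, threads=16):
    # Fan the per-file reads out across a thread pool; pool.map
    # preserves the original segment order in the returned list,
    # which matters because records must be replayed in order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(read_segment, segment_paths))
```

The ordering guarantee of `map` is what makes this safe to drop in for a serial loop: the caller still sees segments in write order even though the I/O overlaps.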
In addition, files stack up quickly, and deletes may not be able to keep pace, so we should add support for parallelizing deletion as well.
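The cleanup side can be sketched the same way. Again this is an illustrative Python sketch under assumed names, not Spark's code: expired segments are deleted concurrently on a small thread pool so cleanup can keep pace with the rate at which closeFileAfterWrite produces new files.

```python
# Hypothetical sketch of parallel WAL cleanup: delete segments older
# than a cutoff concurrently, since serial deletes on S3 may fall
# behind the rate at which small files are created.
import os
from concurrent.futures import ThreadPoolExecutor


def delete_expired(paths, threshold_time, threads=8):
    # Remove every segment last modified before threshold_time;
    # returns the paths that were actually deleted, in input order.
    def delete_one(path):
        if os.path.getmtime(path) < threshold_time:
            os.remove(path)
            return path
        return None

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return [p for p in pool.map(delete_one, paths) if p is not None]
```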
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org