Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/03/18 10:22:00 UTC

[jira] [Assigned] (SPARK-27188) FileStreamSink: provide a new option to disable metadata log

     [ https://issues.apache.org/jira/browse/SPARK-27188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27188:
------------------------------------

    Assignee: Apache Spark

> FileStreamSink: provide a new option to disable metadata log
> ------------------------------------------------------------
>
>                 Key: SPARK-27188
>                 URL: https://issues.apache.org/jira/browse/SPARK-27188
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Apache Spark
>            Priority: Major
>
> In SPARK-24295 we noted that many end users struggle with huge FileStreamSink metadata logs. Unfortunately, because arbitrary readers rely on the metadata log to determine which files are safe to read (to ensure 'exactly-once'), pruning the metadata log is not trivial to implement.
> While we may be able to check for deleted output files in FileStreamSink and drop them when compacting the metadata, that operation would add overhead to the running query. (I'll try to address this in a separate issue, though.)
> Back to this issue: 'exactly-once' via the metadata log is only possible when the output directory is read by Spark; for other readers the output provides a weaker guarantee anyway. I think we could provide this option as a workaround to mitigate the problem.
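
To illustrate the reasoning above, here is a minimal sketch of how a reader could use a sink's metadata log to decide which output files are safe to read. It is a simplified in-memory model, not Spark's implementation: the entry shape (`path`, `action`), the file names, and the helper names `committed_files` and `files_safe_to_read` are all assumptions made for illustration.

```python
import json


def committed_files(log_entries):
    """Replay log entries in order and return the set of committed paths.

    Each entry is assumed to be a JSON line with a "path" and an "action"
    of "add" or "delete" (a simplified, hypothetical entry shape).
    """
    visible = set()
    for line in log_entries:
        entry = json.loads(line)
        if entry["action"] == "add":
            visible.add(entry["path"])
        elif entry["action"] == "delete":
            visible.discard(entry["path"])
    return visible


def files_safe_to_read(directory_listing, log_entries):
    """Keep only files that the metadata log reports as committed."""
    committed = committed_files(log_entries)
    return sorted(p for p in directory_listing if p in committed)


# Hypothetical log: two files were committed by completed batches.
log = [
    json.dumps({"path": "out/part-0000.parquet", "action": "add"}),
    json.dumps({"path": "out/part-0001.parquet", "action": "add"}),
]

# Hypothetical directory listing: part-0002 was written by a batch that
# never committed, so it appears on disk but not in the log.
listing = [
    "out/part-0000.parquet",
    "out/part-0001.parquet",
    "out/part-0002.parquet",
]

# A log-aware reader skips the uncommitted file.
safe = files_safe_to_read(listing, log)

# A reader that ignores the log simply lists the directory and sees all files.
naive = sorted(listing)
```

In this model the log has to keep an entry for every file ever committed, which is why it grows with the lifetime of the query, and a reader that only lists the directory gets no benefit from it. That is the trade-off the proposed option targets.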



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
