Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/18 21:53:05 UTC

[GitHub] [spark] HeartSaVioR edited a comment on issue #24128: [SPARK-27188][SS] FileStreamSink: provide a new option to disable metadata log

URL: https://github.com/apache/spark/pull/24128#issuecomment-474115781
 
 
   > I'm not comfortable adding an option to just turn it off; there are all sorts of ways that could cause more subtle issues than at-least-once semantics.
   
   I fully understand the discomfort with disabling the metadata log, but as I described in the JIRA issue and the PR description, there is no workaround other than leaving end users to clean things up by hand. I could give another try at having FileStreamSink check for deleted output files in the background and exclude them when compacting metadata (I think that would be the ideal approach), but that would definitely add overhead and perhaps some new configuration as well.
   
   Regarding subtle issues, it would be better for us to share concrete possible issues (instead of 'something might happen') if we can think of any: that would help steer us in the right direction.
