Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/14 02:06:34 UTC

[GitHub] HeartSaVioR commented on issue #23782: [SPARK-26875][SQL] Add an option on FileStreamSource to include modified files

URL: https://github.com/apache/spark/pull/23782#issuecomment-463456066
 
 
   First of all, I agree this is a valid use case.
   
   I'm just thinking out loud about an edge case (maybe this is why Spark restricts it): whenever a file's timestamp changes for any reason (content appended, some unintended modification, etc.), the entire content of the file is reprocessed (the UT in this patch relies on this). That breaks not only `end-to-end exactly-once` but also `stateful exactly-once`, because state will not be rolled back. So in such cases the option falls back to "at-least-once" semantics, whereas end users would expect at least stateful exactly-once. This needs to be called out with a warning.
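
To make the concern concrete, here is a toy simulation in plain Python. It is only a sketch under my own assumptions, not how `FileStreamSource` is implemented: `run_source`, `include_modified`, and the `(mtime, lines)` listing format are hypothetical names made up for illustration, and a `Counter` stands in for streaming aggregation state that is never rolled back.

```python
from collections import Counter


def run_source(snapshots, include_modified):
    """Simulate a file source over successive directory listings.

    snapshots: list of dicts mapping path -> (mtime, list_of_lines).
    Returns the accumulated per-line counts after all listings are processed.
    """
    seen = {}          # path -> mtime of the version already processed
    state = Counter()  # stand-in for aggregation state; never rolled back
    for listing in snapshots:
        for path, (mtime, lines) in sorted(listing.items()):
            is_new = path not in seen
            is_modified = (not is_new) and mtime > seen[path]
            if is_new or (include_modified and is_modified):
                seen[path] = mtime
                # The whole file is read again, not just the appended part.
                state.update(lines)
    return state


snapshots = [
    {"a.txt": (100, ["x", "y"])},
    {"a.txt": (200, ["x", "y", "z"])},  # one line appended, mtime bumped
]

default_counts = run_source(snapshots, include_modified=False)
modified_counts = run_source(snapshots, include_modified=True)
print(dict(default_counts))
print(dict(modified_counts))
```

In this model the default behavior never sees the appended line `z`, which is the use case the option addresses. With the option enabled, `z` is picked up, but `x` and `y` are counted twice in the state, which is the at-least-once outcome described above.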
