Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/15 01:07:26 UTC

[GitHub] HeartSaVioR commented on issue #23782: [SPARK-26875][SS] Add an option on FileStreamSource to include modified files

URL: https://github.com/apache/spark/pull/23782#issuecomment-463866912
 
 
   Maybe @gaborgsomogyi is thinking beyond what is implemented as of now. Currently #22952 takes a synchronous approach, but @gaborgsomogyi suggested in #22952 making archiving/deletion asynchronous (we talked about dealing with it as a follow-up TODO), and that relies on the assumption that we never process a file again, even if the same file is added again as new.
   
   When the option is turned on, it breaks that assumption and things change considerably, including the race condition @gaborgsomogyi described. When we hit the race condition, it would produce unintended results (perhaps the worst case would be archiving/deleting a new file before it is processed).
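
To make the concern concrete, here is a toy sketch of that interleaving. It is only a model under stated assumptions, not Spark's FileStreamSource: the class `ToySource`, its fields, and the `simulate` helper are all hypothetical, and the "async cleanup" is modeled as a queue of paths that is drained at a later point. The cleanup is keyed by path only, which is the part that depends on the "never process the same file again" assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToySource:
    """Hypothetical toy model of a file source; not Spark's implementation."""
    include_modified: bool
    seen: dict = field(default_factory=dict)             # path -> mtime already processed
    directory: dict = field(default_factory=dict)        # path -> mtime currently on disk
    pending_cleanup: list = field(default_factory=list)  # paths queued for async delete

    def write(self, path, mtime):
        # Create or overwrite a file at `path`.
        self.directory[path] = mtime

    def next_batch(self):
        # Pick unseen files and, if the option is on, files with a newer mtime.
        batch = []
        for path, mtime in sorted(self.directory.items()):
            if path not in self.seen:
                is_new = True
            else:
                is_new = self.include_modified and mtime > self.seen[path]
            if is_new:
                batch.append((path, mtime))
        for path, mtime in batch:
            self.seen[path] = mtime
            self.pending_cleanup.append(path)  # cleanup is deferred, keyed by path
        return batch

    def run_async_cleanup(self):
        # Delete whatever currently sits at each queued path.
        deleted = []
        while self.pending_cleanup:
            path = self.pending_cleanup.pop(0)
            if path in self.directory:
                deleted.append((path, self.directory.pop(path)))
        return deleted


def simulate(include_modified, cleanup_runs_late):
    src = ToySource(include_modified=include_modified)
    src.write("a.txt", 1)
    processed = src.next_batch()   # processes version 1, queues its cleanup
    src.write("a.txt", 2)          # same path rewritten before cleanup has run
    deleted = src.run_async_cleanup() if cleanup_runs_late else []
    processed += src.next_batch()
    return processed, deleted
```

In this model, with the option off, the late cleanup changes nothing about what gets processed, because version 2 would have been ignored anyway. With the option on, version 2 would have been picked up by the next batch, but the late cleanup removes it first, which is the "archive/delete a new file before processing" case.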
   
   I'm not sure whether the race condition can occur even when archiving/deleting is synchronous.
