Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/14 01:35:23 UTC

[GitHub] mikedias opened a new pull request #23782: [SPARK-26875][SQL] Add an option on FileStreamSource to include modified files

URL: https://github.com/apache/spark/pull/23782
 
 
   ## What changes were proposed in this pull request?
   
   Currently, FileStreamSource checks only the filename to decide whether a file should be processed. I propose adding an option that also checks whether the file's modification timestamp is later than the last time the file was processed. A later timestamp indicates that the file has been modified and may have different content.
   
   This is useful when the source producer occasionally overwrites existing files with new content.
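The decision rule described above can be sketched as follows. This is a minimal, hypothetical Python sketch. `SeenFiles`, `should_process`, `select_batch`, and the `include_modified` flag are illustrative names only. They are not Spark's `FileStreamSource` API or the option name used in this patch. Timestamps are assumed to be file modification times in milliseconds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SeenFiles:
    """Hypothetical tracker of files already handed to a streaming batch.

    include_modified=False models the current behavior: a path is
    processed only once, no matter how it changes later.
    include_modified=True also re-processes a path whose modification
    timestamp is newer than the one recorded when it was last processed.
    """
    include_modified: bool = False
    # path -> modification timestamp (ms) recorded when the path was last processed
    seen: Dict[str, int] = field(default_factory=dict)

    def should_process(self, path: str, mtime_ms: int) -> bool:
        last = self.seen.get(path)
        if last is None:
            return True  # filename never seen before
        if self.include_modified:
            return mtime_ms > last  # same name, newer timestamp => treat as modified
        return False  # filename-only check: skip paths already processed

    def select_batch(self, listing: List[Tuple[str, int]]) -> List[str]:
        """Pick the paths to process from one directory listing and record them."""
        batch = []
        for path, mtime_ms in listing:
            if self.should_process(path, mtime_ms):
                batch.append(path)
                self.seen[path] = mtime_ms
        return batch


# Usage: b.json is unchanged between listings, while a.json was overwritten.
tracker = SeenFiles(include_modified=True)
print(tracker.select_batch([("a.json", 100), ("b.json", 100)]))  # ['a.json', 'b.json']
print(tracker.select_batch([("a.json", 200), ("b.json", 100)]))  # ['a.json']
```

In this sketch, an overwritten file is returned again as a whole path, so all of its content would be re-read, not only the part that changed. With `include_modified=False`, the second call would return an empty list.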
   
   ## How was this patch tested?
   
   Added unit tests.
   
