Posted to issues@spark.apache.org by "Mike Dias (JIRA)" <ji...@apache.org> on 2019/02/14 01:13:00 UTC

[jira] [Created] (SPARK-26875) Add an option on FileStreamSource for include modified files

Mike Dias created SPARK-26875:
---------------------------------

             Summary: Add an option on FileStreamSource for include modified files 
                 Key: SPARK-26875
                 URL: https://issues.apache.org/jira/browse/SPARK-26875
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Mike Dias


The current behavior checks only the filename to determine whether a file should be processed. I propose adding an option to also test whether the file's timestamp is greater than the last time it was processed, as an indication that the file has been modified and has different content.

This is useful when the source producer occasionally overwrites files with new content.
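
As a rough illustration of the proposed check, here is a minimal sketch in Python. It is not Spark's actual FileStreamSource code: the class SeenFiles, the option name include_modified, and the method names are all hypothetical, and the sketch assumes the timestamp recorded for a file is the modification time it had when it was last processed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeenFiles:
    """Hypothetical tracker of processed files (not Spark's internals)."""

    # Assumed name for the proposed option; off by default to keep
    # the current filename-only behavior.
    include_modified: bool = False
    # Maps file path -> modification time (ms) recorded at processing time.
    last_seen: Dict[str, int] = field(default_factory=dict)

    def should_process(self, path: str, mtime_ms: int) -> bool:
        previous = self.last_seen.get(path)
        if previous is None:
            # File name never seen before: always process.
            return True
        if self.include_modified:
            # Proposed behavior: reprocess when the timestamp is newer.
            return mtime_ms > previous
        # Current behavior: a known file name is never processed again.
        return False

    def mark_processed(self, path: str, mtime_ms: int) -> None:
        self.last_seen[path] = mtime_ms


# Usage: the same file name is picked up again only after it is overwritten.
tracker = SeenFiles(include_modified=True)
first = tracker.should_process("data/part-0.json", 1000)
tracker.mark_processed("data/part-0.json", 1000)
unchanged = tracker.should_process("data/part-0.json", 1000)
overwritten = tracker.should_process("data/part-0.json", 2000)
```

With include_modified left at False, the last call would also return False, which matches the filename-only behavior described above.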



