Posted to issues@spark.apache.org by "Mike Dias (JIRA)" <ji...@apache.org> on 2019/02/14 01:13:00 UTC
[jira] [Created] (SPARK-26875) Add an option on FileStreamSource for include modified files
Mike Dias created SPARK-26875:
---------------------------------
Summary: Add an option on FileStreamSource for include modified files
Key: SPARK-26875
URL: https://issues.apache.org/jira/browse/SPARK-26875
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.4.0
Reporter: Mike Dias
The current behavior checks only the filename to determine whether a file should be processed. I propose adding an option to also test whether the file's timestamp is greater than the last time it was processed, as an indication that the file has been modified and has different content.
This is useful when the source producer eventually overwrites files with new content.
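To make the proposed check concrete, here is a minimal sketch in plain Python. It is not Spark's FileStreamSource code: the class name `SeenFiles`, the option name `include_modified`, and the choice to compare against the modification time recorded when the file was last processed are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeenFiles:
    """Hypothetical tracker of files a streaming source has already processed."""

    # Hypothetical option: when True, a known file whose timestamp has
    # advanced is treated as new again. Defaults to the filename-only check.
    include_modified: bool = False
    # Maps file path -> modification time (ms) recorded at last processing.
    _seen: Dict[str, int] = field(default_factory=dict)

    def is_new_file(self, path: str, mod_time_ms: int) -> bool:
        last = self._seen.get(path)
        if last is None:
            # Never processed before: always new.
            return True
        # Already processed: new only if the option is on and the
        # timestamp is strictly greater than the recorded one.
        return self.include_modified and mod_time_ms > last

    def add(self, path: str, mod_time_ms: int) -> None:
        # Record (or refresh) the timestamp seen when the file was processed.
        self._seen[path] = mod_time_ms


if __name__ == "__main__":
    seen = SeenFiles(include_modified=True)
    seen.add("data/part-0.json", 100)
    print(seen.is_new_file("data/part-0.json", 100))  # unchanged file
    print(seen.is_new_file("data/part-0.json", 200))  # overwritten later
```

With `include_modified` left at its default, the sketch falls back to the filename-only behavior described above, so an overwritten file is skipped.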