Posted to issues@spark.apache.org by "Jem Tucker (JIRA)" <ji...@apache.org> on 2015/01/13 17:06:34 UTC
[jira] [Created] (SPARK-5221) FileInputDStream "remember window" in
certain situations causes files to be ignored
Jem Tucker created SPARK-5221:
---------------------------------
Summary: FileInputDStream "remember window" in certain situations causes files to be ignored
Key: SPARK-5221
URL: https://issues.apache.org/jira/browse/SPARK-5221
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.2.0, 1.1.1
Reporter: Jem Tucker
Priority: Minor
When the batch duration is greater than 1 minute, a file can be skipped entirely. If a file begins to be moved into the monitored directory just before FileInputDStream.findNewFiles() is called, but does not become visible until after that call has executed, it is not included in that batch. In the following batch the file is ignored again, because its modification time is less than the modTimeIgnoreThreshold. This causes Spark Streaming to ignore data that it should not, especially when large files are being moved into the directory.
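
The timing can be illustrated with a small simulation. This is only a sketch and not Spark's code: the FileInfo class, the find_new_files function, the timestamps, and the file names are all hypothetical. It assumes that the ignore threshold is the batch time minus a remember window equal to one batch interval, and that a file's modification time reflects when the move started rather than when the file became visible.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    mod_time: int    # modification timestamp in seconds (assumed: when the move began)
    visible_at: int  # time at which a directory listing first shows the file


def find_new_files(files, batch_time, remember_window, already_selected):
    """Toy stand-in for FileInputDStream.findNewFiles(); not Spark's implementation."""
    # Assumption: files at or below this modification time are treated as too old.
    mod_time_ignore_threshold = batch_time - remember_window
    selected = []
    for f in files:
        if f.visible_at > batch_time:
            continue  # the file is not yet listed in the directory
        if f.mod_time <= mod_time_ignore_threshold:
            continue  # the file is considered older than the remember window
        if f.name in already_selected:
            continue  # the file was picked up by an earlier batch
        selected.append(f.name)
    already_selected.update(selected)
    return selected


BATCH = 120  # batch duration in seconds, i.e. greater than 1 minute

files = [
    FileInfo("small.txt", mod_time=100, visible_at=100),
    # The move starts at t=118 but the file only becomes visible at t=125.
    FileInfo("large.bin", mod_time=118, visible_at=125),
]

seen = set()
first = find_new_files(files, batch_time=120, remember_window=BATCH, already_selected=seen)
second = find_new_files(files, batch_time=240, remember_window=BATCH, already_selected=seen)

print(first)   # files selected by the batch at t=120
print(second)  # files selected by the batch at t=240
```

Under these assumptions, large.bin is not visible when the first batch runs, and by the second batch its modification time (118) is already at or below the threshold (240 - 120 = 120), so neither batch selects it.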