Posted to issues@spark.apache.org by "Jem Tucker (JIRA)" <ji...@apache.org> on 2015/01/13 17:06:34 UTC

[jira] [Created] (SPARK-5221) FileInputDStream "remember window" in certain situations causes files to be ignored

Jem Tucker created SPARK-5221:
---------------------------------

             Summary: FileInputDStream "remember window" in certain situations causes files to be ignored 
                 Key: SPARK-5221
                 URL: https://issues.apache.org/jira/browse/SPARK-5221
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.2.0, 1.1.1
            Reporter: Jem Tucker
            Priority: Minor


When the batch interval is longer than 1 minute, files can be silently skipped. Suppose a file begins to be moved into the monitored directory just before FileInputDStream.findNewFiles() is called, but does not become visible until after that call has executed. The file is then not included in that batch. In the following batch it is ignored as well, because its modification time is earlier than modTimeIgnoreThreshold. As a result, Spark Streaming drops data that it should process. This is especially likely when large files are being moved into the directory, since the move takes longer to complete.
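The race can be reproduced with a small simulation. This is a simplified Python model of the filtering step, not Spark's actual FileInputDStream code. Several details are assumptions for illustration: a minimum remember duration of 1 minute rounded up to a whole number of batches, a threshold equal to the current batch time minus that window, and the hypothetical fields FileEntry.visible_at_ms and FileEntry.mod_time_ms. The model compares a 2-minute batch interval, where the file is lost, with a 30-second interval, where the same file is picked up on the next batch.

```python
import math
from dataclasses import dataclass

# Assumption: the remember window is at least 1 minute, rounded up to whole batches.
MIN_REMEMBER_MS = 60_000


@dataclass
class FileEntry:
    name: str
    mod_time_ms: int    # hypothetical: modification time reported by the filesystem
    visible_at_ms: int  # hypothetical: first time the file shows up in a directory listing


def remember_window_ms(batch_ms):
    # Number of batches needed to cover the minimum remember duration.
    batches = math.ceil(MIN_REMEMBER_MS / batch_ms)
    return batches * batch_ms


def find_new_files(files, current_ms, batch_ms, already_seen):
    """Simplified model of the per-batch file filter (not Spark's real code)."""
    mod_time_ignore_threshold = current_ms - remember_window_ms(batch_ms)
    selected = []
    for f in files:
        if f.visible_at_ms > current_ms:
            continue  # still being moved: not present in this listing
        if f.mod_time_ms < mod_time_ignore_threshold:
            continue  # considered too old, so it is ignored
        if f.mod_time_ms > current_ms:
            continue  # modified after this batch time
        if f.name in already_seen:
            continue  # already processed in an earlier batch
        selected.append(f.name)
    return selected


def run(batch_ms, file, batch_times):
    """Return the files selected at each batch time."""
    seen = set()
    picked = {}
    for t in batch_times:
        new = find_new_files([file], t, batch_ms, seen)
        seen.update(new)
        picked[t] = new
    return picked


# A large file whose move starts at t=119s but only becomes visible at t=121s.
big = FileEntry("big.log", mod_time_ms=119_000, visible_at_ms=121_000)

print(run(120_000, big, [120_000, 240_000]))  # 2-minute batches
print(run(30_000, big, [120_000, 150_000]))   # 30-second batches
```

With 2-minute batches the file is missed at t=120s (not yet visible) and rejected at t=240s, because its mod time of 119s is below the threshold of 240s - 120s = 120s, so it is never processed. With 30-second batches the window is 60s, the threshold at t=150s is 90s, and the file is selected.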



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
