Posted to issues@flink.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2016/07/11 19:53:11 UTC

[jira] [Closed] (FLINK-3515) Make the "file monitoring source" exactly-once

     [ https://issues.apache.org/jira/browse/FLINK-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek closed FLINK-3515.
-----------------------------------
    Resolution: Duplicate

> Make the "file monitoring source" exactly-once
> ----------------------------------------------
>
>                 Key: FLINK-3515
>                 URL: https://issues.apache.org/jira/browse/FLINK-3515
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>
> The stream source that watches directories for changes currently does not provide "exactly-once" guarantees.
> To make it exactly-once, the source (which generates the files to be read) and the flatMap (which reads the files) both need to keep track of where they were at the point of a checkpoint.
> Assuming that files do not change after creation (HDFS / S3 style), we can do this as follows:
>   - The source can track the files it already emitted downstream via file creation/modification timestamp, assuming that new files always get newer timestamps.
>   - The flatMappers need to always store the path of their current file fragment, plus the byte offset where they were within that file split.
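
The two pieces of state described above can be sketched in plain Java. This is only an illustration under stated assumptions, not Flink's implementation: the class and method names (ExactlyOnceFileSketch, MonitorState, ReaderState, selectNewFiles, snapshot, restore) are hypothetical, the directory listing is modeled as a map from path to modification timestamp, and no Flink API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Flink.
final class ExactlyOnceFileSketch {

    /** Source-side state: the newest modification timestamp already emitted downstream. */
    static final class MonitorState {
        long maxEmittedModTime = Long.MIN_VALUE;

        /**
         * Returns the paths whose timestamp is strictly newer than anything emitted so far,
         * oldest first, and advances the tracked timestamp.
         * Relies on the assumption that new files always get newer timestamps;
         * a late file with an equal or older timestamp would be skipped.
         */
        List<String> selectNewFiles(Map<String, Long> listing) {
            List<Map.Entry<String, Long>> fresh = new ArrayList<>();
            for (Map.Entry<String, Long> e : listing.entrySet()) {
                if (e.getValue() > maxEmittedModTime) {
                    fresh.add(e);
                }
            }
            fresh.sort(Map.Entry.comparingByValue());
            List<String> paths = new ArrayList<>();
            for (Map.Entry<String, Long> e : fresh) {
                paths.add(e.getKey());
                maxEmittedModTime = Math.max(maxEmittedModTime, e.getValue());
            }
            return paths;
        }

        /** What would be written into a checkpoint. */
        long snapshot() {
            return maxEmittedModTime;
        }

        /** Rebuilds the source state from a checkpointed timestamp. */
        static MonitorState restore(long checkpointed) {
            MonitorState s = new MonitorState();
            s.maxEmittedModTime = checkpointed;
            return s;
        }
    }

    /** Reader-side state: the current file split plus the byte offset reached within it. */
    static final class ReaderState {
        final String path;
        final long offset;

        ReaderState(String path, long offset) {
            this.path = path;
            this.offset = offset;
        }

        /** Returns a new state after consuming the given number of bytes. */
        ReaderState advance(long bytesRead) {
            return new ReaderState(path, offset + bytesRead);
        }
    }
}
```

After a failure, the source would be rebuilt with MonitorState.restore(...) so that already emitted files are not emitted again, and each reader would reopen ReaderState.path and seek to ReaderState.offset before continuing. Both steps are only safe under the issue's assumption that files do not change after creation.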


