Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/05/12 03:19:00 UTC

[jira] [Work logged] (BEAM-14267) Update watchForNewFiles to allow reading already read files with a new timestamp

     [ https://issues.apache.org/jira/browse/BEAM-14267?focusedWorklogId=769408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769408 ]

ASF GitHub Bot logged work on BEAM-14267:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/May/22 03:18
            Start Date: 12/May/22 03:18
    Worklog Time Spent: 10m 
      Work Description: Abacn commented on PR #17305:
URL: https://github.com/apache/beam/pull/17305#issuecomment-1124488694

   Once review is finished, merging of this PR should be held until #17604 (the Python counterpart of this feature) is also ready to go.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 769408)
    Time Spent: 3h 40m  (was: 3.5h)

> Update watchForNewFiles to allow reading already read files with a new timestamp
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-14267
>                 URL: https://issues.apache.org/jira/browse/BEAM-14267
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-files
>            Reporter: Yi Hu
>            Assignee: Yi Hu
>            Priority: P2
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> In TextIO and AvroIO, we have a configuration option called watchForNewFiles, and in FileIO.MatchConfiguration, we have an option called watchInterval. Right now, these match any files according to the filtering criteria, and then periodically check for new files. A file is determined to be new if it has a different filename than a file that has already been read.
> We want to add an option that treats a file as new if its timestamp differs from that of a file that has already been read, even if the filename is the same.
> See the following design doc for more detail:
> [https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/edit?usp=sharing&resourcekey=0-be0uF-DdmwAz6Vg4Li9FNw]
>  
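
The difference between the two notions of "new" described above can be sketched in a few lines. This is only an illustration of the deduplication idea, not Beam's implementation: the names FileMeta, NewFileDetector, and the match_updated_files flag are hypothetical and chosen for this sketch, and the timestamp is assumed to be a last-modified time in epoch milliseconds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical stand-in for the metadata of a matched file."""
    name: str
    last_modified: int  # assumed: epoch milliseconds


class NewFileDetector:
    """Remembers files seen in earlier polls and reports the new ones."""

    def __init__(self, match_updated_files: bool = False) -> None:
        # Hypothetical flag for this sketch: when True, a known filename
        # with a different timestamp is treated as a new file.
        self.match_updated_files = match_updated_files
        self._seen: Set[Tuple[str, Optional[int]]] = set()

    def _key(self, f: FileMeta) -> Tuple[str, Optional[int]]:
        # Filename-only key reproduces the current behaviour;
        # (filename, timestamp) reproduces the proposed option.
        if self.match_updated_files:
            return (f.name, f.last_modified)
        return (f.name, None)

    def poll(self, listing: Iterable[FileMeta]) -> List[FileMeta]:
        """Return the files in this listing not seen in any earlier poll."""
        new_files = []
        for f in listing:
            key = self._key(f)
            if key not in self._seen:
                self._seen.add(key)
                new_files.append(f)
        return new_files


# Two successive listings: a.txt is rewritten between polls, b.txt appears.
first_poll = [FileMeta("a.txt", 100)]
second_poll = [FileMeta("a.txt", 200), FileMeta("b.txt", 200)]

by_name = NewFileDetector()
by_name.poll(first_poll)
names_only = [f.name for f in by_name.poll(second_poll)]

by_timestamp = NewFileDetector(match_updated_files=True)
by_timestamp.poll(first_poll)
names_and_updates = [f.name for f in by_timestamp.poll(second_poll)]
```

With filename-only matching, the second poll reports only b.txt; with the timestamp included in the key, the rewritten a.txt is reported again as well. Keying on the (name, timestamp) pair rather than the timestamp alone keeps two different files that share a modification time distinct.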



--
This message was sent by Atlassian Jira
(v8.20.7#820007)