Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 00:22:56 UTC

[GitHub] [beam] damccorm opened a new issue, #21570: Update watchForNewFiles to allow reading already read files with a new timestamp

damccorm opened a new issue, #21570:
URL: https://github.com/apache/beam/issues/21570

   TextIO and AvroIO have a configuration option called watchForNewFiles, and FileIO.MatchConfiguration has an option called watchInterval. Right now, these options match files according to the filtering criteria and then periodically check for new files. A file is considered new only if its filename differs from that of every file that has already been read.
   
   We want to add an option that also treats a file as new when its timestamp differs from that of a file that has already been read, even if the filename is the same.
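
To make the difference between the two behaviors concrete, here is a minimal sketch in plain Java. It does not use or reproduce the Beam API: `WatchDedupSketch`, `FileSnapshot`, `emitted`, and the `treatNewTimestampAsNew` flag are hypothetical names chosen for illustration, and the composite key is only one possible way to track timestamps. The sketch assumes each poll reports a filename and a last-modified timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WatchDedupSketch {

    /** Hypothetical stand-in for one file observed during a poll. */
    static final class FileSnapshot {
        final String name;
        final long lastModifiedMillis;

        FileSnapshot(String name, long lastModifiedMillis) {
            this.name = name;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /**
     * Returns, in order, the names of the observations that count as new.
     * With the flag off, only the filename identifies a file (the behavior
     * described above). With the flag on, the filename plus timestamp does,
     * so a rewritten file is emitted again. The flag name is an assumption.
     */
    static List<String> emitted(List<FileSnapshot> observations,
                                boolean treatNewTimestampAsNew) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (FileSnapshot f : observations) {
            String key = treatNewTimestampAsNew
                    ? f.name + "@" + f.lastModifiedMillis
                    : f.name;
            // Set.add returns false when the key was already present.
            if (seen.add(key)) {
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FileSnapshot> polls = Arrays.asList(
                new FileSnapshot("a.txt", 1000L),
                new FileSnapshot("b.txt", 1000L),
                new FileSnapshot("a.txt", 2000L)); // same name, newer timestamp

        System.out.println(emitted(polls, false)); // prints [a.txt, b.txt]
        System.out.println(emitted(polls, true));  // prints [a.txt, b.txt, a.txt]
    }
}
```

With the flag off, the second observation of `a.txt` is dropped because its name has been seen before. With the flag on, it is emitted again because its timestamp changed.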
   
   See the following design doc for more detail:
   
   [https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/edit?usp=sharing&resourcekey=0-be0uF-DdmwAz6Vg4Li9FNw](https://docs.google.com/document/d/1xnacyLGNh6rbPGgTAh5D1gZVR8rHUBsMMRV3YkvlL08/edit?usp=sharing&resourcekey=0-be0uF-DdmwAz6Vg4Li9FNw)
   
   Imported from Jira [BEAM-14267](https://issues.apache.org/jira/browse/BEAM-14267). Original Jira may contain additional context.
   Reported by: yihu.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #21570: Update watchForNewFiles to allow reading already read files with a new timestamp

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #21570:
URL: https://github.com/apache/beam/issues/21570#issuecomment-1155458210

   Addressed in #17305
   
   .close-issue




[GitHub] [beam] github-actions[bot] closed issue #21570: Update watchForNewFiles to allow reading already read files with a new timestamp

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #21570: Update watchForNewFiles to allow reading already read files with a new timestamp
URL: https://github.com/apache/beam/issues/21570




[GitHub] [beam] damccorm commented on issue #21570: Update watchForNewFiles to allow reading already read files with a new timestamp

Posted by GitBox <gi...@apache.org>.
damccorm commented on issue #21570:
URL: https://github.com/apache/beam/issues/21570#issuecomment-1146710658

   Unable to assign user @Abacn. If able, self-assign, otherwise tag @damccorm so that he can assign you. Because of GitHub's spam prevention system, your activity is required to enable assignment in this repo.

