Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/25 18:42:28 UTC

[GitHub] [beam] anilaluruintuitivecloud opened a new issue, #22894: [Feature Request]: Continuous read mode in Python SDK

anilaluruintuitivecloud opened a new issue, #22894:
URL: https://github.com/apache/beam/issues/22894

   ### What would you like to happen?
   
   Continuous read mode
   
   - Storing processed filenames in an external file and deduplicating the lists at the next transform
   
   - Adding timestamps to filenames, writing a glob pattern to pull in only new files, and matching the pattern when the pipeline restarts.
   
   Right now continuous read mode is only available in the Java SDK; the feature should be extended to the Python SDK.
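
For readers unfamiliar with the two approaches listed above, here is a minimal sketch in plain Python with no Beam dependency. It is only an illustration, not code from the issue or from Beam: the function names, the JSON state file, and the `YYYYMMDDHHMMSS` filename stamp are all hypothetical, and a plain string comparison stands in for the glob pattern mentioned in the second bullet.

```python
import json
import re
from pathlib import Path


def load_processed(state_path):
    """Return the set of filenames recorded by earlier runs (empty if no state yet)."""
    path = Path(state_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def select_new_files(matched, state_path):
    """First approach: drop already-processed names, then record the new ones."""
    processed = load_processed(state_path)
    new_files = sorted(set(matched) - processed)
    # Persist the union so the next run sees everything handled so far.
    Path(state_path).write_text(json.dumps(sorted(processed | set(new_files))))
    return new_files


# Hypothetical naming scheme: a YYYYMMDDHHMMSS stamp right before the extension.
_STAMP = re.compile(r"(\d{14})\.[^.]+$")


def newer_than(filenames, last_stamp):
    """Second approach: keep files whose embedded timestamp is after last_stamp."""
    kept = []
    for name in filenames:
        found = _STAMP.search(name)
        # Fixed-width digit strings compare correctly as plain strings.
        if found and found.group(1) > last_stamp:
            kept.append(name)
    return kept
```

In a pipeline, `select_new_files` would sit right after the file-matching step, and `last_stamp` would be saved when the pipeline stops and reloaded when it restarts. Both helpers assume a single writer to the state file; concurrent runs would need locking or an external store.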
   
   
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-py-files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [beam] Abacn commented on issue #22894: [Feature Request]: Continuous read mode in Python SDK

Posted by GitBox <gi...@apache.org>.

Abacn commented on issue #22894:
URL: https://github.com/apache/beam/issues/22894#issuecomment-1297350462

   In Python there is also https://beam.apache.org/releases/pydoc/current/apache_beam.io.fileio.html#apache_beam.io.fileio.MatchContinuously now. Does that satisfy your requirements?

