Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 16:14:46 UTC

[GitHub] [beam] kennknowles opened a new issue, #18054: Per-step, per-execution nonce

kennknowles opened a new issue, #18054:
URL: https://github.com/apache/beam/issues/18054

   In the forthcoming runner API, a user will be able to save a pipeline to JSON and then run it repeatedly.
   
   Many pieces of code (e.g., `BigQueryIO.Read` or `BigQueryIO.Write`) rely on a single random value (a nonce). These values are typically generated at apply time so that they are deterministic (they don't change across retries of DoFns) and global (they are the same across all workers).
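   A minimal, dependency-free Java sketch of why apply-time generation behaves this way. `NonceCarryingTransform`, `save`, and `load` are hypothetical names, not Beam APIs: the class stands in for a transform like `BigQueryIO.Write`, and plain Java serialization stands in for saving the pipeline and shipping it to workers (the runner API's own encoding differs). Because the nonce is fixed when the object is constructed, every deserialized copy sees the same value.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutputStream;
   import java.io.Serializable;
   import java.util.UUID;

   public class ApplyTimeNonceDemo {

       // Hypothetical stand-in for a transform such as BigQueryIO.Write: the nonce
       // is chosen once, when the object is constructed (i.e., at apply time).
       static class NonceCarryingTransform implements Serializable {
           private static final long serialVersionUID = 1L;
           final String nonce = UUID.randomUUID().toString();
       }

       // Stands in for saving the pipeline (Java serialization is used only to
       // keep the sketch dependency-free).
       static byte[] save(NonceCarryingTransform transform) throws IOException {
           ByteArrayOutputStream bytes = new ByteArrayOutputStream();
           try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
               out.writeObject(transform);
           }
           return bytes.toByteArray();
       }

       // Stands in for loading the saved pipeline to run it (or to ship it to a worker).
       static NonceCarryingTransform load(byte[] saved) throws IOException, ClassNotFoundException {
           try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(saved))) {
               return (NonceCarryingTransform) in.readObject();
           }
       }

       // Saves one transform, loads it twice, and reports whether both loads see the same nonce.
       static boolean sameNonceAcrossRuns() {
           try {
               byte[] savedPipeline = save(new NonceCarryingTransform());
               return load(savedPipeline).nonce.equals(load(savedPipeline).nonce);
           } catch (IOException | ClassNotFoundException e) {
               throw new IllegalStateException(e);
           }
       }

       public static void main(String[] args) {
           // Deserialization restores the stored field instead of re-running the
           // field initializer, so every load of the saved bytes sees the same nonce.
           System.out.println(sameNonceAcrossRuns()); // true
       }
   }
   ```

   The property that makes the nonce safe across retries and workers (it is baked into the serialized transform) is the same one that would make it repeat across every job launched from one saved pipeline.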
   
   However, once the runner API lands, the existing code would reuse the same nonce across every job launched from a saved pipeline. Possible solutions:
   
   * Generate the nonce in `Create(1) | ParDo` and pass it to consumers as a side input. This should work as long as side inputs are actually checkpointed, but it does not work for `BoundedSource`.
   
   * If a nonce is only needed for the lifetime of one bundle, it can be generated in `startBundle` and used in `finishBundle` [or `tearDown`].
   
   * Add context that lets user code access the unique step name, and derive a nonce consistently from it, e.g. by hashing. This will usually work, but such context is likewise not available to sources.
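   The hashing approach in the last item could look roughly like the sketch below, assuming the runner exposes both a unique step name and some per-execution identifier such as a job ID. Both inputs, and the `stepNonce` helper, are hypothetical; nothing here is an existing Beam API. Hashing the pair with SHA-256 gives a value that is stable across retries and workers within one execution but changes from one execution to the next.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;

   public class StepNonce {

       // Hypothetical helper: derives a nonce from a per-execution ID and a unique step name.
       // Same inputs give the same nonce (retries, workers); a new execution ID gives a new one.
       static String stepNonce(String executionId, String stepName) {
           try {
               MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
               // "\0" separates the two fields so ("ab", "c") and ("a", "bc") hash differently.
               byte[] input = (executionId + "\0" + stepName).getBytes(StandardCharsets.UTF_8);
               byte[] digest = sha256.digest(input);
               StringBuilder hex = new StringBuilder();
               for (byte b : digest) {
                   hex.append(String.format("%02x", b));
               }
               return hex.toString(); // 64 hex characters
           } catch (NoSuchAlgorithmException e) {
               // Every Java platform is required to provide SHA-256.
               throw new IllegalStateException(e);
           }
       }

       public static void main(String[] args) {
           String firstAttempt = stepNonce("job-1", "WriteToBigQuery");
           String retry = stepNonce("job-1", "WriteToBigQuery");
           String nextJob = stepNonce("job-2", "WriteToBigQuery");
           System.out.println(firstAttempt.equals(retry));   // true
           System.out.println(firstAttempt.equals(nextJob)); // false
       }
   }
   ```

   The whole scheme depends on where the execution ID comes from: if it were fixed when the pipeline is saved, the nonce would repeat across jobs just like an apply-time random value.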
   
   Another question: I'm not sure we have a good way to generate nonces in unbounded pipelines; we probably need one. It would enable us, for example, to use `BigQueryIO.Write` in an unbounded pipeline [if we had, e.g., exactly-once triggering per window], or, more generally, to handle multiple firings.
   
   Imported from Jira [BEAM-758](https://issues.apache.org/jira/browse/BEAM-758). Original Jira may contain additional context.
   Reported by: dhalperi.

