Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 00:17:01 UTC

[GitHub] [beam] damccorm opened a new issue, #21522: Server-side Dataflow job idempotence

damccorm opened a new issue, #21522:
URL: https://github.com/apache/beam/issues/21522

   *Issue*: When a job submission is retried, it may result in duplicate Dataflow jobs. The Dataflow job `name` only guarantees uniqueness among _active_ jobs; that is, if a job with the same name exists but has already completed, the same `name` is allowed again. What we would like is job uniqueness regardless of job status.
   
   The Dataflow API provides a way to ensure unique jobs through the use of `clientRequestId`:
   ```
   The client's unique identifier of the job, re-used across retried attempts.
   If this field is set, the service will ensure its uniqueness. The request to
   create a job will fail if the service has knowledge of a previously submitted
   job with the same client's ID and job name. The caller may use this field to
   ensure idempotence of job creation across retried attempts to create a job.
   By default, the field is empty and, in that case, the service ignores it.
   ```
   
   [https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.jobs](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.jobs)
   
   In `DataflowRunner.java`, `clientRequestId` is set to [a randomized value](https://github.com/apache/beam/blob/v2.37.0/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1125) generated by the runner, so callers cannot supply a stable ID across retried submissions.
   
   *Proposed solution*: Provide the ability to pass in a `clientRequestId` through `DataflowPipelineOptions` and set it on the `Job` when available; otherwise, default to the randomized value.
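   
   A minimal sketch of the fallback described above, assuming nothing about the final design. `ClientRequestIds` and `resolve` are hypothetical names that do not exist in Beam, and `UUID` only stands in for the runner's existing randomized value:
   
   ```java
   import java.util.UUID;
   
   /** Hypothetical helper for illustration only; not part of Beam. */
   final class ClientRequestIds {
     private ClientRequestIds() {}
   
     /**
      * Returns the caller-supplied ID when one is set, so that retried
      * submissions reuse it; otherwise returns a fresh random ID.
      */
     static String resolve(String configuredId) {
       if (configuredId != null && !configuredId.trim().isEmpty()) {
         return configuredId;
       }
       // Assumption: a UUID stands in for the runner's current randomized value.
       return UUID.randomUUID().toString();
     }
   }
   ```
   
   In the runner, the resolved value would then be passed to `setClientRequestId` on the `Job`, with the input read from a new (hypothetical) getter on `DataflowPipelineOptions`. A caller that wants idempotent submission would pass the same ID on every retry; a caller that sets nothing keeps today's behavior.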
   
   Imported from Jira [BEAM-14284](https://issues.apache.org/jira/browse/BEAM-14284). Original Jira may contain additional context.
   Reported by: toltol.

