Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 15:28:01 UTC

[GitHub] [beam] damccorm opened a new issue, #20106: DataflowPipelineJob.waitUntilFinish() crashes when it has created a template.

damccorm opened a new issue, #20106:
URL: https://github.com/apache/beam/issues/20106

   ```
   INFO: Template successfully created.

   Exception in thread "main" java.lang.UnsupportedOperationException: The result of template creation should not be used.
           at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getJobId(DataflowTemplateJob.java:37)
           at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:524)
           at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:506)
           at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:295)
           at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:224)
           at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:183)
           at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:176)
   ```
   
   
   This is a real error. If a template was created, the job is complete. Instead of crashing while trying to access the job ID, as though `DataflowPipelineJob` doesn't know it created a template, `waitUntilFinish()` should return successfully. Or perhaps there is another design choice, but simply crashing does not make sense. Probably `DataflowRunner` should not return a `DataflowPipelineJob` at all in this case.
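
Until the runner handles this itself, one possible caller-side workaround is to skip the wait whenever a template location was configured. The sketch below is self-contained and uses hypothetical stand-in types (`Result`, `TemplateResult`, `RunningResult`, `waitIfRealJob`) rather than Beam's real classes; in an actual pipeline the template location would presumably come from the Dataflow pipeline options, which is an assumption here and not something the issue states.

```java
import java.util.Optional;

public class TemplateAwareWait {
    /** Hypothetical stand-in for a pipeline result; only the method needed here. */
    interface Result {
        String waitUntilFinish();
    }

    /** Mimics the behavior in the stack trace: template results cannot be waited on. */
    static final class TemplateResult implements Result {
        @Override
        public String waitUntilFinish() {
            throw new UnsupportedOperationException(
                "The result of template creation should not be used.");
        }
    }

    /** Mimics a normally submitted job that finishes successfully. */
    static final class RunningResult implements Result {
        @Override
        public String waitUntilFinish() {
            return "DONE";
        }
    }

    /**
     * Waits only when no template location was configured (assumed signal that
     * a real job was submitted). Returns an empty Optional when a template was
     * created, since there is no job to wait for.
     */
    static Optional<String> waitIfRealJob(Result result, String templateLocation) {
        if (templateLocation != null && !templateLocation.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(result.waitUntilFinish());
    }
}
```

With this guard, code that runs after the pipeline can treat an empty result as "template created, nothing to wait for" instead of catching `UnsupportedOperationException`.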
   
   Imported from Jira [BEAM-9337](https://issues.apache.org/jira/browse/BEAM-9337). Original Jira may contain additional context.
   Reported by: kenn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] DataflowPipelineJob.waitUntilFinish() crashes when it has created a template. [beam]

Posted by "CodingAnarchy (via GitHub)" <gi...@apache.org>.
CodingAnarchy commented on issue #20106:
URL: https://github.com/apache/beam/issues/20106#issuecomment-1756605162

   We recently ran into this error when trying to do some post-batch cleanup in some of our GCP Dataflow jobs that use a template, and had to revert those changes. Not being able to use `waitUntilFinish()` to defer cleanup of intermediate resources and to do post-pipeline work makes these pipelines more difficult to manage.




Re: [I] DataflowPipelineJob.waitUntilFinish() crashes when it has created a template. [beam]

Posted by "djaneluz (via GitHub)" <gi...@apache.org>.
djaneluz commented on issue #20106:
URL: https://github.com/apache/beam/issues/20106#issuecomment-1834378753

   Just faced the same problem with Beam version 2.51.0. Any updates?




Re: [I] DataflowPipelineJob.waitUntilFinish() crashes when it has created a template. [beam]

Posted by "RonBarkan (via GitHub)" <gi...@apache.org>.
RonBarkan commented on issue #20106:
URL: https://github.com/apache/beam/issues/20106#issuecomment-1872327065

   Running a pipeline and then, once it has finished, running another pipeline or even just some non-pipeline code is not possible with this behavior.
   Why this is an important and valid use case:
   1. Many pipelines end by writing output somewhere as a terminal step, so you cannot chain another step after it. However, if you want to do something after all output has completed, such as writing a marker file, you must do it after the pipeline ends.
   You can only do this if you call `waitUntilFinish()`.
   2. If you collect metrics from the run to store or process them, you probably want to:
   ```
       var result = pipeline.run();
       result.waitUntilFinish();
       processMetrics(result.metrics());
   ```
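
The marker-file case from point 1 could look roughly like the following. This is a minimal sketch, not Beam code: `writeMarkerWhenDone` is a hypothetical helper, the wait is passed in as a plain `Supplier<String>` so the example runs without Beam on the classpath, and the `"DONE"` string stands in for whatever terminal state the real `waitUntilFinish()` reports.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

public class PostPipelineMarker {
    /**
     * Blocks on the supplied wait function, then writes an empty marker file
     * only if the reported terminal state is "DONE". Returns whether the
     * marker was written.
     */
    static boolean writeMarkerWhenDone(Supplier<String> waitUntilFinish, Path marker) {
        // In a real pipeline this call would block until the job terminates.
        String terminalState = waitUntilFinish.get();
        if (!"DONE".equals(terminalState)) {
            return false; // Do not signal success for failed or cancelled runs.
        }
        try {
            Files.write(marker, new byte[0]); // Creates (or truncates) the marker file.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

The point of the sketch is the ordering: the marker can only be written safely after the wait returns, which is exactly the step that fails when the result comes from template creation.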

