Posted to user@beam.apache.org by Randal Moore <rd...@gmail.com> on 2017/07/09 21:54:50 UTC

API to query the state of a running dataflow job?

Is this part of the Beam API, or something I should look at the Google docs
for?  Assume a job is running in Dataflow - how can an interested
third-party app query its status if it knows the job ID?

rdm

Re: API to query the state of a running dataflow job?

Posted by Randal Moore <rd...@gmail.com>.
Thanks.  I will create a JIRA ticket to explain. I am planning a
service running in Kubernetes that will submit Dataflow jobs. It will need
to know the status of jobs [across service restarts]. An alternative might be
to do some sort of GBK at the end of the job and post the result to
Pub/Sub.  That seemed complex - my last step is currently a
Datastore.write, which needs to finish before the job can be claimed
done, and DatastoreIO is a "termination", right?



On Sun, Jul 9, 2017 at 10:04 PM Kenneth Knowles <kl...@google.com> wrote:

> (Speaking for Java, but I think Python is similar)
>
> There's nothing in the Beam API right now for querying a job unless you
> have a handle on the original object returned by the runner. The nature of
> the result of run() is particular to a runner, though it is easy to imagine
> a feature whereby you can "attach" to a known running job.
>
> So I think your best option is to use runner-specific APIs for now. For
> Dataflow that would be the Cloud APIs [1]. You can see how it is done in
> the Beam wrapper DataflowPipelineJob [2] for reference.
>
> Out of curiosity - what sort of third-party app? It would be super if you
> could file a JIRA [3] describing your use case in more detail, to
> help gain visibility.
>
> Kenn
>
> [1]
> https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/get
> [2]
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L441
> [3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>
> On Sun, Jul 9, 2017 at 2:54 PM, Randal Moore <rd...@gmail.com> wrote:
>
>> Is this part of the Beam API, or something I should look at the Google
>> docs for?  Assume a job is running in Dataflow - how can an interested
>> third-party app query its status if it knows the job ID?
>>
>> rdm
>>
>
>

Re: API to query the state of a running dataflow job?

Posted by Kenneth Knowles <kl...@google.com>.
(Speaking for Java, but I think Python is similar)

There's nothing in the Beam API right now for querying a job unless you
have a handle on the original object returned by the runner. The nature of
the result of run() is particular to a runner, though it is easy to imagine
a feature whereby you can "attach" to a known running job.

So I think your best option is to use runner-specific APIs for now. For
Dataflow that would be the Cloud APIs [1]. You can see how it is done in
the Beam wrapper DataflowPipelineJob [2] for reference.

Out of curiosity - what sort of third-party app? It would be super if you
could file a JIRA [3] describing your use case in more detail, to
help gain visibility.

Kenn

[1]
https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/get
[2]
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L441
[3] https://issues.apache.org/jira/secure/CreateIssue!default.jspa
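For readers landing here from the archives, the projects.jobs.get call in
[1] boils down to an authenticated GET on the job resource. A rough sketch
from plain Java follows - the class name is made up for illustration, the
project ID, job ID, and OAuth2 access token are placeholders you must
supply, and error handling and retries are omitted:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DataflowJobLookup {
    /** Builds the projects.jobs.get URL from [1] for a given project and job. */
    static String jobGetUrl(String projectId, String jobId) {
        return "https://dataflow.googleapis.com/v1b3/projects/"
                + projectId + "/jobs/" + jobId;
    }

    /**
     * Issues the GET with an OAuth2 bearer token and returns the raw JSON
     * response; the job's currentState field carries values such as
     * JOB_STATE_RUNNING or JOB_STATE_DONE.
     */
    static String getJobJson(String projectId, String jobId, String accessToken)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(jobGetUrl(projectId, jobId)).openConnection();
        conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In practice the generated Google API client library does the same thing with
auth and retries handled for you, which is what DataflowPipelineJob [2]
builds on.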

On Sun, Jul 9, 2017 at 2:54 PM, Randal Moore <rd...@gmail.com> wrote:

> Is this part of the Beam API, or something I should look at the Google docs
> for?  Assume a job is running in Dataflow - how can an interested
> third-party app query its status if it knows the job ID?
>
> rdm
>