Posted to dev@beam.apache.org by Aviem Zur <av...@gmail.com> on 2017/01/29 13:33:48 UTC

Pipeline graph reflection

Hi all,

While implementing metrics support in the Spark Runner, a need arose to
compose a unique identifier for a transform, to differentiate it from other
transforms with the same name.

With the help of @bjchambers I understood that something similar exists in
the Dataflow Runner, which creates a string along the lines of
"PBegin/SomeInputTransform/SomeParDo/...MyTransform.#Running_number_for_collisions".

I'm trying to figure out:
A) How this is done in the Dataflow Runner.
B) Whether it can be pulled up into a utility for other runners, since the
conversation regarding the metrics API and querying hints that this will be
needed.
C) From my own forays into the code I came across
`org.apache.beam.sdk.values.PValue#getProducingTransformInternal`, which can
be recursed on but is marked as deprecated. Are there efforts being made
elsewhere for this sort of pipeline graph reflection?
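
To make the kind of identifier described above concrete, here is a minimal
sketch of composing a slash-separated path and appending a running number on
collisions. It is an illustration only: the class `UniqueTransformNames`, the
method `uniqueName`, and the `#N` suffix format are hypothetical and are not
taken from the Dataflow Runner or the Beam SDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Beam or any runner. */
final class UniqueTransformNames {
  // Every identifier handed out so far.
  private final Set<String> used = new HashSet<>();

  /**
   * Joins the enclosing transform names with '/' and, if that name was
   * already handed out, appends "#2", "#3", ... until the result is unused.
   */
  String uniqueName(List<String> path) {
    String fullName = String.join("/", path);
    String candidate = fullName;
    int suffix = 2;
    // Set.add returns false when the candidate is already taken.
    while (!used.add(candidate)) {
      candidate = fullName + "#" + suffix++;
    }
    return candidate;
  }

  public static void main(String[] args) {
    UniqueTransformNames names = new UniqueTransformNames();
    List<String> path = Arrays.asList("SomeInputTransform", "SomeParDo", "MyTransform");
    System.out.println(names.uniqueName(path)); // plain path on first use
    System.out.println(names.uniqueName(path)); // same path with a "#2" suffix
  }
}
```

Checking the candidate against every identifier already handed out, rather
than keeping a counter per name, also covers the case where a transform is
literally named like a previously generated suffix.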

Re: Pipeline graph reflection

Posted by Dan Halperin <dh...@google.com.INVALID>.
Thomas is working on this pretty explicitly. Beam needs this for the
Runner/Fn APIs, except that the unique IDs will probably be numbers or
hashes so that they are more usable than long strings.
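
As a rough illustration of the hash option, the sketch below derives a short
fixed-length ID from a long transform path. The class `TransformIds`, the
method `shortId`, the choice of SHA-256, and the 8-byte truncation are all
assumptions made for the example, not what the Runner/Fn API work uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of Beam. */
final class TransformIds {
  /** Returns the first 8 bytes of SHA-256(fullName) as 16 hex characters. */
  static String shortId(String fullName) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256")
          .digest(fullName.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (int i = 0; i < 8; i++) {
        // Mask to an unsigned value so each byte prints as two hex digits.
        hex.append(String.format("%02x", digest[i] & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is unexpected.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(shortId("SomeInputTransform/SomeParDo/MyTransform"));
  }
}
```

A truncated hash is stable across runs for the same path but can collide in
principle, so it would only work as an ID if the full names are already
unique and a collision check is kept.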

The code to check whether names are unique, etc., is actually in the SDK
core right now. See, e.g.,
https://github.com/apache/beam/blob/7984fe3fc20160d2286433434190f35658aef158/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java#L359

Dan
