You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Pablo Estrada <pa...@google.com> on 2019/10/05 01:14:31 UTC

[portability] Removing the old portable metrics API...

Hello devs,
I recently took a look at how Dataflow is retrieving metrics from the Beam
SDK harnesses, and noticed something. As you may (or may not) remember, the
portability API currently has two ways of reporting metrics. Namely, the
newer MonitoringInfo API[1], and the older Metrics one[2].

This is somewhat troublesome because now we have two things that do the
same thing. The SDKs report double the amount of metrics[3][4], and I bet
it's confusing for runner implementers.

Luckily, it seems like the Flink and Spark runners do use the new API
[5][6] - yay! : ) - so I guess then the only runner that uses the old API
is Dataflow? (internally)

Which way does the Samza runner use? +Hai Lu?
How about the Go SDK +Robert Burke <re...@google.com> ? - Ah I bet this uses
the old API?

If they all use the MonitoringInfos, we may be able to clean up the old
api, and move to the new one (somewhat)soon : )

[1]
https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
[2]
https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
[3]
https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
[4]
https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384

[5]
https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
[6]
https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226

Re: [portability] Removing the old portable metrics API...

Posted by Robert Bradshaw <ro...@google.com>.
To clarify on the Dataflow story, there are three Dataflow (worker)
implementations: the legacy non-portable one, the Portability Java Runner
Harness (currently used by Python Streaming), and the Portability Unified
Worker (intended to replace the Java Runner Harness). MonitoringInfos
doesn't apply to the legacy one, is implemented for the JRH, but the UW
still uses the old-style counters.

So, I think we can't delete the old-style yet, but hopefully soon (pending
Go + UW support--it's on the roadmaps but I don't have an ETA).

On Wed, Oct 9, 2019 at 11:40 AM Alex Amato <aj...@google.com> wrote:

> @Robert Bradshaw <ro...@google.com> Dataflow is updated to use
> MonitoringInfos.
>
> This is specifically referring to the FN API Layer. Beam Python and Beam
> Java export metrics using the new APIs. And the DataflowRunner harness is
> consuming and using those. When I was removed from that project, most of
> the metrics were implemented in the
> Python and Java SDKs as MonitoringInfos.
>
>
>
> Java SDK
>
> Python SDK
>
> Go SDK
>
> User Counters
>
> Done
>
> Done
>
> Legacy FN API
>
> User Distributions
>
> Done
>
> Done
>
> Legacy FN API
>
> Execution Time Start
>
> Done
>
> Done
>
> Not Started
>
> Execution Time Process()
>
> Done
>
> Done
>
> Not Started
>
> Execution Time Finish()
>
> Done
>
> Done
>
> Not Started
>
> Element Count
>
> Done
>
> Done
>
> Legacy FN API
>
> Sampled PColl Byte Size
>
> Pending (PR/8416 <https://github.com/apache/beam/pull/8416>)
>
> Handoff instructions
>
> BEAM-7462 <https://issues.apache.org/jira/browse/BEAM-7462?filter=-2>
>
> Done
>
> Legacy FN API
>
> And the Dataflow Java Runner Harness was consuming this. +Mikhail
> Gryzykhin <mi...@google.com> implemented the runner harness layer.
>
> Do delete the deprecated stuff, we would need to get the Go SDK on
> MonitoringInfos for what it has implemented so far.
>
> Integration test coverage could be increased. But we wrote this test
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline.py>
> .
>
>
> On Wed, Oct 9, 2019 at 10:51 AM Luke Cwik <lc...@google.com> wrote:
>
>> One way would be to report both so this way we don't need to update the
>> Dataflow Java implementation but other runners using the new API get all
>> the metrics.
>>
>> On Mon, Oct 7, 2019 at 10:00 AM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> Yes, Dataflow still uses the old API, for both counters and for its
>>> progress/autoscaling mechanisms. We'd need to convert that over as
>>> well (which is on the TODO list but lower than finishing up support
>>> for portability in general).
>>>
>>> On Mon, Oct 7, 2019 at 9:56 AM Robert Burke <re...@google.com> wrote:
>>> >
>>> > The Go SDK uses the old API [1], but it shouldn't be too hard to
>>> migrate it.
>>> >
>>> > The main thing I'd want to do at the same time is move the
>>> dependencies on the protos out of that package and have those live only in
>>> the harness package [2]. I wasn't aware of that particular separation of
>>> concerns until much later, but allows for alternative harness
>>> implementations.
>>> >
>>> > I have some other work to get the Per-DoFn profiling metrics (eleemnt
>>> count, size, time) into the Go SDK this quarter, so I can handle this then.
>>> >
>>> > [1]
>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/metrics/metrics.go#L474
>>> > [2]
>>> https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/core/runtime/harness
>>> >
>>> > On Fri, Oct 4, 2019 at 6:14 PM Pablo Estrada <pa...@google.com>
>>> wrote:
>>> >>
>>> >> Hello devs,
>>> >> I recently took a look at how Dataflow is retrieving metrics from the
>>> Beam SDK harnesses, and noticed something. As you may (or may not)
>>> remember, the portability API currently has two ways of reporting metrics.
>>> Namely, the newer MonitoringInfo API[1], and the older Metrics one[2].
>>> >>
>>> >> This is somewhat troublesome because now we have two things that do
>>> the same thing. The SDKs report double the amount of metrics[3][4], and I
>>> bet it's confusing for runner implementers.
>>> >>
>>> >> Luckily, it seems like the Flink and Spark runners do use the new API
>>> [5][6] - yay! : ) - so I guess then the only runner that uses the old API
>>> is Dataflow? (internally)
>>> >>
>>> >> Which way does the Samza runner use? +Hai Lu?
>>> >> How about the Go SDK +Robert Burke ? - Ah I bet this uses the old API?
>>> >>
>>> >> If they all use the MonitoringInfos, we may be able to clean up the
>>> old api, and move to the new one (somewhat)soon : )
>>> >>
>>> >> [1]
>>> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
>>> >> [2]
>>> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
>>> >> [3]
>>> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
>>> >> [4]
>>> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384
>>> >>
>>> >> [5]
>>> https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
>>> >> [6]
>>> https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226
>>>
>>

Re: [portability] Removing the old portable metrics API...

Posted by Alex Amato <aj...@google.com>.
@Robert Bradshaw <ro...@google.com> Dataflow is updated to use
MonitoringInfos.

This is specifically referring to the FN API Layer. Beam Python and Beam
Java export metrics using the new APIs. And the DataflowRunner harness is
consuming and using those. When I was removed from that project, most of
the metrics were implemented in the
Python and Java SDKs as MonitoringInfos.



Java SDK

Python SDK

Go SDK

User Counters

Done

Done

Legacy FN API

User Distributions

Done

Done

Legacy FN API

Execution Time Start

Done

Done

Not Started

Execution Time Process()

Done

Done

Not Started

Execution Time Finish()

Done

Done

Not Started

Element Count

Done

Done

Legacy FN API

Sampled PColl Byte Size

Pending (PR/8416 <https://github.com/apache/beam/pull/8416>)

Handoff instructions

BEAM-7462 <https://issues.apache.org/jira/browse/BEAM-7462?filter=-2>

Done

Legacy FN API

And the Dataflow Java Runner Harness was consuming this. +Mikhail Gryzykhin
<mi...@google.com> implemented the runner harness layer.

Do delete the deprecated stuff, we would need to get the Go SDK on
MonitoringInfos for what it has implemented so far.

Integration test coverage could be increased. But we wrote this test
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_exercise_metrics_pipeline.py>
.


On Wed, Oct 9, 2019 at 10:51 AM Luke Cwik <lc...@google.com> wrote:

> One way would be to report both so this way we don't need to update the
> Dataflow Java implementation but other runners using the new API get all
> the metrics.
>
> On Mon, Oct 7, 2019 at 10:00 AM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Yes, Dataflow still uses the old API, for both counters and for its
>> progress/autoscaling mechanisms. We'd need to convert that over as
>> well (which is on the TODO list but lower than finishing up support
>> for portability in general).
>>
>> On Mon, Oct 7, 2019 at 9:56 AM Robert Burke <re...@google.com> wrote:
>> >
>> > The Go SDK uses the old API [1], but it shouldn't be too hard to
>> migrate it.
>> >
>> > The main thing I'd want to do at the same time is move the dependencies
>> on the protos out of that package and have those live only in the harness
>> package [2]. I wasn't aware of that particular separation of concerns until
>> much later, but allows for alternative harness implementations.
>> >
>> > I have some other work to get the Per-DoFn profiling metrics (eleemnt
>> count, size, time) into the Go SDK this quarter, so I can handle this then.
>> >
>> > [1]
>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/metrics/metrics.go#L474
>> > [2]
>> https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/core/runtime/harness
>> >
>> > On Fri, Oct 4, 2019 at 6:14 PM Pablo Estrada <pa...@google.com>
>> wrote:
>> >>
>> >> Hello devs,
>> >> I recently took a look at how Dataflow is retrieving metrics from the
>> Beam SDK harnesses, and noticed something. As you may (or may not)
>> remember, the portability API currently has two ways of reporting metrics.
>> Namely, the newer MonitoringInfo API[1], and the older Metrics one[2].
>> >>
>> >> This is somewhat troublesome because now we have two things that do
>> the same thing. The SDKs report double the amount of metrics[3][4], and I
>> bet it's confusing for runner implementers.
>> >>
>> >> Luckily, it seems like the Flink and Spark runners do use the new API
>> [5][6] - yay! : ) - so I guess then the only runner that uses the old API
>> is Dataflow? (internally)
>> >>
>> >> Which way does the Samza runner use? +Hai Lu?
>> >> How about the Go SDK +Robert Burke ? - Ah I bet this uses the old API?
>> >>
>> >> If they all use the MonitoringInfos, we may be able to clean up the
>> old api, and move to the new one (somewhat)soon : )
>> >>
>> >> [1]
>> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
>> >> [2]
>> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
>> >> [3]
>> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
>> >> [4]
>> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384
>> >>
>> >> [5]
>> https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
>> >> [6]
>> https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226
>>
>

Re: [portability] Removing the old portable metrics API...

Posted by Luke Cwik <lc...@google.com>.
One way would be to report both so this way we don't need to update the
Dataflow Java implementation but other runners using the new API get all
the metrics.

On Mon, Oct 7, 2019 at 10:00 AM Robert Bradshaw <ro...@google.com> wrote:

> Yes, Dataflow still uses the old API, for both counters and for its
> progress/autoscaling mechanisms. We'd need to convert that over as
> well (which is on the TODO list but lower than finishing up support
> for portability in general).
>
> On Mon, Oct 7, 2019 at 9:56 AM Robert Burke <re...@google.com> wrote:
> >
> > The Go SDK uses the old API [1], but it shouldn't be too hard to migrate
> it.
> >
> > The main thing I'd want to do at the same time is move the dependencies
> on the protos out of that package and have those live only in the harness
> package [2]. I wasn't aware of that particular separation of concerns until
> much later, but allows for alternative harness implementations.
> >
> > I have some other work to get the Per-DoFn profiling metrics (eleemnt
> count, size, time) into the Go SDK this quarter, so I can handle this then.
> >
> > [1]
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/metrics/metrics.go#L474
> > [2]
> https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/core/runtime/harness
> >
> > On Fri, Oct 4, 2019 at 6:14 PM Pablo Estrada <pa...@google.com> wrote:
> >>
> >> Hello devs,
> >> I recently took a look at how Dataflow is retrieving metrics from the
> Beam SDK harnesses, and noticed something. As you may (or may not)
> remember, the portability API currently has two ways of reporting metrics.
> Namely, the newer MonitoringInfo API[1], and the older Metrics one[2].
> >>
> >> This is somewhat troublesome because now we have two things that do the
> same thing. The SDKs report double the amount of metrics[3][4], and I bet
> it's confusing for runner implementers.
> >>
> >> Luckily, it seems like the Flink and Spark runners do use the new API
> [5][6] - yay! : ) - so I guess then the only runner that uses the old API
> is Dataflow? (internally)
> >>
> >> Which way does the Samza runner use? +Hai Lu?
> >> How about the Go SDK +Robert Burke ? - Ah I bet this uses the old API?
> >>
> >> If they all use the MonitoringInfos, we may be able to clean up the old
> api, and move to the new one (somewhat)soon : )
> >>
> >> [1]
> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
> >> [2]
> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
> >> [3]
> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
> >> [4]
> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384
> >>
> >> [5]
> https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
> >> [6]
> https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226
>

Re: [portability] Removing the old portable metrics API...

Posted by Robert Bradshaw <ro...@google.com>.
Yes, Dataflow still uses the old API, for both counters and for its
progress/autoscaling mechanisms. We'd need to convert that over as
well (which is on the TODO list but lower than finishing up support
for portability in general).

On Mon, Oct 7, 2019 at 9:56 AM Robert Burke <re...@google.com> wrote:
>
> The Go SDK uses the old API [1], but it shouldn't be too hard to migrate it.
>
> The main thing I'd want to do at the same time is move the dependencies on the protos out of that package and have those live only in the harness package [2]. I wasn't aware of that particular separation of concerns until much later, but allows for alternative harness implementations.
>
> I have some other work to get the Per-DoFn profiling metrics (eleemnt count, size, time) into the Go SDK this quarter, so I can handle this then.
>
> [1] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/metrics/metrics.go#L474
> [2] https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/core/runtime/harness
>
> On Fri, Oct 4, 2019 at 6:14 PM Pablo Estrada <pa...@google.com> wrote:
>>
>> Hello devs,
>> I recently took a look at how Dataflow is retrieving metrics from the Beam SDK harnesses, and noticed something. As you may (or may not) remember, the portability API currently has two ways of reporting metrics. Namely, the newer MonitoringInfo API[1], and the older Metrics one[2].
>>
>> This is somewhat troublesome because now we have two things that do the same thing. The SDKs report double the amount of metrics[3][4], and I bet it's confusing for runner implementers.
>>
>> Luckily, it seems like the Flink and Spark runners do use the new API [5][6] - yay! : ) - so I guess then the only runner that uses the old API is Dataflow? (internally)
>>
>> Which way does the Samza runner use? +Hai Lu?
>> How about the Go SDK +Robert Burke ? - Ah I bet this uses the old API?
>>
>> If they all use the MonitoringInfos, we may be able to clean up the old api, and move to the new one (somewhat)soon : )
>>
>> [1] https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
>> [2] https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
>> [3] https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
>> [4] https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384
>>
>> [5] https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
>> [6] https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226

Re: [portability] Removing the old portable metrics API...

Posted by Robert Burke <re...@google.com>.
The Go SDK uses the old API [1], but it shouldn't be too hard to migrate it.

The main thing I'd want to do at the same time is move the dependencies on
the protos out of that package and have those live only in the harness
package [2]. I wasn't aware of that particular separation of concerns until
much later, but allows for alternative harness implementations.

I have some other work to get the Per-DoFn profiling metrics (eleemnt
count, size, time) into the Go SDK this quarter, so I can handle this then.

[1]
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/metrics/metrics.go#L474
[2]
https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/core/runtime/harness

On Fri, Oct 4, 2019 at 6:14 PM Pablo Estrada <pa...@google.com> wrote:

> Hello devs,
> I recently took a look at how Dataflow is retrieving metrics from the Beam
> SDK harnesses, and noticed something. As you may (or may not) remember, the
> portability API currently has two ways of reporting metrics. Namely, the
> newer MonitoringInfo API[1], and the older Metrics one[2].
>
> This is somewhat troublesome because now we have two things that do the
> same thing. The SDKs report double the amount of metrics[3][4], and I bet
> it's confusing for runner implementers.
>
> Luckily, it seems like the Flink and Spark runners do use the new API
> [5][6] - yay! : ) - so I guess then the only runner that uses the old API
> is Dataflow? (internally)
>
> Which way does the Samza runner use? +Hai Lu?
> How about the Go SDK +Robert Burke <re...@google.com> ? - Ah I bet this
> uses the old API?
>
> If they all use the MonitoringInfos, we may be able to clean up the old
> api, and move to the new one (somewhat)soon : )
>
> [1]
> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L395
> [2]
> https://github.com/apache/beam/blob/v2.15.0/model/fn-execution/src/main/proto/beam_fn_api.proto#L391
> [3]
> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L406-L414
> [4]
> https://github.com/apache/beam/blob/c1007b678a00ea85671872236edef940a8e56adc/sdks/python/apache_beam/runners/worker/sdk_worker.py#L378-L384
>
> [5]
> https://github.com/apache/beam/blob/44fa33e6518574cb9561f47774e218e0910093fe/runners/flink/src/main/java/org/apache/beam/runners/flink/metrics/FlinkMetricContainer.java#L94-L97
> [6]
> https://github.com/apache/beam/blob/932bd80a17171bd2d8157820ffe09e8389a52b9b/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L219-L226
>