Posted to dev@beam.apache.org by Alex Amato <aj...@google.com> on 2018/11/21 01:28:23 UTC

MetricResult querying design questions

I was wondering whether we have a design for MetricResult querying.
MetricResult is the queryable object exposed on the PipelineResult.

IMO, the way this should ideally work is:

(1) The runner would be responsible for querying the metrics, since a
Runner will have its own metrics aggregation system, which can be queried.

(2) APIs to invoke this then need to be implemented in every language. We
need API bindings for every language, but they should delegate to the
Runner.

(3) Further, we would need a way to unify how all of the runners
communicate metrics, i.e. we should have some semantics such that
Metrics/MonitoringInfos with a certain spec are returned in the
MetricResult in a specific way (this avoids weird issues, such as the
Dataflow Runner modifying some PTransform names).

Ideally, the MetricResult should give you the same metric no matter which
runner you are using.
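
To make (1)-(3) a bit more concrete, here is a rough sketch of the shape I
have in mind. Every name in it (MetricKey, RunnerMetricsBackend,
canonical_ptransform, the "s2-" step prefix) is made up for illustration
and is not an existing Beam API; it only shows an SDK binding that
delegates to the runner and returns runner-independent keys.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricKey:
    # Hypothetical canonical identity of a metric, identical on every runner.
    namespace: str
    name: str
    ptransform: str


class RunnerMetricsBackend(ABC):
    """Hypothetical per-runner hook into the runner's own aggregation system (1)."""

    @abstractmethod
    def fetch(self) -> dict:
        """Return {(namespace, name, runner_specific_step): value}."""

    def canonical_ptransform(self, runner_step: str) -> str:
        # Default assumption: the runner did not rename the step.
        return runner_step


class MetricResults:
    """SDK-side binding (2): delegates to the runner, returns canonical keys (3)."""

    def __init__(self, backend: RunnerMetricsBackend):
        self._backend = backend

    def query(self) -> dict:
        results = {}
        for (namespace, name, step), value in self._backend.fetch().items():
            key = MetricKey(namespace, name, self._backend.canonical_ptransform(step))
            results[key] = value
        return results


class DirectBackend(RunnerMetricsBackend):
    # Stand-in for a runner that reports PTransform names unchanged.
    def fetch(self) -> dict:
        return {("my.ns", "elements", "ParDo(Parse)"): 42}


class RenamingBackend(RunnerMetricsBackend):
    # Stand-in for a runner that rewrites PTransform names internally.
    def fetch(self) -> dict:
        return {("my.ns", "elements", "s2-ParDo(Parse)"): 42}

    def canonical_ptransform(self, runner_step: str) -> str:
        # Undo the (made-up) "s<N>-" prefix this fake runner adds.
        return runner_step.split("-", 1)[1]
```

With this shape, user code only ever calls MetricResults.query(), and both
fake runners above return the same keys and values.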

I don't think this is the case today. I think it is closer to the truth
that each SDK needs to figure out which runner it's using and invoke some
code to query metrics for that runner.

Is there a document somewhere? Or, if anyone has implemented this for one
of the runner+SDK combinations, would it be possible to give a brief
overview of how it works?

Thanks,
Alex

Re: MetricResult querying design questions

Posted by Kenneth Knowles <ke...@apache.org>.
(1)-(3) make sense to me; perhaps (2) can be autogenerated by gRPC and
wrapped into nicer APIs as desired. I think if you transliterate from Java
to proto3 then the sketches in the "Querying Metrics" section of
http://s.apache.org/beam-metrics-api have some of the same ideas - what is
left blank is what MetricsFilters would support.
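
As one guess at how that blank could be filled in, here is a small sketch
of a filter over namespace, name, and PTransform. The fields and the
"empty means match anything" rule are my assumptions for illustration, not
something the linked doc specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsFilter:
    # Hypothetical filter: an empty set means "match anything" for that field.
    namespaces: frozenset = frozenset()
    names: frozenset = frozenset()
    ptransforms: frozenset = frozenset()

    def matches(self, namespace: str, name: str, ptransform: str) -> bool:
        return (
            (not self.namespaces or namespace in self.namespaces)
            and (not self.names or name in self.names)
            and (not self.ptransforms or ptransform in self.ptransforms)
        )


def query(metrics: dict, metrics_filter: MetricsFilter) -> dict:
    """Return the entries of {(namespace, name, ptransform): value} that match."""
    return {key: value for key, value in metrics.items() if metrics_filter.matches(*key)}
```

For example, query(all_metrics, MetricsFilter(names=frozenset({"elements"})))
would keep every metric named "elements" regardless of namespace or step.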

Kenn
