Posted to dev@beam.apache.org by Maximilian Michels <mx...@apache.org> on 2019/11/13 13:13:15 UTC

Type of builtin PTransform/PCollection metrics

Hi,

We have a series of builtin PTransform/PCollection metrics: 
https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/model/pipeline/src/main/proto/metrics.proto#L74

Why are those all counters ("beam:metrics:sum_int_64")? I think the
better default type for most users would be gauge
("beam:metrics:latest_int_64").

I understand that counters are useful because they retain the sum of all 
reported values, but for getting an idea about the deviation of a 
metric, gauges could be more useful.

Perhaps we could make this configurable?

Thanks,
Max

Re: Type of builtin PTransform/PCollection metrics

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Nov 13, 2019 at 10:56 AM Maximilian Michels <mx...@apache.org> wrote:
>
> > Are you referring specifically to?
> > * beam:metric:element_count:v1
> > * beam:metric:pardo_execution_time:start_bundle_msecs:v1
> > * beam:metric:pardo_execution_time:process_bundle_msecs:v1
> > * beam:metric:pardo_execution_time:finish_bundle_msecs:v1
> > * beam:metric:ptransform_execution_time:total_msecs:v1
>
> Yes.
>
> > Would the gauge be grouped per element or per bundle?
>
> Per bundle. These are reported when the bundle finishes.
>
> > If grouped at the bundle level the metrics are arbitrary to the user since the bundle size is chosen by the runner.
>
> Not necessarily because the bundle size is typically fixed (at least in
> the Flink Runner). In any case, it provides information about how much
> activity occurred in a bundle which is useful to know.
>
> > There is also a very significant overhead for tracking low level metrics
>
> I can't imagine tracking a per-bundle element count or execution time is
> that expensive. Maybe I'm wrong.

These are element counts and execution time per operation (e.g. per
DoFn). FWIW, process_bundle_msecs is misnamed; it should be
"process_element" or just "process", as it refers to the time spent in
that method. beam:metric:ptransform_execution_time:total_msecs:v1
seems redundant with the sum of the others. (Unless it includes
setup/teardown, which seem to be missing as separate values?)

I think what you want is new metrics associated with the bundle +
executable stage as a whole. Distribution metrics would make the most
sense here. (Gauge metrics would just report the value of whatever
bundle finished last...) I don't know how they'd be named, perhaps
they'd be labeled with the full set of transforms that the stage
contains (which is of course not stable)?
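
To make the difference between the three metric types concrete, here is a
minimal sketch in plain Python (not Beam code). It assumes one value is
reported per finished bundle; the Distribution class, the aggregate helper
and the sample element counts are hypothetical and only illustrate the
aggregation semantics discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Distribution:
    """Hypothetical distribution cell: keeps count, sum, min and max."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def update(self, value: int) -> None:
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate(per_bundle_values):
    """Feed the same per-bundle reports into a counter, a gauge and a distribution."""
    counter = 0    # sum semantics: accumulates every reported value
    gauge = None   # latest semantics: only the most recent report survives
    dist = Distribution()
    for value in per_bundle_values:
        counter += value
        gauge = value
        dist.update(value)
    return counter, gauge, dist


# Made-up element counts for four bundles; the third bundle is unusually small.
element_counts = [1000, 1000, 12, 1000]
counter, gauge, dist = aggregate(element_counts)
print(counter, gauge, dist.minimum, dist.maximum, dist.mean)
```

With these made-up numbers the gauge ends at 1000 and hides the small
bundle entirely, the counter only gives the total, and the distribution
exposes the spread through its minimum and maximum.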

Re: Type of builtin PTransform/PCollection metrics

Posted by Maximilian Michels <mx...@apache.org>.
> Are you referring specifically to?
> * beam:metric:element_count:v1
> * beam:metric:pardo_execution_time:start_bundle_msecs:v1
> * beam:metric:pardo_execution_time:process_bundle_msecs:v1
> * beam:metric:pardo_execution_time:finish_bundle_msecs:v1
> * beam:metric:ptransform_execution_time:total_msecs:v1

Yes.

> Would the gauge be grouped per element or per bundle?

Per bundle. These are reported when the bundle finishes.

> If grouped at the bundle level the metrics are arbitrary to the user since the bundle size is chosen by the runner.

Not necessarily, because the bundle size is typically fixed (at least in
the Flink Runner). In any case, it provides information about how much
activity occurred in a bundle, which is useful to know.

> There is also a very significant overhead for tracking low level metrics 

I can't imagine tracking a per-bundle element count or execution time is 
that expensive. Maybe I'm wrong.

-Max

On 13.11.19 18:58, Luke Cwik wrote:
> Are you referring specifically to?
> * beam:metric:element_count:v1
> * beam:metric:pardo_execution_time:start_bundle_msecs:v1
> * beam:metric:pardo_execution_time:process_bundle_msecs:v1
> * beam:metric:pardo_execution_time:finish_bundle_msecs:v1
> * beam:metric:ptransform_execution_time:total_msecs:v1
> 
> Would the gauge be grouped per element or per bundle?
> If grouped at the bundle level the metrics are arbitrary to the user 
> since the bundle size is chosen by the runner.
> If grouped at the element level then only a few of the metrics make sense:
> * element_count becomes number of outputs per input element
> * process_bundle_msecs becomes amount of time to process a single input 
> element (does this still apply to elements that can be split?)
> 
> There is also a very significant overhead for tracking low level metrics 
> in great detail which is why timing is done through a sampling 
> technique. I'm sure if we could do it cheaply then it would make sense 
> to get those metrics. This is also a place where we want each SDK to 
> implement these metrics so complexity may slow down SDK authors from 
> developing them.
> 
> 
> On Wed, Nov 13, 2019 at 5:13 AM Maximilian Michels <mxm@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Hi,
> 
>     We have a series of builtin PTransform/PCollection metrics:
>     https://github.com/apache/beam/blob/808cb35018cd228a59b152234b655948da2455fa/model/pipeline/src/main/proto/metrics.proto#L74
> 
>     Why are those of counters ("beam:metrics:sum_int_64")? I think the
>     better default type for most users would be gauge
>     ("beam:metrics:latest_int_64").
> 
>     I understand that counters are useful because they retain the sum of
>     all
>     reported values, but for getting an idea about the deviation of a
>     metric, gauges could be more useful.
> 
>     Perhaps we could make this configurable?
> 
>     Thanks,
>     Max
> 

Re: Type of builtin PTransform/PCollection metrics

Posted by Luke Cwik <lc...@google.com>.
Are you referring specifically to?
* beam:metric:element_count:v1
* beam:metric:pardo_execution_time:start_bundle_msecs:v1
* beam:metric:pardo_execution_time:process_bundle_msecs:v1
* beam:metric:pardo_execution_time:finish_bundle_msecs:v1
* beam:metric:ptransform_execution_time:total_msecs:v1

Would the gauge be grouped per element or per bundle?
If grouped at the bundle level the metrics are arbitrary to the user since
the bundle size is chosen by the runner.
If grouped at the element level then only a few of the metrics make sense:
* element_count becomes number of outputs per input element
* process_bundle_msecs becomes amount of time to process a single input
element (does this still apply to elements that can be split?)

There is also a very significant overhead for tracking low-level metrics in
great detail, which is why timing is done through a sampling technique. I'm
sure if we could do it cheaply then it would make sense to get those
metrics. This is also a place where we want each SDK to implement these
metrics, so complexity may discourage SDK authors from implementing them.
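
As a rough illustration of why sampling keeps timing cheap, here is a
minimal sketch in plain Python. It is not the Beam implementation: the
StateSampler class, the simulate helper, the state names and the durations
are all hypothetical. The idea is that the hot path only records which
state is active, and a periodic sample charges a whole period to that
state instead of reading the clock around every call.

```python
from collections import defaultdict


class StateSampler:
    """Charges each sampling period to whichever state is active at sample time."""

    def __init__(self, period_msecs):
        self.period_msecs = period_msecs
        self.current_state = None
        self.msecs_by_state = defaultdict(int)

    def enter(self, state):
        # Hot path: a single assignment, no clock read.
        self.current_state = state

    def sample(self):
        # Would be called once per period, e.g. from a timer thread.
        if self.current_state is not None:
            self.msecs_by_state[self.current_state] += self.period_msecs


def simulate(timeline, period_msecs):
    """Replay (state, duration_msecs) pairs against a simulated clock."""
    sampler = StateSampler(period_msecs)
    clock = 0
    next_sample = period_msecs
    for state, duration in timeline:
        sampler.enter(state)
        clock += duration
        # Take every sample that falls due while this state is active.
        while next_sample <= clock:
            sampler.sample()
            next_sample += period_msecs
    return dict(sampler.msecs_by_state)


# Made-up durations for one bundle.
timeline = [("start_bundle", 30), ("process", 950), ("finish_bundle", 20)]
fine = simulate(timeline, period_msecs=10)
coarse = simulate(timeline, period_msecs=200)
print(fine)
print(coarse)
```

In this simulation a 10 ms period recovers the made-up durations exactly,
while a 200 ms period misses start_bundle altogether and over-charges
finish_bundle. That is the accuracy traded away for the low overhead.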

