Posted to issues@beam.apache.org by "Yu Zhang (Jira)" <ji...@apache.org> on 2021/07/01 00:01:25 UTC

[jira] [Commented] (BEAM-10928) FlinkDistributionGauge and FlinkGauge metrics are exported as zero to Prometheus when using any Flink's PrometheusReporter

    [ https://issues.apache.org/jira/browse/BEAM-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372274#comment-17372274 ] 

Yu Zhang commented on BEAM-10928:
---------------------------------

I think this is caused by a metric type incompatibility. Both the Flink UI and Flink's PrometheusReporter only support numbers or booleans as Flink Gauge values; see [https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/metric_reporters/#prometheus].

Could [Beam's DistributionResult.getMean()|https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/DistributionResult.java#L37] be used to update Flink's Gauge? [~aromanenko]
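
To make the suggestion concrete, here is a minimal sketch of a gauge that reports the mean as a plain number. It does not use the real Flink or Beam classes: the nested Gauge interface only mirrors the single getValue() method of Flink's Gauge, Distribution is a hypothetical stand-in for Beam's DistributionResult, and MeanGauge and its update() method are invented names for illustration. Returning 0.0 for an empty distribution is also an assumption, made here to avoid reporting NaN.

```java
final class DistributionGaugeSketch {

    /** Stand-in for Flink's Gauge interface, reduced to its single method. */
    interface Gauge<T> {
        T getValue();
    }

    /** Hypothetical stand-in for Beam's DistributionResult (sum, count, min, max). */
    static final class Distribution {
        final long sum;
        final long count;
        final long min;
        final long max;

        Distribution(long sum, long count, long min, long max) {
            this.sum = sum;
            this.count = count;
            this.min = min;
            this.max = max;
        }

        /** Arithmetic mean of the recorded values: sum divided by count. */
        double getMean() {
            return (1.0 * sum) / count;
        }
    }

    /** Hypothetical adapter: exposes a distribution's mean as a numeric gauge value. */
    static final class MeanGauge implements Gauge<Double> {
        private volatile Distribution latest;

        MeanGauge(Distribution initial) {
            this.latest = initial;
        }

        /** Replaces the distribution snapshot that the gauge reports on. */
        void update(Distribution distribution) {
            this.latest = distribution;
        }

        @Override
        public Double getValue() {
            Distribution d = latest;
            // Assumption: report 0.0 for an empty distribution instead of NaN.
            return d.count == 0 ? 0.0 : d.getMean();
        }
    }

    public static void main(String[] args) {
        MeanGauge gauge = new MeanGauge(new Distribution(30, 3, 5, 15));
        System.out.println(gauge.getValue()); // prints 10.0
    }
}
```

Because getValue() now returns a Double, a reporter that accepts only numbers and booleans would have a value it can export. The trade-off is that min, max, sum, and count are no longer visible through this gauge; exposing them would need one numeric gauge per field.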

> FlinkDistributionGauge and FlinkGauge metrics are exported as zero to Prometheus when using any Flink's PrometheusReporter
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-10928
>                 URL: https://issues.apache.org/jira/browse/BEAM-10928
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.23.0
>            Reporter: Ivan San Jose
>            Priority: P2
>
> To be honest I'm really lost on this one, let me explain the issue:
> Beam has its own metric types (org/apache/beam/sdk/metrics/Metrics.java): counter, distribution, and gauge. Depending on the runner, Beam wraps them into the corresponding runner types. For example, for Flink, Beam wraps its Gauge type into a class called FlinkGauge which extends a Gauge<Long>.
> Also, Beam's Distribution metric is wrapped into a Flink Gauge<DistributionResult>, where DistributionResult is a Beam type containing min, max, sum, and count.
> Then, if you are using Flink and want to export those metrics to Prometheus using flink-metrics-prometheus, you will see that they are always zero. If you set the DEBUG log level for the "org.apache.flink.metrics.prometheus" package, you will see errors like the following:
> {code}
> 2020-09-18 06:27:04,387 DEBUG Invalid type for Gauge org.apache.beam.runners.flink.metrics.FlinkMetricContainer$FlinkDistributionGauge@30211d3f: org.apache.beam.sdk.metrics.AutoValue_DistributionResult, only number types and booleans are supported by this reporter.
> 2020-09-18 06:27:04,394 DEBUG Invalid type for Gauge org.apache.beam.runners.flink.metrics.FlinkMetricContainer$FlinkGauge@2ad1562: org.apache.beam.sdk.metrics.AutoValue_GaugeResult, only number types and booleans are supported by this reporter.
> {code}
> This is really weird, because if you check the source code of AbstractPrometheusReporter, you can see that it takes the metric value from Flink's Gauge using getValue():
> https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/AbstractPrometheusReporter.java#L225
> And FlinkGauge.getValue() should return a long instead of org.apache.beam.sdk.metrics.AutoValue_GaugeResult, so I don't understand what is happening there, to be honest. Maybe the AutoValue mechanism is messing things up?
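
The DEBUG messages in the quoted description are consistent with a reporter that converts each gauge value to a double and falls back to zero for any type it does not recognise. The sketch below illustrates that behaviour only; it is not the AbstractPrometheusReporter source. The nested Gauge interface is a stand-in for Flink's Gauge, and toDouble is a hypothetical helper name.

```java
final class GaugeConversionSketch {

    /** Stand-in for Flink's Gauge interface, reduced to its single method. */
    interface Gauge<T> {
        T getValue();
    }

    /**
     * Hypothetical helper: converts a gauge value to a double, accepting only
     * numbers and booleans. Any other value type is reported as 0.
     */
    static double toDouble(Gauge<?> gauge) {
        Object value = gauge.getValue();
        if (value == null) {
            return 0;
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? 1 : 0;
        }
        // A composite value (for example a result object) ends up here.
        return 0;
    }

    public static void main(String[] args) {
        Gauge<Long> numeric = () -> 42L;
        Gauge<Object> composite = () -> new Object();
        System.out.println(toDouble(numeric));   // prints 42.0
        System.out.println(toDouble(composite)); // prints 0.0
    }
}
```

Under this reading, the exported zero does not depend on what the wrapped result contains: any gauge whose getValue() returns a composite object is reported as 0.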


