Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/05/08 21:02:00 UTC

[jira] [Created] (BEAM-9934) Resolve differences in beam:metric:element_count:v1 implementations

Luke Cwik created BEAM-9934:
-------------------------------

             Summary: Resolve differences in beam:metric:element_count:v1 implementations
                 Key: BEAM-9934
                 URL: https://issues.apache.org/jira/browse/BEAM-9934
             Project: Beam
          Issue Type: Bug
          Components: sdk-go, sdk-java-harness, sdk-py-harness
            Reporter: Luke Cwik
            Assignee: Luke Cwik


The [element count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206] metric represents the number of elements within a PCollection, but the Beam SDKs interpret it differently.

In the [Java SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207] the count takes windows into account: each element is counted once for every window it is in. The metric is incremented as soon as the element has been output.

In the [Python SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247] each element is counted once, regardless of how many windows it is in. The metric is also incremented only after the element has finished processing.

The [Go SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260] behaves the same way as the Python SDK.

Traditionally, Dataflow has always reported this metric as the exploded window element count.
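
To make the two interpretations concrete, here is a minimal sketch in plain Python. It does not use any Beam API; WindowedElement, exploded_count and unexploded_count are hypothetical names introduced only for illustration. It assumes "exploded" means that an element in N windows contributes N to the count.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class WindowedElement:
    """Hypothetical stand-in for a windowed value: a payload plus its windows."""
    value: str
    windows: Tuple[str, ...]


def exploded_count(elements: Sequence[WindowedElement]) -> int:
    # Each element is counted once per window it is in
    # (the interpretation described for Java and Dataflow).
    return sum(len(e.windows) for e in elements)


def unexploded_count(elements: Sequence[WindowedElement]) -> int:
    # Each element is counted once, regardless of its windows
    # (the interpretation described for Python and Go).
    return len(elements)


elements = [
    WindowedElement("a", ("w1",)),
    WindowedElement("b", ("w1", "w2", "w3")),
]

print(exploded_count(elements))    # 4
print(unexploded_count(elements))  # 2
```

For the same PCollection contents the two interpretations report 4 and 2, which is the kind of mismatch this issue is about. The sketch does not model the second difference, namely when the counter is incremented (on output versus after processing finishes).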


