Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/05/08 21:36:00 UTC

[jira] [Commented] (BEAM-9934) Resolve differences in beam:metric:element_count:v1 implementations

    [ https://issues.apache.org/jira/browse/BEAM-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102956#comment-17102956 ] 

Luke Cwik commented on BEAM-9934:
---------------------------------

Marked as a blocker for now, in case we decide that this should be fixed in Python.

> Resolve differences in beam:metric:element_count:v1 implementations
> -------------------------------------------------------------------
>
>                 Key: BEAM-9934
>                 URL: https://issues.apache.org/jira/browse/BEAM-9934
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go, sdk-java-harness, sdk-py-harness
>            Reporter: Luke Cwik
>            Assignee: Luke Cwik
>            Priority: Major
>             Fix For: 2.21.0
>
>
> The [element count|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/model/pipeline/src/main/proto/metrics.proto#L206] metric represents the number of elements within a PCollection and is interpreted differently across the Beam SDKs.
> In the [Java SDK|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java#L207] this represents the number of elements and includes how many windows those elements are in. This metric is incremented as soon as the element has been output.
> In the [Python SDK|https://github.com/apache/beam/blame/bfd151aa4c3aad29f3aea6482212ff8543ded8d7/sdks/python/apache_beam/runners/worker/opcounters.py#L247] this represents the number of elements and doesn't include how many windows those elements are in. The metric is also only incremented after the element has finished processing.
> The [Go SDK|https://github.com/apache/beam/blob/7097850daa46674b88425a124bc442fc8ce0dcb8/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L260] does the same thing as Python.
> Traditionally in Dataflow this has always been the exploded window element count and the counter is incremented as soon as the element is output.
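
To make the two interpretations concrete, here is a minimal Python sketch. It is not taken from any Beam SDK: WindowedValue, count_exploded_on_output and count_unexploded_after_processing are hypothetical names used only for illustration. The first function counts one per (element, window) pair as soon as the element is output, as described for Java and Dataflow; the second counts one per element only after processing finishes, as described for Python and Go. Each also records the count that would be visible while an element is still being processed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WindowedValue:
    """Hypothetical stand-in for an element and the windows it belongs to."""
    value: str
    windows: Tuple[str, ...]


def count_exploded_on_output(elements: List[WindowedValue]) -> Tuple[int, List[int]]:
    """Count one per (element, window) pair, incremented when the element is output."""
    count = 0
    visible_during_processing = []
    for element in elements:
        count += len(element.windows)            # increment first ...
        visible_during_processing.append(count)  # ... then "process" the element
    return count, visible_during_processing


def count_unexploded_after_processing(elements: List[WindowedValue]) -> Tuple[int, List[int]]:
    """Count one per element, incremented only after processing has finished."""
    count = 0
    visible_during_processing = []
    for element in elements:
        visible_during_processing.append(count)  # "process" the element first ...
        count += 1                               # ... then increment
    return count, visible_during_processing


# Two elements: "a" is in two windows, "b" is in one.
elements = [
    WindowedValue("a", ("w1", "w2")),
    WindowedValue("b", ("w1",)),
]

exploded_total, exploded_visible = count_exploded_on_output(elements)
unexploded_total, unexploded_visible = count_unexploded_after_processing(elements)
```

For this input the exploded total is 3 and the unexploded total is 2, so the same PCollection would report different values under the same metric URN. The counts visible mid-processing also differ, which matters for progress reporting while a bundle is still in flight.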



--
This message was sent by Atlassian Jira
(v8.3.4#803005)