Posted to commits@beam.apache.org by "Kenneth Knowles (JIRA)" <ji...@apache.org> on 2017/04/03 22:45:42 UTC
[jira] [Updated] (BEAM-1867) Element counts missing on Cloud
Dataflow when PCollection has anything other than hardcoded name pattern
[ https://issues.apache.org/jira/browse/BEAM-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-1867:
----------------------------------
Summary: Element counts missing on Cloud Dataflow when PCollection has anything other than hardcoded name pattern (was: Element counts missing on Cloud Dataflow when PCollection is renamed (by user or pipeline surgery))
> Element counts missing on Cloud Dataflow when PCollection has anything other than hardcoded name pattern
> --------------------------------------------------------------------------------------------------------
>
> Key: BEAM-1867
> URL: https://issues.apache.org/jira/browse/BEAM-1867
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Kenneth Knowles
> Priority: Blocker
> Fix For: First stable release
>
>
> In 0.6.0 and 0.7.0-SNAPSHOT (the versions where this is confirmed; possibly all past versions as well), element count and byte metrics are not reported correctly when the output PCollection of a primitive transform is not named {{transformname + ".out" + index}}.
> In 0.7.0-SNAPSHOT, the DataflowRunner uses pipeline surgery to replace the composite {{ParDoSingle}} (which contains a {{ParDoMulti}}) with a Dataflow-specific non-composite {{ParDoSingle}}. Metrics are therefore reported for names like {{"ParDoSingle(MyDoFn).out"}} when they should be reported for {{"ParDoSingle/ParDoMulti(MyDoFn).out"}}. As a result, all single-output ParDo transforms lack these metrics on their outputs.
> In 0.6.0, the same problem occurs whenever the user calls {{PCollection.setName}} to give their collection a meaningful name.
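
The mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use the Beam SDK or any Dataflow API: the class NameMismatchSketch, the methods patternName and elementCount, the collection name "MyMeaningfulName", and the count 42 are all hypothetical and exist only for illustration. The one thing taken from the report is the pattern transformname + ".out" + index. The sketch assumes that a count reported under the pattern-derived name can only be found by looking up that exact string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class NameMismatchSketch {
  // The hardcoded pattern from the report: transformname + ".out" + index.
  static String patternName(String transformName, int index) {
    return transformName + ".out" + index;
  }

  // Hypothetical lookup: a count is found only if it was reported under
  // exactly the name that the PCollection actually carries.
  static Optional<Long> elementCount(Map<String, Long> reported, String pcollectionName) {
    return Optional.ofNullable(reported.get(pcollectionName));
  }

  public static void main(String[] args) {
    Map<String, Long> reported = new HashMap<>();
    // Assumption: the metric is keyed by the pattern-derived name.
    // The value 42 is an arbitrary placeholder, not real data.
    reported.put(patternName("ParDoSingle(MyDoFn)", 0), 42L);

    // A collection that still has the pattern name is found.
    String defaultName = patternName("ParDoSingle(MyDoFn)", 0);
    System.out.println(defaultName + " found: "
        + elementCount(reported, defaultName).isPresent());

    // A collection renamed by the user (as with PCollection.setName) is not.
    String renamed = "MyMeaningfulName";
    System.out.println(renamed + " found: "
        + elementCount(reported, renamed).isPresent());

    // Pipeline surgery changes the transform name, so the derived name differs too.
    String composite = patternName("ParDoSingle/ParDoMulti(MyDoFn)", 0);
    System.out.println(composite + " found: "
        + elementCount(reported, composite).isPresent());
  }
}
```

Under these assumptions only the first lookup succeeds, which mirrors both reported cases: a user-chosen name in 0.6.0 and a surgery-changed transform name in 0.7.0-SNAPSHOT.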
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)