Posted to issues@beam.apache.org by "niklas Hansson (JIRA)" <ji...@apache.org> on 2019/04/16 14:49:00 UTC

[jira] [Commented] (BEAM-7026) Python SDK: Unable to obtain the PCollection for output tags which are not consumed by a downstream step.

    [ https://issues.apache.org/jira/browse/BEAM-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819098#comment-16819098 ] 

niklas Hansson commented on BEAM-7026:
--------------------------------------

I would be happy to start looking at this :)

> Python SDK: Unable to obtain the PCollection for output tags which are not consumed by a downstream step.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7026
>                 URL: https://issues.apache.org/jira/browse/BEAM-7026
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-harness
>            Reporter: Alex Amato
>            Assignee: niklas Hansson
>            Priority: Major
>
> I noticed that we are unable to map an output tag + transform pair to its PCollection name for metrics (element count / mean byte count) when the PCollections for those output tags are not consumed by a downstream step.
> This isn't critical because (1) arguably there is no PCollection at all, and (2) PCollections that are output but not consumed are not important to count metrics on, since they can be optimized away entirely (there is no need to do any work, collect metrics, etc. for an unconsumed PCollection).
> However, we are able to count these elements; we just cannot assign a PCollection name to them, because in this case the bundle descriptor contains no information about that output tag. The alternative fix is to make sure this information is always available, even when the output is not consumed.
> Pablo and I looked into this a bit, and he believed it could be fixed in the DoOutputsTuple class in pvalue.py. This fix would require calling __getitem__ on all tags to initialize them properly. However, I had some trouble doing this, because the class is a bit unusual: it overrides __getattr__. I saw odd behavior when adding functionality to this code. I don't really understand how the code works today, since its own instance-variable accesses should trigger the custom __getattr__ code, yet these attributes seem to be used normally via self.X.

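The metrics problem in the issue (elements are counted per transform and output tag, but the tag cannot be turned into a PCollection name) can be illustrated with a minimal sketch. This is plain Python, not the SDK harness code: the dictionaries approximate the per-transform output map (local tag to PCollection id) that a bundle descriptor is assumed to carry, and `resolve_pcollection`, `label_element_counts`, and the transform/PCollection ids are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the bundle descriptor: for each transform id,
# a map from local output tag to PCollection id. An output that no
# downstream step consumes is simply absent here.
Outputs = Dict[str, Dict[str, str]]


def resolve_pcollection(outputs: Outputs, transform_id: str,
                        tag: str) -> Optional[str]:
    """Return the PCollection id for (transform_id, tag), or None if unknown."""
    return outputs.get(transform_id, {}).get(tag)


def label_element_counts(
    counts: Dict[Tuple[str, str], int], outputs: Outputs
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Split per-(transform, tag) counts into named and unnameable ones."""
    named: Dict[str, int] = {}
    unnamed: List[Tuple[str, str]] = []
    for (transform_id, tag), count in counts.items():
        pcoll = resolve_pcollection(outputs, transform_id, tag)
        if pcoll is None:
            # The elements were counted, but there is no name to attach.
            unnamed.append((transform_id, tag))
        else:
            named[pcoll] = named.get(pcoll, 0) + count
    return named, unnamed


outputs = {'ParDo(Split)': {'main': 'pcoll_1'}}  # 'side' is not consumed
counts = {('ParDo(Split)', 'main'): 10, ('ParDo(Split)', 'side'): 3}
named, unnamed = label_element_counts(counts, outputs)
print(named)    # {'pcoll_1': 10}
print(unnamed)  # [('ParDo(Split)', 'side')]
```

In this sketch, the alternative fix the reporter mentions (always emitting the tag entry, even for unconsumed outputs) would correspond to `'side'` appearing in `outputs`, so every count would land in `named`.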

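The reporter's puzzle about `self.X` has a standard Python explanation: `__getattr__` is called only after normal attribute lookup (the instance `__dict__`, then the class and its bases) has failed, so reading attributes assigned in `__init__` never reaches it. The sketch below illustrates this together with the proposed eager-initialization fix. It uses a hypothetical `LazyOutputs` class that only mimics the "create the output for a tag on first access" pattern; it is not Beam's `DoOutputsTuple`, and the class name, the `eager` flag, and the `pcoll_<tag>` naming are assumptions.

```python
class LazyOutputs:
    """Hypothetical stand-in for a tag-indexed outputs object.

    Not Beam's DoOutputsTuple: it only mimics the pattern in the issue,
    where an output for a tag is created the first time the tag is accessed.
    """

    def __init__(self, tags, eager=False):
        # Plain assignments: they land in the instance __dict__ and never
        # involve __getattr__.
        self._getattr_calls = 0
        self._tags = list(tags)
        self._created = {}  # tag -> placeholder PCollection name
        if eager:
            # The proposed fix: index every declared tag up front so each
            # output is registered even if nothing downstream consumes it.
            for tag in self._tags:
                self[tag]

    def __getattr__(self, name):
        # Called ONLY after normal lookup (instance __dict__, class, bases)
        # has failed, so reading self._tags or self._created never gets here.
        # Refusing private/dunder names avoids infinite recursion if this
        # runs before __init__ has set the attributes (e.g. while unpickling).
        if name.startswith('_'):
            raise AttributeError(name)
        self._getattr_calls += 1
        try:
            return self[name]
        except KeyError:
            # Keep hasattr() and getattr(obj, name, default) working.
            raise AttributeError(name) from None

    def __getitem__(self, tag):
        if tag not in self._tags:
            raise KeyError(tag)
        if tag not in self._created:
            self._created[tag] = 'pcoll_' + tag  # hypothetical naming
        return self._created[tag]


lazy = LazyOutputs(['main', 'side'])
print(lazy.main)               # pcoll_main  (via __getattr__ -> __getitem__)
print(sorted(lazy._created))   # ['main']    ('side' was never touched)
print(lazy._getattr_calls)     # 1           (reading _created did not count)

eager = LazyOutputs(['main', 'side'], eager=True)
print(sorted(eager._created))  # ['main', 'side']
```

A hook that intercepts every attribute access, including `self.X` reads of existing attributes, would be `__getattribute__`, which is a different method; overriding only `__getattr__` leaves ordinary attribute access untouched.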
