Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/05/12 17:20:03 UTC

[jira] [Commented] (BEAM-9322) Python SDK ignores manually set PCollection tags

    [ https://issues.apache.org/jira/browse/BEAM-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343394#comment-17343394 ] 

Beam JIRA Bot commented on BEAM-9322:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.


> Python SDK ignores manually set PCollection tags
> ------------------------------------------------
>
>                 Key: BEAM-9322
>                 URL: https://issues.apache.org/jira/browse/BEAM-9322
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Sam Rohde
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> The Python SDK currently ignores any tags that were set manually on PCollections when it adds those PCollections to the PTransform [outputs|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L595]] while applying a PTransform. In the [add_output|[https://github.com/apache/beam/blob/688a4ea53f315ec2aa2d37602fd78496fca8bb4f/sdks/python/apache_beam/pipeline.py#L872]] method, the tag is set to None for all PValues, so the output tags end up being an enumeration index over the PCollection outputs. The user-set tags are not propagated correctly, which is a problem for anyone relying on the output PCollection tags matching the values they set.
> The fix is to correct BEAM-1833 and always pass in the tags. However, that doesn't fix the problem for nested PCollections: if you have a dict of lists of PCollections, what should their tags be? To resolve this, first propagate the correct tag, then talk with the community about the best auto-generated tags.
> Some users may rely on the old implementation, so a flag named "force_generated_pcollection_output_ids" will be created and set to False by default. If True, the SDK will fall back to the old implementation and generate tags for PCollections.
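
The behavior described in the issue can be illustrated with a small, simplified model. This is only a sketch and not Beam's actual code: FakePCollection and assign_output_tags are hypothetical names that do not exist in the SDK, and only the flag name is taken from the issue text. Falling back to the index when no tag was set is also an assumption, since the issue leaves the auto-generated tags open for discussion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class FakePCollection:
    """Hypothetical stand-in for a PCollection; not the Beam class."""
    name: str
    tag: Optional[str] = None  # tag set manually by the user, if any


def assign_output_tags(
    outputs: List[FakePCollection],
    force_generated_pcollection_output_ids: bool = False,
) -> Dict[Union[int, str], FakePCollection]:
    """Build a tag -> output mapping for a transform's outputs.

    With the flag set to True, the user-set tags are ignored and each
    output is keyed by its enumeration index (the old behavior described
    in the issue). With the flag set to False, the user-set tag is used
    when present; the index is an assumed fallback for untagged outputs.
    """
    tagged: Dict[Union[int, str], FakePCollection] = {}
    for index, pcoll in enumerate(outputs):
        if force_generated_pcollection_output_ids or pcoll.tag is None:
            key: Union[int, str] = index
        else:
            key = pcoll.tag
        tagged[key] = pcoll
    return tagged


outputs = [
    FakePCollection("pc_a", tag="evens"),
    FakePCollection("pc_b", tag="odds"),
]

# Proposed behavior: the user-set tags are kept.
propagated = assign_output_tags(outputs)

# Old behavior: the tags are replaced by enumeration indices.
generated = assign_output_tags(
    outputs, force_generated_pcollection_output_ids=True
)
```

In this model, code that looks up an output by "evens" works only when the tags are propagated, which is the mismatch the issue reports.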



--
This message was sent by Atlassian Jira
(v8.3.4#803005)