Posted to issues@beam.apache.org by "Yi Hu (Jira)" <ji...@apache.org> on 2022/05/25 18:46:00 UTC

[jira] [Commented] (BEAM-14498) Python sdk's PeriodicImpulse generates a bounded PCollection

    [ https://issues.apache.org/jira/browse/BEAM-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542193#comment-17542193 ] 

Yi Hu commented on BEAM-14498:
------------------------------

PeriodicImpulse's DoFn calls restriction_tracker.defer_remainder, which is supposed to make the output PCollection unbounded, so the bounded output looks like a bug.

There is also an unbounded_per_element decorator in the SDK, but its value is never used, and adding this decorator does not change the boundedness.
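
To illustrate the decorator point, here is a minimal toy sketch in plain Python. It does not use or reproduce Beam's actual code: the two boundedness_* helpers are hypothetical, and the unbounded_per_element function below is only a stand-in that mimics the idea of tagging a function. It shows how a flag that is set but never read leaves the inferred boundedness unchanged.

```python
# Toy model only; none of this is Beam's implementation.

def unbounded_per_element(fn):
    """Hypothetical stand-in for the decorator: tag fn as unbounded per element."""
    fn.unbounded_per_element = True
    return fn


def boundedness_ignoring_flag(fn, input_is_bounded):
    """Models the reported behavior: the tag on fn is never consulted."""
    return input_is_bounded


def boundedness_honoring_flag(fn, input_is_bounded):
    """Models the expected behavior: a tagged fn makes the output unbounded."""
    if getattr(fn, "unbounded_per_element", False):
        return False
    return input_is_bounded


@unbounded_per_element
def process(element):
    yield element


# With a bounded input, the two models disagree about the output.
reported = boundedness_ignoring_flag(process, input_is_bounded=True)
expected = boundedness_honoring_flag(process, input_is_bounded=True)
print("reported is_bounded:", reported)
print("expected is_bounded:", expected)
```

In this toy model the decorated function stays bounded under the first helper, which matches what I observe when adding the decorator in the SDK.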

> Python sdk's PeriodicImpulse generates a bounded PCollection
> ------------------------------------------------------------
>
>                 Key: BEAM-14498
>                 URL: https://issues.apache.org/jira/browse/BEAM-14498
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Yi Hu
>            Priority: P2
>
> See the dev mail list thread for details: https://lists.apache.org/thread/ps3m0jc0ngqp1y2s0mv2n6hxhvgkr3vw
> The PeriodicImpulse transform in the Java SDK generates an unbounded
> PCollection, while in the Python SDK it generates a bounded PCollection. The
> latter case may cause issues in streaming.
> Per Cham: Note that the primary use-case of PeriodicImpulse (according to the design doc) was to
> generate a fixed/bounded input that can slowly change over time but
> changing over time dimension would make it unbounded. 
> It seems we need to make the Python PeriodicImpulse generate an unbounded PCollection, in alignment with the Java implementation, and also make sure that the change does not break the current implementation of its original use case (the stream enrichment problem).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)