You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2021/03/31 19:27:00 UTC

[jira] [Updated] (BEAM-11906) No trigger early repeatedly for session windows

     [ https://issues.apache.org/jira/browse/BEAM-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-11906:
-----------------------------------
    Priority: P1  (was: P2)

> No trigger early repeatedly for session windows
> -----------------------------------------------
>
>                 Key: BEAM-11906
>                 URL: https://issues.apache.org/jira/browse/BEAM-11906
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>    Affects Versions: 2.23.0, 2.28.0
>            Reporter: Ning Kang
>            Priority: P1
>
> Originated from: https://stackoverflow.com/questions/66381608/apache-beam-does-not-trigger-early-repeatedly-for-session-windows-on-google-data
> The following pipeline fires early after each element when running locally using DirectRunner, but there are no early triggers when running on google cloud dataflow. On dataflow it triggers only after the session window has closed.
> {code:python}
> ( p
>         | 'read'   >> beam.io.ReadFromPubSub(subscription = 'projects/xxx/subscriptions/xxx-sub')
>         | 'json'   >> beam.Map(lambda x: json.loads(x.decode('utf-8')))
>         | 'kv'     >> beam.Map(lambda x: (x['id'], x['amount']))
>         | 'window' >> beam.WindowInto(window.Sessions(15*60), trigger=trigger.Repeatedly(trigger.AfterCount(1)), accumulation_mode=AccumulationMode.ACCUMULATING)
>         | 'group'  >> beam.GroupByKey()
>         | 'log'    >> beam.Map(lambda x: logging.info(x))
> )
> {code}
> Apache Beam versions tried: 2.23 and 2.28.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)