Posted to issues@beam.apache.org by "Yichi Zhang (Jira)" <ji...@apache.org> on 2019/09/17 18:36:00 UTC

[jira] [Resolved] (BEAM-6723) Reshuffle in streaming pipeline does not work on dataflowrunner python

     [ https://issues.apache.org/jira/browse/BEAM-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yichi Zhang resolved BEAM-6723.
-------------------------------
    Resolution: Fixed

> Reshuffle in streaming pipeline does not work on dataflowrunner python
> ----------------------------------------------------------------------
>
>                 Key: BEAM-6723
>                 URL: https://issues.apache.org/jira/browse/BEAM-6723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-py-core
>            Reporter: Brecht Coghe
>            Priority: Major
>             Fix For: 2.16.0
>
>
> When using a Reshuffle after windowing, the Dataflow runner fails with the following error:
> "org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to org.apache.beam.sdk.transforms.windowing.GlobalWindow"
> This makes it impossible to prevent fusion of operations and to distribute our workload evenly.
>  
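For illustration, here is a minimal sketch of the pipeline shape described above: a streaming Python pipeline that applies fixed windowing and then a Reshuffle. The Pub/Sub topic, key extraction, and 5-second window size are placeholders, not details taken from the report.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/frames')
         | 'KeyByGame' >> beam.Map(lambda msg: ('game_id_placeholder', msg))
         | 'Window' >> beam.WindowInto(window.FixedWindows(5))
         # The Reshuffle after windowing is the step reported to fail on the
         # Dataflow runner with the IntervalWindow -> GlobalWindow cast error
         # before the fix in 2.16.0.
         | 'Reshuffle' >> beam.Reshuffle()
         | 'Group' >> beam.GroupByKey()
         | 'Process' >> beam.Map(lambda kv: kv))
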
> For more context, see following post where [~robertwb] linked this Jira board:
> [https://stackoverflow.com/questions/54764081/google-dataflow-streaming-pipeline-is-not-distributing-workload-over-several-wor]
> Quick summary: our use case is video analytics. We use windowing to get small intervals of video and grouping to group per video stream, so the group-by key is a game id. What comes out of the windowing and grouping is several windows within one game, all under a single game_id key, and this does not distribute over several workers. The proposed workaround is to add the window id to the key after windowing and then reshuffle, so that processing can run in parallel per window within one video stream (a sketch of this follows below). Can someone confirm whether this approach with Reshuffle would work?
>  
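And a sketch of the keying workaround proposed in the summary: fold the window (here its start timestamp) into the key after windowing, then reshuffle, so each (game_id, window) pair can land on a different worker and the later GroupByKey produces one group per game and window rather than one per game. KeyByGameAndWindow is a hypothetical helper, and the topic, key extraction, and window size are again placeholders.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window


    class KeyByGameAndWindow(beam.DoFn):
        """Rekeys each (game_id, frame) element by (game_id, window start)."""
        def process(self, element, win=beam.DoFn.WindowParam):
            game_id, frame = element
            # Using the window start in microseconds keeps the key a plain tuple.
            yield ((game_id, win.start.micros), frame)


    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/frames')
         | 'KeyByGame' >> beam.Map(lambda msg: ('game_id_placeholder', msg))
         | 'Window' >> beam.WindowInto(window.FixedWindows(5))
         | 'KeyByGameAndWindow' >> beam.ParDo(KeyByGameAndWindow())
         | 'BreakFusion' >> beam.Reshuffle()
         | 'GroupPerWindow' >> beam.GroupByKey()
         | 'Process' >> beam.Map(lambda kv: kv))
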



--
This message was sent by Atlassian Jira
(v8.3.2#803003)