You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 17:29:20 UTC

[GitHub] [beam] kennknowles opened a new issue, #18358: Improve interplay between PushbackSideInputRunner and GroupAlsoByWindowViaWindowSetDoFn

kennknowles opened a new issue, #18358:
URL: https://github.com/apache/beam/issues/18358

   This originated from a discussion on a PR: https://github.com/apache/beam/pull/2235
   
   `GroupAlsoByWindowViaWindowSetDoFn`/`GroupAlsoByWindowViaWindowSetNewDoFn` and `PushbackSideInputDoFnRunner` don't work well together and we manually need to explode windows in `FlinkStreamingTransformTranslators.ToKeyedWorkItem` because of this:
   
    - `GroupAlsoByWindowViaWindowSetDoFn` is a `DoFn<KeyedWorkItem<K, InputT>, KV<K, OutputT>>` so you have to push in `KeyedWorkItem`. These themselves contain `WindowedValue<InputT>` (or timers).
    - For executing a `DoFn` we use a `DoFnRunner`. For our problem the interesting case is using a `PushbackSideInputDoFnRunner`. The interesting method is `processElementInReadyWindows(WindowedValue<InputT> elem)` where `InputT` is the input type of the `DoFn` which, for the windowing case, is `KeyedWorkItem<K, InputT>` (from above). The actual expanded type signature is thus `processElementInReadyWindows(WindowedValue<KeyedWorkItem<K, InputT>> elem)` where the keyed work items again contain `WindowedValues` (again, from above).
   I think the `PushbackSideInputDoFnRunner` was not initially meant for executing `GroupAlsoByWindowViaWindowSetDoFns`.
   
   
   
   
   
   Imported from Jira [BEAM-1850](https://issues.apache.org/jira/browse/BEAM-1850). Original Jira may contain additional context.
   Reported by: aljoscha.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org