You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Thomas Weise <th...@apache.org> on 2018/08/08 16:08:19 UTC

Portable streaming side inputs

I'm working on support for side inputs in streaming mode in the portable
Flink runner [1].

The runner would be responsible for holding main inputs (durably) when side
inputs are not available.

To check if side inputs are available, the window mapping function is
required (see SimplePushbackSideInputDoFnRunner.isReady).

The window mapping function (like viewFn) is SDK specific serialized in the
proto (see PCollectionViewTranslation).

With Python SDK, the Flink runner cannot rehydrate these objects.

Is my understanding correct and what assumptions can be made about the
window mapping in the runner?

Thanks,
Thomas

[1] https://issues.apache.org/jira/browse/BEAM-2930

Re: Portable streaming side inputs

Posted by Lukasz Cwik <lc...@google.com>.
I have been using this inside Dataflow with the PushbackSideInputDoFnRunner
as the window mapping fn:
https://github.com/lukecwik/incubator-beam/commit/4324185192b27026270ee342b01720ff40e71df8

The input is a KV with the key being a nonce and the value being a window,
the output must be a KV with the key being the same nonce as the input and
the value being the mapped window.
All window mapping fns are deterministic and all window encodings are
deterministic which means that you can cache the encoded output window
based upon the encoded input window after it has been mapped once
indefinitely.


On Wed, Aug 8, 2018 at 9:08 AM Thomas Weise <th...@apache.org> wrote:

> I'm working on support for side inputs in streaming mode in the portable
> Flink runner [1].
>
> The runner would be responsible for holding main inputs (durably) when
> side inputs are not available.
>
> To check if side inputs are available, the window mapping function is
> required (see SimplePushbackSideInputDoFnRunner.isReady).
>
> The window mapping function (like viewFn) is SDK specific serialized in
> the proto (see PCollectionViewTranslation).
>
> With Python SDK, the Flink runner cannot rehydrate these objects.
>
> Is my understanding correct and what assumptions can be made about the
> window mapping in the runner?
>
> Thanks,
> Thomas
>
> [1] https://issues.apache.org/jira/browse/BEAM-2930
>