Posted to user@beam.apache.org by Binh Nguyen Van <bi...@gmail.com> on 2022/11/07 08:38:31 UTC

Single side input to multiple transforms

Hi,

I am writing a pipeline with one singleton side input that I want to use in
several different transforms. When I run the pipeline on Google Dataflow, I
see multiple log entries with a message like this:

Deduplicating side input tags, found non-unique side input key
org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.<init>:1204#4663620f501c9270

Is this something that I should avoid? If so, how can I do that?

Thanks
-Binh

Re: Single side input to multiple transforms

Posted by Binh Nguyen Van <bi...@gmail.com>.
Hi,

No, it is a Java job.

Here is example code that causes the duplicate side input tag log entries:

// A single view, passed to both ParDo transforms.
PCollectionView<MyData> sideInput = sideCollection.apply(View.asSingleton());

inputCollection
    .apply(ParDo.of(new MyFn1()).withSideInputs(sideInput))
    .apply(ParDo.of(new MyFn2()).withSideInputs(sideInput));

But if I create two separate views like this, the duplicate side input
tag log entries do not appear:

// A separate view for each ParDo transform, both built from the same PCollection.
PCollectionView<MyData> sideInput1 = sideCollection.apply(View.asSingleton());
PCollectionView<MyData> sideInput2 = sideCollection.apply(View.asSingleton());

inputCollection
    .apply(ParDo.of(new MyFn1()).withSideInputs(sideInput1))
    .apply(ParDo.of(new MyFn2()).withSideInputs(sideInput2));

-Binh

On Mon, Nov 7, 2022 at 10:50 AM Reuven Lax via user <us...@beam.apache.org>
wrote:

> Is this a Python job?

Re: Single side input to multiple transforms

Posted by Reuven Lax via user <us...@beam.apache.org>.
Is this a Python job?
