Posted to dev@beam.apache.org by "Plajt, Vaclav" <Va...@firma.seznam.cz> on 2018/08/20 08:13:52 UTC

Duplicate key-value elements lost when transferring them as side inputs

Hi Beam devs,

I'm working on the Euphoria DSL, where we implemented `BroadcastHashJoin` using side inputs, but our tests show some missing data. We use `View.asMultimap()` to expose the small side of the join as a `PCollectionView<Map<K, Iterable<T>>>`. Duplicated key-value pairs (elements with the same key and value as another element) then get lost, which is of course unfortunate behavior when doing joins. I believe it all boils down to:

https://github.com/apache/beam/blob/05fb694f265dda0254d7256e938e508fec9ba098/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L293


There, a `HashMultimap` is used to gather all the elements into a `Multimap<K, V>`, and `HashMultimap` does not allow duplicate key-value pairs. Do you also feel this is a bug? If so, we would like to fix it by replacing `HashMultimap` with `ArrayListMultimap`, which allows duplicate key-value pairs.
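
To make the difference concrete, here is a minimal JDK-only sketch of the two storage semantics. It does not use Beam or Guava: the class `DuplicatePairsDemo` and its helpers `group` and `joinedRows` are hypothetical names invented for this illustration. Keeping each key's values in a `HashSet` models the set-like behavior described for `HashMultimap`, and keeping them in an `ArrayList` models the behavior of `ArrayListMultimap`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not Beam or Guava code.
public class DuplicatePairsDemo {

  /** Groups (key, value) pairs by key, keeping each key's values in the supplied collection type. */
  static Map<String, Collection<String>> group(
      List<String[]> pairs, Supplier<Collection<String>> valuesFactory) {
    Map<String, Collection<String>> grouped = new HashMap<>();
    for (String[] pair : pairs) {
      grouped.computeIfAbsent(pair[0], k -> valuesFactory.get()).add(pair[1]);
    }
    return grouped;
  }

  /** Counts the rows an inner join would emit for one probe key against the grouped small side. */
  static int joinedRows(Map<String, Collection<String>> smallSide, String probeKey) {
    Collection<String> matches = smallSide.get(probeKey);
    return matches == null ? 0 : matches.size();
  }

  public static void main(String[] args) {
    List<String[]> smallSide = Arrays.asList(
        new String[] {"k1", "a"},
        new String[] {"k1", "a"}, // exact duplicate of the previous pair
        new String[] {"k1", "b"});

    // Set-backed values: the duplicate ("k1", "a") collapses, so one join row is lost.
    System.out.println(joinedRows(group(smallSide, HashSet::new), "k1")); // prints 2

    // List-backed values: every pair is kept, so the join emits all three rows.
    System.out.println(joinedRows(group(smallSide, ArrayList::new), "k1")); // prints 3
  }
}
```

With set-backed values, a probe element with key `k1` joins against two rows instead of three, which matches the missing data seen in the join test.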


We can think of some workarounds, but we would prefer to make the fix, if possible.


So what are your opinions? And how should we proceed?


Thank you.

Vaclav Plajt


You should know that this e-mail and its attachments are confidential. If we are negotiating on the conclusion of a transaction, we reserve the right to terminate the negotiations at any time. For fans of legalese-we hereby exclude the provisions of the Civil Code on pre-contractual liability. The rules about who and how may act for the company and what are the signing procedures can be found here<https://onas.seznam.cz/cz/signature-rules.html>.

Re: Duplicate key-value elements lost when transferring them as side inputs

Posted by Tim Robertson <ti...@gmail.com>.
Thanks for this, Vaclav.

The failing test (a 1-minute timeout exception) is something we see from time
to time; it indicates issues in the build environment or a flaky test. I
triggered another build by leaving a comment in the PR. Just FYI, this is
something you can also do in the future.

On Tue, Aug 21, 2018 at 10:57 AM Plajt, Vaclav <Va...@firma.seznam.cz>
wrote:

> Hi,
>
> I'm looking for a reviewer for https://github.com/apache/beam/pull/6257
>
>
> I could also use some help with a failing test in MQTT IO (timeout).
>
>
> Vaclav

Re: Duplicate key-value elements lost when transferring them as side inputs

Posted by "Plajt, Vaclav" <Va...@firma.seznam.cz>.
Hi,

I'm looking for a reviewer for https://github.com/apache/beam/pull/6257


I could also use some help with a failing test in MQTT IO (timeout).


Vaclav

________________________________
From: Lukasz Cwik <lc...@google.com>
Sent: Monday, August 20, 2018 6:12:24 PM
To: dev
Subject: Re: Duplicate key-value elements lost when transferring them as side inputs

Yes, that is a bug. I filed https://issues.apache.org/jira/browse/BEAM-5184 and assigned it to you; feel free to unassign it if you're unable to make progress.

Re: Duplicate key-value elements lost when transferring them as side inputs

Posted by Lukasz Cwik <lc...@google.com>.
Yes, that is a bug. I filed
https://issues.apache.org/jira/browse/BEAM-5184 and assigned it to you; feel
free to unassign it if you're unable to make progress.
