Posted to issues@flink.apache.org by fhueske <gi...@git.apache.org> on 2016/09/16 22:19:38 UTC

[GitHub] flink pull request #2508: [FLINK-2662] [dataSet] Translate union with multip...

GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/2508

    [FLINK-2662] [dataSet] Translate union with multiple output into separate unions with single output.

    Fixes FLINK-2662 by translating Union operators with two (or more) successors into two or more Union operators with a single successor.
    
    In the optimizer, union operators with two (or more) successors caused problems when these successors had different partitioning requirements and some of them were themselves Union operators. In certain situations, the UnionMerging post pass would fail because of a non-forward shipping strategy between two subsequent union operators.
    
    This fix only adapts the program translation and does not change the API.
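
As a rough illustration of the translation described above, here is a minimal Python sketch on a toy plan graph. It is not the Flink implementation: the function name `split_multi_output_unions`, the edge-list representation, and the copy names such as `U_1_1` are assumptions made for this example only.

```python
def split_multi_output_unions(edges, unions):
    """Toy model: replace every union that has several successors
    with one copy per successor.

    edges  -- list of (source, target) pairs describing the plan
    unions -- set of node names that are union operators
    Returns (new_edges, new_unions); every union in the result has
    exactly one successor.
    """
    edges = list(edges)
    unions = set(unions)
    while True:
        # Pick any union that still has more than one successor.
        target = next(
            (u for u in sorted(unions)
             if sum(1 for src, _ in edges if src == u) > 1),
            None,
        )
        if target is None:
            return edges, unions
        inputs = [src for src, dst in edges if dst == target]
        outputs = [dst for src, dst in edges if src == target]
        # Drop the original union and all edges touching it.
        edges = [(s, d) for s, d in edges if target not in (s, d)]
        unions.remove(target)
        # Create one copy per successor; each copy reads all inputs.
        for i, out in enumerate(outputs, start=1):
            copy = f"{target}_{i}"  # hypothetical naming scheme
            unions.add(copy)
            edges.extend((src, copy) for src in inputs)
            edges.append((copy, out))


# A plan in which union U_1 feeds both another union (U_2) and Y.
plan_edges = [("1", "U_2"), ("2", "U_1"), ("3", "U_1"),
              ("U_1", "U_2"), ("U_1", "Y"), ("U_2", "X")]
new_edges, new_unions = split_multi_output_unions(plan_edges, {"U_1", "U_2"})
for edge in sorted(new_edges):
    print(edge)
```

In this toy model the inputs `2` and `3` are each connected to both copies of `U_1`, so every union ends up with a single successor and the copies can be handled independently.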

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink FLINK-2662

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2508.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2508
    
----
commit 8d91e9d0074884ac430c88c4f6ad41878a8d1dff
Author: Fabian Hueske <fh...@apache.org>
Date:   2016-09-16T16:40:32Z

    [FLINK-2662] [dataSet] Translate union with multiple output into separate unions with single output.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #2508: [FLINK-2662] [dataSet] Translate union with multiple outp...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2508
  
    I think you are right.
    
    +1 to merge then



[GitHub] flink issue #2508: [FLINK-2662] [dataSet] Translate union with multiple outp...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2508
  
    Another possibility would be to simply add a `NoOp` operator behind a union with multiple outputs with different partitioning schemes. That prevents the unions from being merged. 
    
    Would that not be better, in terms of not duplicating records (as happens when there are multiple instances of the same union operator)?



[GitHub] flink issue #2508: [FLINK-2662] [dataSet] Translate union with multiple outp...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/2508
  
    Thanks, merging then



[GitHub] flink pull request #2508: [FLINK-2662] [dataSet] Translate union with multip...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/2508



[GitHub] flink issue #2508: [FLINK-2662] [dataSet] Translate union with multiple outp...

Posted by fhueske <gi...@git.apache.org>.
Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/2508
  
    Adding a NoOp would prevent the unions from being merged, but the output of the NoOp would still need to be replicated because it needs to be served to two different operators.
    
    If you have something like
    ```
    1 ------------\
                   >-> U_2 -> X
    2 -\          /
        >-> U_1 -<
    3 -/          \-> Y
    ```
    
    Duplicating `U_1` would temporarily result in
    ```
    
    1 --------------\
                     >-> U_2 -> X
                    /
    2 -\-/-> U_11 -/ 
        X       
    3 -/-\-> U_12-> Y
    ```
    
    The generated plan with merged unions would be
    
    ```
    1 --\
         >->-> X
        / /
    2 -/-/--\
        /    >-> Y
    3 -/----/
    ```
    
    With an added NoOp, the plan would be:
    
    ```
    1 -----------\
                  >-> X
    2 -\         /
        >-> NO -<
    3 -/         \-> Y
    ```
    
    This plan would also duplicate records (the output of the NoOp) and, in addition, add serialization overhead due to the additional operator.
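
To make the comparison of the two plans concrete, here is a small Python sketch that counts how many records cross an edge in each plan. It is a deliberately crude model, not a statement about Flink's actual cost: it assumes every edge transfers each record once, every non-source node forwards everything it receives to each of its successors, and each source emits 100 records. The function name `records_shipped` and the node name `NO` are hypothetical.

```python
def records_shipped(edges, source_counts):
    """Total number of records crossing edges in a toy plan.

    edges         -- list of (source, target) pairs
    source_counts -- records emitted by each data source
    Assumption: a non-source node emits every record it receives
    on each of its outgoing edges.
    """
    def emitted(node):
        if node in source_counts:
            return source_counts[node]
        # A non-source node emits the sum of what its inputs emit.
        return sum(emitted(src) for src, dst in edges if dst == node)

    # Each edge carries whatever its source node emits.
    return sum(emitted(src) for src, _ in edges)


sources = {"1": 100, "2": 100, "3": 100}

# Plan with duplicated and then merged unions: sources feed X and Y directly.
merged_plan = [("1", "X"), ("2", "X"), ("3", "X"), ("2", "Y"), ("3", "Y")]

# Plan with a NoOp behind the union: NO replicates its output to X and Y.
noop_plan = [("1", "X"), ("2", "NO"), ("3", "NO"), ("NO", "X"), ("NO", "Y")]

print(records_shipped(merged_plan, sources))  # 500
print(records_shipped(noop_plan, sources))    # 700
```

Under these assumptions both plans ship the records of inputs 2 and 3 twice, and the NoOp plan adds one more hop for them on top of that.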

