Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/16 21:41:47 UTC

[jira] [Commented] (FLINK-2576) Add outer joins to API and Optimizer

    [ https://issues.apache.org/jira/browse/FLINK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791008#comment-14791008 ] 

ASF GitHub Bot commented on FLINK-2576:
---------------------------------------

GitHub user jkovacs opened a pull request:

    https://github.com/apache/flink/pull/1138

    Feature/flink 2576

    This PR implements [FLINK-2576](https://issues.apache.org/jira/browse/FLINK-2576) (Adding the outer join operator to the optimizer and Java/Scala APIs, previously part of [FLINK-2106](https://issues.apache.org/jira/browse/FLINK-2106)).
    For reference, the previous pull requests for the outer join implementation were #907 and #1052.
    
    First of all, thanks for the help we received in person and on the mailing list.
    I designed the API as per the consensus on the mailing list and tried to reuse as much code from the join operator API as possible.
    
    This PR contributes the following:
    * An OuterJoinNode to the optimizer, and three sort-merge OuterJoinDescriptors, one for each type of outer join
    * One outer join base operator
    * left/right/fullOuterJoin() methods to the Java and Scala APIs
      * Including some updates to the join javadocs in Java/Scala APIs
    * Refactorings where necessary (mostly concerned with being able to reuse inner join operator code)
    * Specifically refactoring of the JoinOperator in the Java API:
      * Added JoinType property, identifying inner/left-/right-/full outer join
      * Removed PlanXUnwrappingJoinOperator classes, instead promoting the TupleXUnwrappingJoiners to be able to reuse the existing unwrapping logic
      * Added inner class JoinOperatorBaseBuilder to be able to transparently construct a base operator for all types of joins, as well as tuple unwrapping of left and right inputs
      * Made sure the user can't compile a default join plan for outer joins, and made projection joins work with outer joins (see below)
    * End to end integration tests for the outer join operator using the Java and Scala APIs in flink-tests
    
    Usage & Implementation:
    In both APIs we prohibit using the default join functionality for outer joins. The user is required
    to specify a custom join function that combines the (potentially `null`) left and right side tuples.
    In the Java API we support the projection join functionality for outer joins. (Projection joins are not yet implemented in the Scala API for inner joins, therefore no changes there.)
    Note that when the user performs a projection join, the type information is lost, as is already the case for the inner projection join.
    Additionally, we explicitly "downgrade" the result type information of an outer projection join to a Tuple of `GenericTypeInfo<>(Object.class)` so that `null` values can be serialized.
    A nicer way to do this would be to use an `Optional<T>` type to represent nullable tuple values. However, we can't rely on Java 8 types, and I did not want to hardcode a dependency on a 3rd-party `Optional` type (e.g. from Guava) into the API, so we went this route for now.
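
To illustrate why a custom join function is required, here is a minimal plain-Java sketch of left outer join semantics. It deliberately does not use the Flink API: the class `OuterJoinNullHandling`, the `User` and `Score` types, and the `leftOuterJoin` helper are all hypothetical names made up for this example. The point is only that the join function is called with `null` for the missing side and has to handle it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Flink.
class OuterJoinNullHandling {

    static final class User {
        final int id;
        final String name;
        User(int id, String name) { this.id = id; this.name = name; }
    }

    static final class Score {
        final int userId;
        final int points;
        Score(int userId, int points) { this.userId = userId; this.points = points; }
    }

    // Naive nested-loop left outer join: every left record is emitted at least once.
    // If no right record has the same key, the join function receives null as the right side.
    static <L, R, K, O> List<O> leftOuterJoin(
            List<L> left, List<R> right,
            Function<L, K> leftKey, Function<R, K> rightKey,
            BiFunction<L, R, O> joinFunction) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            boolean matched = false;
            for (R r : right) {
                if (leftKey.apply(l).equals(rightKey.apply(r))) {
                    out.add(joinFunction.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(joinFunction.apply(l, null));
            }
        }
        return out;
    }

    static List<String> demo() {
        List<User> users = Arrays.asList(new User(1, "alice"), new User(2, "bob"));
        List<Score> scores = Arrays.asList(new Score(1, 90));
        // The user-supplied join function must check the right side for null.
        return leftOuterJoin(users, scores, u -> u.id, s -> s.userId,
                (u, s) -> u.name + ":" + (s == null ? "n/a" : String.valueOf(s.points)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [alice:90, bob:n/a]
    }
}
```

A default join that simply pairs both sides has no sensible result for the unmatched `bob` record, which is the situation the custom join function has to resolve.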


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkovacs/flink feature/FLINK-2576

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1138
    
----
commit e3ea010462e0290b857c296b0ff9572332827421
Author: Johann Kovacs <me...@jkovacs.de>
Date:   2015-09-08T16:23:54Z

    [FLINK-2576] [refactor] Extract abstract superclass for join operators

commit 061e61027a070fa408c6e9a072d5a755a5dbcc0e
Author: Johann Kovacs <me...@jkovacs.de>
Date:   2015-08-25T12:16:02Z

    [FLINK-2576] [refactor] Extract common optimizer code to superclass

commit 1465aa1d38e1730cf400e1d3164400efd72dd420
Author: r-pogalz <r....@campus.tu-berlin.de>
Date:   2015-07-07T19:40:04Z

    [FLINK-2576] Add outer join base operator

commit d5ae5d74a7283512440cffda4e1675760a9d335e
Author: Johann Kovacs <me...@jkovacs.de>
Date:   2015-09-09T09:02:08Z

    [FLINK-2576] Add outer join to optimizer

commit 0a89a0dbbe8a8b8bf6c38382e02d072d169421cf
Author: Johann Kovacs <me...@jkovacs.de>
Date:   2015-09-10T15:24:29Z

    [FLINK-2576] [tests] Don't swallow exceptions during program compilation and optimization

commit 1ccca5ba74ea9da82c14ab350582ab62dbf540a3
Author: Johann Kovacs <me...@jkovacs.de>
Date:   2015-09-16T15:00:43Z

    [FLINK-2576] Add outer join operator to Java DataSet API

commit b66b1b0a42449bc1eedfd74adcea87cb52d2a09e
Author: Johann Kovacs <me...@jkovacs.de>
Date:   2015-09-16T14:56:03Z

    [FLINK-2576] Add outer join operator to Scala DataSet API

----


> Add outer joins to API and Optimizer
> ------------------------------------
>
>                 Key: FLINK-2576
>                 URL: https://issues.apache.org/jira/browse/FLINK-2576
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Java API, Optimizer, Scala API
>            Reporter: Ricky Pogalz
>            Priority: Minor
>             Fix For: pre-apache
>
>
> Add left/right/full outer join methods to the DataSet APIs (Java, Scala) and to the optimizer of Flink.
> Initially, the execution strategy should be a sort-merge outer join (FLINK-2105) but can later be extended to hash joins for left/right outer joins.
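
As a rough illustration of the sort-merge strategy mentioned in the issue description, the following plain-Java sketch performs a full outer join over two arrays of integer keys. It is not the Flink runtime implementation; the class name `SortMergeOuterJoinSketch` and the string output format are assumptions chosen for brevity. Both inputs are sorted, then merged with two cursors; keys present on only one side are paired with `null`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Flink runtime implementation.
class SortMergeOuterJoinSketch {

    // Full outer join on integer keys. Unmatched keys are paired with null.
    static List<String> fullOuterJoin(int[] leftKeys, int[] rightKeys) {
        // Sort copies so the caller's arrays are left untouched.
        int[] left = leftKeys.clone();
        int[] right = rightKeys.clone();
        Arrays.sort(left);
        Arrays.sort(right);

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                // Left key has no partner on the right.
                out.add("(" + left[i++] + ", null)");
            } else if (left[i] > right[j]) {
                // Right key has no partner on the left.
                out.add("(null, " + right[j++] + ")");
            } else {
                // Equal keys: find the run of duplicates on both sides
                // and emit their cross product.
                int key = left[i];
                int iEnd = i;
                int jEnd = j;
                while (iEnd < left.length && left[iEnd] == key) iEnd++;
                while (jEnd < right.length && right[jEnd] == key) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add("(" + left[a] + ", " + right[b] + ")");
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        // Drain whichever side still has records; they are all unmatched.
        while (i < left.length) out.add("(" + left[i++] + ", null)");
        while (j < right.length) out.add("(null, " + right[j++] + ")");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fullOuterJoin(new int[] {3, 1, 2}, new int[] {2, 4, 2}));
        // prints [(1, null), (2, 2), (2, 2), (3, null), (null, 4)]
    }
}
```

In this sketch, a left outer join would skip the two places that emit `(null, key)`, and a right outer join would skip the two that emit `(key, null)`.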



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)