You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "lincoln.lee (JIRA)" <ji...@apache.org> on 2017/02/21 14:48:44 UTC

[jira] [Commented] (FLINK-5498) Add support for left/right outer joins with non-equality predicates (and 1+ equality predicates)

    [ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876085#comment-15876085 ] 

lincoln.lee commented on FLINK-5498:
------------------------------------

We can simply divide join conditions into equi parts and non-equi parts. When I basically finished, I found that the implementation of this jira and Flink-5521 almost the same (except for the validation part). But if implemented separately, there will be some validation code that will eventually be deleted. So the two jiras merged into one.
I will submit a pr later.

> Add support for left/right outer joins with non-equality predicates (and 1+ equality predicates)
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5498
>                 URL: https://issues.apache.org/jira/browse/FLINK-5498
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>    Affects Versions: 1.3.0
>            Reporter: lincoln.lee
>            Assignee: lincoln.lee
>            Priority: Minor
>
> I found the expected result of a unit test case incorrect compare to that in a RDMBS, 
> see flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> {code:title=JoinITCase.scala}
> def testRightJoinWithNotOnlyEquiJoin(): Unit = {
>      ...
>      val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
>      val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
>      val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g)
>  
>      val expected = "Hello world,BCD\n"
>      val results = joinT.toDataSet[Row].collect()
>      TestBaseUtils.compareResultAsText(results.asJava, expected)
> }
> {code}
> Then I took some time to learn about the ‘outer join’ in relational databases, the right result of above case should be(tested in SQL Server and MySQL, the results are same):
> {code}
> > select c, g from tuple3 right outer join tuple5 on a=f and b<h;
> c                                g                               
> -------------------------------- --------------------------------
> NULL                             Hallo                           
> NULL                             Hallo Welt                      
> NULL                             Hallo Welt wie                  
> NULL                             Hallo Welt wie gehts?           
> NULL                             ABC                             
> Hello world                      BCD                             
> NULL                             CDE                             
> NULL                             DEF                             
> NULL                             EFG                             
> NULL                             FGH                             
> NULL                             GHI                             
> NULL                             HIJ                             
> NULL                             IJK                             
> NULL                             JKL                             
> NULL                             KLM   
> {code}
> the join condition {{rightOuterJoin('a === 'd && 'b < 'h)}} is not equivalent to {{rightOuterJoin('a === 'd).where('b < 'h)}}.  
> The problem is rooted in the code-generated {{JoinFunction}} (see {{DataSetJoin.translateToPlan()}}, line 188). If the join condition does not match, we must emit the outer row padded with nulls instead of returning from the function without emitting anything.
> The code-generated {{JoinFunction}} does also include equality predicates. These should be removed before generating the code, e.g., in {{DataSetJoinRule}} when generating the {{DataSetJoin}} with help of {{JoinInfo.getRemaining()}}.
> More details: https://goo.gl/ngekca



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)