Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2017/01/16 11:03:26 UTC

[jira] [Commented] (FLINK-5498) Fix JoinITCase and add support for filter expressions on the On clause in left/right outer joins

    [ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823786#comment-15823786 ] 

Fabian Hueske commented on FLINK-5498:
--------------------------------------

Thanks for reporting this issue, [~lincoln.86xy]! 
I had a look into the code and there is definitely a problem with the outer join implementation. 

I think we should split this up into two separate issues:
1) Including non-equality predicates in outer joins. This is a bug and must be fixed. I do not see a reason why non-equality predicates should not be supported for outer joins.
2) Including local predicates in outer joins. This would be a new feature and can be implemented separately from the bug fix.
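
To make the distinction between the two points concrete, here is a minimal sketch in plain Scala. It does not use the Table API: `Side`, `Conjunct`, and `classify` are hypothetical names introduced only for illustration. The three sample conjuncts are the ones that appear in the test cases quoted in the issue description.

```scala
object PredicateKinds {
  // Hypothetical mini-model of a join condition; these are not Flink classes.
  sealed trait Side
  case object LeftInput extends Side
  case object RightInput extends Side

  // One conjunct of a join condition: the inputs it references and
  // whether it is an equality comparison.
  final case class Conjunct(text: String, inputs: Set[Side], isEquality: Boolean)

  // A conjunct that touches only one input is "local"; otherwise it is a
  // join predicate, either an equality or a non-equality one.
  def classify(c: Conjunct): String =
    if (c.inputs.size < 2) "local predicate"
    else if (c.isEquality) "equi-join predicate"
    else "non-equality join predicate"

  val both: Set[Side] = Set(LeftInput, RightInput)

  val samples: Seq[Conjunct] = Seq(
    Conjunct("'a === 'd", both, isEquality = true),
    Conjunct("'b < 'h", both, isEquality = false),
    // References a single input and a constant.
    Conjunct("'b < 3", Set[Side](RightInput), isEquality = false)
  )

  def main(args: Array[String]): Unit =
    samples.foreach(c => println(s"${c.text} -> ${classify(c)}"))
}
```

In this reading, point 1) is about conjuncts like `'b < 'h` and point 2) is about conjuncts like `'b < 3`.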

I'll update this issue to cover the bug fix for outer joins with non-equality predicates.
Could you add another JIRA for adding support for local predicates in outer joins?

Also, I would like to include a fix for the non-equality predicates in the upcoming 1.2.0 release. The first release candidate has already been published, so we would need to hurry to get the fix in. [~lincoln.86xy], you assigned this issue to yourself. Do you have time to work on a fix in the next few days?

Thanks, Fabian

> Fix JoinITCase and add support for filter expressions on the On clause in left/right outer joins
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5498
>                 URL: https://issues.apache.org/jira/browse/FLINK-5498
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.2.0, 1.1.4, 1.3.0
>            Reporter: lincoln.lee
>            Assignee: lincoln.lee
>
> I found that the expected result of a unit test case is incorrect compared to the result in an RDBMS,
> see flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> {code:title=JoinITCase.scala}
> def testRightJoinWithNotOnlyEquiJoin(): Unit = {
>      ...
>      val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
>      val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
>      val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g)
>  
>      val expected = "Hello world,BCD\n"
>      val results = joinT.toDataSet[Row].collect()
>      TestBaseUtils.compareResultAsText(results.asJava, expected)
> }
> {code}
> I then took some time to learn about outer joins in relational databases. The correct result of the above case should be (tested in SQL Server and MySQL, with identical results):
> {code}
> > select c, g from tuple3 right outer join tuple5 on a=d and b<h;
> c                                g                               
> -------------------------------- --------------------------------
> NULL                             Hallo                           
> NULL                             Hallo Welt                      
> NULL                             Hallo Welt wie                  
> NULL                             Hallo Welt wie gehts?           
> NULL                             ABC                             
> Hello world                      BCD                             
> NULL                             CDE                             
> NULL                             DEF                             
> NULL                             EFG                             
> NULL                             FGH                             
> NULL                             GHI                             
> NULL                             HIJ                             
> NULL                             IJK                             
> NULL                             JKL                             
> NULL                             KLM   
> {code}
> The join condition “rightOuterJoin('a === 'd && 'b < 'h)” is not equivalent to “rightOuterJoin('a === 'd).where('b < 'h)”.
> But another test case in flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> will throw a ValidationException indicating: “Invalid non-join predicate 'b < 3. For non-join predicates use Table#where.”
> {code:title=JoinITCase.scala}
> @Test(expected = classOf[ValidationException])
> def testNoJoinCondition(): Unit = {
>      …
>      val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
>      val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
>      val joinT = ds2.leftOuterJoin(ds1, 'b === 'd && 'b < 3).select('c, 'g)
> }
> {code}
> This issue aims to clarify which kinds of expressions are supported in the join predicate.
> More detailed description: http://goo.gl/gK6vP3
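
The point in the description that a predicate in the join condition is not the same as a later `where` can be sketched with plain Scala collections. This is only an illustration under stated assumptions: `LeftRow`, `RightRow`, and this `rightOuterJoin` helper are hypothetical, the rows are toy data rather than the `CollectionDataSets` contents, and the code does not reflect how Flink executes joins.

```scala
object OuterJoinSketch {
  // Hypothetical toy rows; not the data sets used by JoinITCase.
  final case class LeftRow(a: Int, b: Long, c: String)
  final case class RightRow(d: Int, g: String, h: Long)

  // Right outer join: every right row is emitted at least once. If no left
  // row satisfies `on`, the left side is None (the SQL NULL padding).
  def rightOuterJoin(left: Seq[LeftRow], right: Seq[RightRow])(
      on: (LeftRow, RightRow) => Boolean): Seq[(Option[LeftRow], RightRow)] =
    right.flatMap { r =>
      val matches = left.filter(l => on(l, r))
      if (matches.isEmpty) Seq((Option.empty[LeftRow], r))
      else matches.map(l => (Option(l), r))
    }

  val left = Seq(LeftRow(1, 1L, "Hi"), LeftRow(3, 2L, "Hello world"))
  val right = Seq(RightRow(1, "Hallo", 1L), RightRow(3, "ABC", 2L), RightRow(3, "BCD", 3L))

  // Non-equality predicate evaluated as part of the join condition.
  val inOn: Seq[(Option[String], String)] =
    rightOuterJoin(left, right)((l, r) => l.a == r.d && l.b < r.h)
      .map { case (l, r) => (l.map(_.c), r.g) }

  // Equi-join first, then filter the joined rows. A NULL-padded row cannot
  // satisfy b < h, so it is dropped as well.
  val inWhere: Seq[(Option[String], String)] =
    rightOuterJoin(left, right)((l, r) => l.a == r.d)
      .filter { case (l, r) => l.exists(_.b < r.h) }
      .map { case (l, r) => (l.map(_.c), r.g) }

  def main(args: Array[String]): Unit = {
    println(s"predicate in join condition: $inOn")
    println(s"predicate in where:          $inWhere")
  }
}
```

With the predicate in the join condition, all three right rows survive and two of them are NULL-padded. With the predicate applied afterwards, only the single matching row remains, which is the shape of the result the test currently expects.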



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)