You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2017/01/16 11:10:27 UTC
[jira] [Updated] (FLINK-5498) OuterJoins with non-equality
predicates compute wrong results
[ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Hueske updated FLINK-5498:
---------------------------------
Summary: OuterJoins with non-equality predicates compute wrong results (was: Fix JoinITCase and add support for filter expressions on the On clause in left/right outer joins)
> OuterJoins with non-equality predicates compute wrong results
> -------------------------------------------------------------
>
> Key: FLINK-5498
> URL: https://issues.apache.org/jira/browse/FLINK-5498
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.2.0, 1.1.4, 1.3.0
> Reporter: lincoln.lee
> Assignee: lincoln.lee
> Priority: Critical
>
> I found the expected result of a unit test case incorrect compare to that in a RDMBS,
> see flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> {code:title=JoinITCase.scala}
> def testRightJoinWithNotOnlyEquiJoin(): Unit = {
> ...
> val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
> val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
> val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g)
>
> val expected = "Hello world,BCD\n"
> val results = joinT.toDataSet[Row].collect()
> TestBaseUtils.compareResultAsText(results.asJava, expected)
> }
> {code}
> Then I took some time to learn about the ‘outer join’ in relational databases, the right result of above case should be(tested in SQL Server and MySQL, the results are same):
> {code}
> > select c, g from tuple3 right outer join tuple5 on a=f and b<h;
> c g
> -------------------------------- --------------------------------
> NULL Hallo
> NULL Hallo Welt
> NULL Hallo Welt wie
> NULL Hallo Welt wie gehts?
> NULL ABC
> Hello world BCD
> NULL CDE
> NULL DEF
> NULL EFG
> NULL FGH
> NULL GHI
> NULL HIJ
> NULL IJK
> NULL JKL
> NULL KLM
> {code}
> the join condition {{rightOuterJoin('a === 'd && 'b < 'h)}} is not equivalent to {{rightOuterJoin('a === 'd).where('b < 'h)}}.
> The problem is rooted in the code-generated {{JoinFunction}} (see {{DataSetJoin.translateToPlan()}}, line 188). If the join condition does not match, we must emit the outer row padded with nulls instead of returning from the function without emitting anything.
> The code-generated {{JoinFunction}} does also include equality predicates. These should be removed before generating the code, e.g., in {{DataSetJoinRule}} when generating the {{DataSetJoin}} with help of {{JoinInfo.getRemaining()}}.
> More detailed description: http://goo.gl/gK6vP3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)