You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "lincoln.lee (JIRA)" <ji...@apache.org> on 2017/01/16 06:56:26 UTC
[jira] [Updated] (FLINK-5498) Fix JoinITCase and add support for
filter expressions on the On clause in left/right outer joins
[ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lincoln.lee updated FLINK-5498:
-------------------------------
Description:
I found the expected result of a unit test case incorrect compare to that in a RDMBS,
see flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
def testRightJoinWithNotOnlyEquiJoin(): Unit = {
...
val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g)
val expected = "Hello world,BCD\n"
val results = joinT.toDataSet[Row].collect()
TestBaseUtils.compareResultAsText(results.asJava, expected)
}
Then I took some time to learn about the ‘outer join’ in relational databases, the right result of above case should be(tested in SQL Server and MySQL, the results are same):
> select c, g from tuple3 right outer join tuple5 on a=f and b<h;
c g
------------- ----------------
NULL Hallo
Hi Hallo Welt
NULL Hallo Welt wie
NULL Hallo Welt wie gehts?
NULL ABC
NULL BCD
NULL CDE
NULL DEF
NULL EFG
NULL FGH
NULL GHI
NULL HIJ
NULL IJK
NULL JKL
NULL KLM
the join condition “rightOuterJoin('a === 'd && 'b < 'h)” is not equivalent to “rightOuterJoin('a === 'd).where('b < 'h)”.
But another test case test case in flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
will throw a ValidationException indicating: “Invalid non-join predicate 'b < 3. For non-join predicates use Table#where.”
@Test(expected = classOf[ValidationException])
def testNoJoinCondition(): Unit = {
…
val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
val joinT = ds2.leftOuterJoin(ds1, 'b === 'd && 'b < 3).select('c, 'g)
}
This jira aims to make clear what kind of expression is supported on the join predicate.
More detailed description: http://goo.gl/gK6vP3
> Fix JoinITCase and add support for filter expressions on the On clause in left/right outer joins
> ------------------------------------------------------------------------------------------------
>
> Key: FLINK-5498
> URL: https://issues.apache.org/jira/browse/FLINK-5498
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.1.4
> Reporter: lincoln.lee
> Assignee: lincoln.lee
>
> I found the expected result of a unit test case incorrect compare to that in a RDMBS,
> see flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> def testRightJoinWithNotOnlyEquiJoin(): Unit = {
> ...
> val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
> val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
> val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g)
>
> val expected = "Hello world,BCD\n"
> val results = joinT.toDataSet[Row].collect()
> TestBaseUtils.compareResultAsText(results.asJava, expected)
> }
> Then I took some time to learn about the ‘outer join’ in relational databases, the right result of above case should be(tested in SQL Server and MySQL, the results are same):
> > select c, g from tuple3 right outer join tuple5 on a=f and b<h;
> c g
> ------------- ----------------
> NULL Hallo
> Hi Hallo Welt
> NULL Hallo Welt wie
> NULL Hallo Welt wie gehts?
> NULL ABC
> NULL BCD
> NULL CDE
> NULL DEF
> NULL EFG
> NULL FGH
> NULL GHI
> NULL HIJ
> NULL IJK
> NULL JKL
> NULL KLM
> the join condition “rightOuterJoin('a === 'd && 'b < 'h)” is not equivalent to “rightOuterJoin('a === 'd).where('b < 'h)”.
> But another test case test case in flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> will throw a ValidationException indicating: “Invalid non-join predicate 'b < 3. For non-join predicates use Table#where.”
> @Test(expected = classOf[ValidationException])
> def testNoJoinCondition(): Unit = {
> …
> val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
> val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g, 'h)
> val joinT = ds2.leftOuterJoin(ds1, 'b === 'd && 'b < 3).select('c, 'g)
> }
> This jira aims to make clear what kind of expression is supported on the join predicate.
> More detailed description: http://goo.gl/gK6vP3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)