You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "lincoln lee (Jira)" <ji...@apache.org> on 2022/10/27 13:23:00 UTC
[jira] [Updated] (FLINK-29781) ChangelogNormalize uses wrong keys after transformation by WatermarkAssignerChangelogNormalizeTransposeRule
[ https://issues.apache.org/jira/browse/FLINK-29781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lincoln lee updated FLINK-29781:
--------------------------------
Description:
currently WatermarkAssignerChangelogNormalizeTransposeRule didn't remap the uniquekey indexes for its new input after plan rewrite, this may produce wrong result.
A simple case:
{code}
@Test
def testPushdownCalcNotAffectChangelogNormalizeKey(): Unit = {
util.addTable("""
|CREATE TABLE t1 (
| ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
| a VARCHAR NOT NULL,
| b VARCHAR NOT NULL,
| WATERMARK FOR ingestion_time AS ingestion_time
|) WITH (
| 'connector' = 'values',
| 'readable-metadata' = 'ts:TIMESTAMP(3)'
|)
""".stripMargin)
util.addTable("""
|CREATE TABLE t2 (
| k VARBINARY,
| ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
| a VARCHAR NOT NULL,
| f BOOLEAN NOT NULL,
| WATERMARK FOR `ingestion_time` AS `ingestion_time`,
| PRIMARY KEY (`a`) NOT ENFORCED
|) WITH (
| 'connector' = 'values',
| 'readable-metadata' = 'ts:TIMESTAMP(3)',
| 'changelog-mode' = 'I,UA,D'
|)
""".stripMargin)
val sql =
"""
|SELECT t1.a, t1.b, t2.f
|FROM t1 INNER JOIN t2 FOR SYSTEM_TIME AS OF t1.ingestion_time
| ON t1.a = t2.a WHERE t2.f = true
|""".stripMargin
util.verifyRelPlan(sql, ExplainDetail.CHANGELOG_MODE)
}
{code}
the generated plan is incorrect for now:
optimize result:
Calc(select=[a, b, f])
+- TemporalJoin(joinType=[InnerJoin], where=[AND(=(a, a0), __TEMPORAL_JOIN_CONDITION(ingestion_time, ingestion_time0, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY(a0), __TEMPORAL_JOIN_LEFT_KEY(a), __TEMPORAL_JOIN_RIGHT_KEY(a0)))], select=[ingestion_time, a, b, ingestion_time0, a0, f])
:- Exchange(distribution=[hash[a]])
: +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
: +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, b])
: +- TableSourceScan(table=[[default_catalog, default_database, t1]], fields=[a, b, ingestion_time])
+- Exchange(distribution=[hash[a]])
+- Calc(select=[ingestion_time, a, f], where=[f])
+- {color:red}ChangelogNormalize(key=[ingestion_time]){color}
+- Exchange(distribution=[hash[a]])
+- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
+- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, f])
+- TableSourceScan(table=[[default_catalog, default_database, t2, project=[a, f], metadata=[ts]]], fields=[a, f, ingestion_time])
was:
currently WatermarkAssignerChangelogNormalizeTransposeRule didn't remap the uniquekey indexes for its new input after plan rewrite, this may produce wrong result.
A simple case:
{code}
@Test
def testPushdownCalcNotAffectChangelogNormalizeKey(): Unit = {
util.addTable("""
|CREATE TABLE t1 (
| ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
| a VARCHAR NOT NULL,
| b VARCHAR NOT NULL,
| WATERMARK FOR ingestion_time AS ingestion_time
|) WITH (
| 'connector' = 'values',
| 'readable-metadata' = 'ts:TIMESTAMP(3)'
|)
""".stripMargin)
util.addTable("""
|CREATE TABLE t2 (
| k VARBINARY,
| ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
| a VARCHAR NOT NULL,
| f BOOLEAN NOT NULL,
| WATERMARK FOR `ingestion_time` AS `ingestion_time`,
| PRIMARY KEY (`a`) NOT ENFORCED
|) WITH (
| 'connector' = 'values',
| 'readable-metadata' = 'ts:TIMESTAMP(3)',
| 'changelog-mode' = 'I,UA,D'
|)
""".stripMargin)
val sql =
"""
|SELECT t1.a, t1.b, t2.f
|FROM t1 INNER JOIN t2 FOR SYSTEM_TIME AS OF t1.ingestion_time
| ON t1.a = t2.a WHERE t2.f = true
|""".stripMargin
util.verifyRelPlan(sql, ExplainDetail.CHANGELOG_MODE)
}
{code}
the generated plan is incorrect for now:
{code}
optimize result:
Calc(select=[a, b, f])
+- TemporalJoin(joinType=[InnerJoin], where=[AND(=(a, a0), __TEMPORAL_JOIN_CONDITION(ingestion_time, ingestion_time0, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY(a0), __TEMPORAL_JOIN_LEFT_KEY(a), __TEMPORAL_JOIN_RIGHT_KEY(a0)))], select=[ingestion_time, a, b, ingestion_time0, a0, f])
:- Exchange(distribution=[hash[a]])
: +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
: +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, b])
: +- TableSourceScan(table=[[default_catalog, default_database, t1]], fields=[a, b, ingestion_time])
+- Exchange(distribution=[hash[a]])
+- Calc(select=[ingestion_time, a, f], where=[f])
+- {color:red}ChangelogNormalize(key=[ingestion_time]){color}
+- Exchange(distribution=[hash[a]])
+- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
+- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, f])
+- TableSourceScan(table=[[default_catalog, default_database, t2, project=[a, f], metadata=[ts]]], fields=[a, f, ingestion_time])
{code}
> ChangelogNormalize uses wrong keys after transformation by WatermarkAssignerChangelogNormalizeTransposeRule
> ------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-29781
> URL: https://issues.apache.org/jira/browse/FLINK-29781
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.16.0, 1.15.2
> Reporter: lincoln lee
> Priority: Major
>
> currently WatermarkAssignerChangelogNormalizeTransposeRule didn't remap the uniquekey indexes for its new input after plan rewrite, this may produce wrong result.
> A simple case:
> {code}
> @Test
> def testPushdownCalcNotAffectChangelogNormalizeKey(): Unit = {
> util.addTable("""
> |CREATE TABLE t1 (
> | ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
> | a VARCHAR NOT NULL,
> | b VARCHAR NOT NULL,
> | WATERMARK FOR ingestion_time AS ingestion_time
> |) WITH (
> | 'connector' = 'values',
> | 'readable-metadata' = 'ts:TIMESTAMP(3)'
> |)
> """.stripMargin)
> util.addTable("""
> |CREATE TABLE t2 (
> | k VARBINARY,
> | ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
> | a VARCHAR NOT NULL,
> | f BOOLEAN NOT NULL,
> | WATERMARK FOR `ingestion_time` AS `ingestion_time`,
> | PRIMARY KEY (`a`) NOT ENFORCED
> |) WITH (
> | 'connector' = 'values',
> | 'readable-metadata' = 'ts:TIMESTAMP(3)',
> | 'changelog-mode' = 'I,UA,D'
> |)
> """.stripMargin)
> val sql =
> """
> |SELECT t1.a, t1.b, t2.f
> |FROM t1 INNER JOIN t2 FOR SYSTEM_TIME AS OF t1.ingestion_time
> | ON t1.a = t2.a WHERE t2.f = true
> |""".stripMargin
> util.verifyRelPlan(sql, ExplainDetail.CHANGELOG_MODE)
> }
> {code}
> the generated plan is incorrect for now:
> optimize result:
> Calc(select=[a, b, f])
> +- TemporalJoin(joinType=[InnerJoin], where=[AND(=(a, a0), __TEMPORAL_JOIN_CONDITION(ingestion_time, ingestion_time0, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY(a0), __TEMPORAL_JOIN_LEFT_KEY(a), __TEMPORAL_JOIN_RIGHT_KEY(a0)))], select=[ingestion_time, a, b, ingestion_time0, a0, f])
> :- Exchange(distribution=[hash[a]])
> : +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
> : +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, b])
> : +- TableSourceScan(table=[[default_catalog, default_database, t1]], fields=[a, b, ingestion_time])
> +- Exchange(distribution=[hash[a]])
> +- Calc(select=[ingestion_time, a, f], where=[f])
> +- {color:red}ChangelogNormalize(key=[ingestion_time]){color}
> +- Exchange(distribution=[hash[a]])
> +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
> +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, f])
> +- TableSourceScan(table=[[default_catalog, default_database, t2, project=[a, f], metadata=[ts]]], fields=[a, f, ingestion_time])
--
This message was sent by Atlassian Jira
(v8.20.10#820010)