You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "lincoln lee (Jira)" <ji...@apache.org> on 2022/10/27 13:23:00 UTC

[jira] [Updated] (FLINK-29781) ChangelogNormalize uses wrong keys after transformation by WatermarkAssignerChangelogNormalizeTransposeRule

     [ https://issues.apache.org/jira/browse/FLINK-29781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lincoln lee updated FLINK-29781:
--------------------------------
    Description: 
currently WatermarkAssignerChangelogNormalizeTransposeRule didn't remap the uniquekey indexes for its new input after plan rewrite, this may produce wrong result.

A simple case:
{code}

  @Test
  def testPushdownCalcNotAffectChangelogNormalizeKey(): Unit = {
    util.addTable("""
                    |CREATE TABLE t1 (
                    |  ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
                    |  a VARCHAR NOT NULL,
                    |  b VARCHAR NOT NULL,
                    |  WATERMARK FOR ingestion_time AS ingestion_time
                    |) WITH (
                    | 'connector' = 'values',
                    | 'readable-metadata' = 'ts:TIMESTAMP(3)'
                    |)
      """.stripMargin)
    util.addTable("""
                    |CREATE TABLE t2 (
                    |  k VARBINARY,
                    |  ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
                    |  a VARCHAR NOT NULL,
                    |  f BOOLEAN NOT NULL,
                    |  WATERMARK FOR `ingestion_time` AS `ingestion_time`,
                    |  PRIMARY KEY (`a`) NOT ENFORCED
                    |) WITH (
                    | 'connector' = 'values',
                    | 'readable-metadata' = 'ts:TIMESTAMP(3)',
                    | 'changelog-mode' = 'I,UA,D'
                    |)
      """.stripMargin)
    val sql =
      """
        |SELECT t1.a, t1.b, t2.f
        |FROM t1 INNER JOIN t2 FOR SYSTEM_TIME AS OF t1.ingestion_time
        | ON t1.a = t2.a WHERE t2.f = true
        |""".stripMargin
    util.verifyRelPlan(sql, ExplainDetail.CHANGELOG_MODE)
  }
{code}

the generated plan is incorrect for now:

optimize result: 
Calc(select=[a, b, f])
+- TemporalJoin(joinType=[InnerJoin], where=[AND(=(a, a0), __TEMPORAL_JOIN_CONDITION(ingestion_time, ingestion_time0, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY(a0), __TEMPORAL_JOIN_LEFT_KEY(a), __TEMPORAL_JOIN_RIGHT_KEY(a0)))], select=[ingestion_time, a, b, ingestion_time0, a0, f])
   :- Exchange(distribution=[hash[a]])
   :  +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
   :     +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, b])
   :        +- TableSourceScan(table=[[default_catalog, default_database, t1]], fields=[a, b, ingestion_time])
   +- Exchange(distribution=[hash[a]])
      +- Calc(select=[ingestion_time, a, f], where=[f])
         +- {color:red}ChangelogNormalize(key=[ingestion_time]){color}
            +- Exchange(distribution=[hash[a]])
               +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
                  +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, f])
                     +- TableSourceScan(table=[[default_catalog, default_database, t2, project=[a, f], metadata=[ts]]], fields=[a, f, ingestion_time])


  was:
currently WatermarkAssignerChangelogNormalizeTransposeRule didn't remap the uniquekey indexes for its new input after plan rewrite, this may produce wrong result.

A simple case:
{code}

  @Test
  def testPushdownCalcNotAffectChangelogNormalizeKey(): Unit = {
    util.addTable("""
                    |CREATE TABLE t1 (
                    |  ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
                    |  a VARCHAR NOT NULL,
                    |  b VARCHAR NOT NULL,
                    |  WATERMARK FOR ingestion_time AS ingestion_time
                    |) WITH (
                    | 'connector' = 'values',
                    | 'readable-metadata' = 'ts:TIMESTAMP(3)'
                    |)
      """.stripMargin)
    util.addTable("""
                    |CREATE TABLE t2 (
                    |  k VARBINARY,
                    |  ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
                    |  a VARCHAR NOT NULL,
                    |  f BOOLEAN NOT NULL,
                    |  WATERMARK FOR `ingestion_time` AS `ingestion_time`,
                    |  PRIMARY KEY (`a`) NOT ENFORCED
                    |) WITH (
                    | 'connector' = 'values',
                    | 'readable-metadata' = 'ts:TIMESTAMP(3)',
                    | 'changelog-mode' = 'I,UA,D'
                    |)
      """.stripMargin)
    val sql =
      """
        |SELECT t1.a, t1.b, t2.f
        |FROM t1 INNER JOIN t2 FOR SYSTEM_TIME AS OF t1.ingestion_time
        | ON t1.a = t2.a WHERE t2.f = true
        |""".stripMargin
    util.verifyRelPlan(sql, ExplainDetail.CHANGELOG_MODE)
  }
{code}

the generated plan is incorrect for now:
{code}
optimize result: 
Calc(select=[a, b, f])
+- TemporalJoin(joinType=[InnerJoin], where=[AND(=(a, a0), __TEMPORAL_JOIN_CONDITION(ingestion_time, ingestion_time0, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY(a0), __TEMPORAL_JOIN_LEFT_KEY(a), __TEMPORAL_JOIN_RIGHT_KEY(a0)))], select=[ingestion_time, a, b, ingestion_time0, a0, f])
   :- Exchange(distribution=[hash[a]])
   :  +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
   :     +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, b])
   :        +- TableSourceScan(table=[[default_catalog, default_database, t1]], fields=[a, b, ingestion_time])
   +- Exchange(distribution=[hash[a]])
      +- Calc(select=[ingestion_time, a, f], where=[f])
         +- {color:red}ChangelogNormalize(key=[ingestion_time]){color}
            +- Exchange(distribution=[hash[a]])
               +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
                  +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, f])
                     +- TableSourceScan(table=[[default_catalog, default_database, t2, project=[a, f], metadata=[ts]]], fields=[a, f, ingestion_time])
{code}



> ChangelogNormalize uses wrong keys after transformation by WatermarkAssignerChangelogNormalizeTransposeRule 
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-29781
>                 URL: https://issues.apache.org/jira/browse/FLINK-29781
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.16.0, 1.15.2
>            Reporter: lincoln lee
>            Priority: Major
>
> currently WatermarkAssignerChangelogNormalizeTransposeRule didn't remap the uniquekey indexes for its new input after plan rewrite, this may produce wrong result.
> A simple case:
> {code}
>   @Test
>   def testPushdownCalcNotAffectChangelogNormalizeKey(): Unit = {
>     util.addTable("""
>                     |CREATE TABLE t1 (
>                     |  ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
>                     |  a VARCHAR NOT NULL,
>                     |  b VARCHAR NOT NULL,
>                     |  WATERMARK FOR ingestion_time AS ingestion_time
>                     |) WITH (
>                     | 'connector' = 'values',
>                     | 'readable-metadata' = 'ts:TIMESTAMP(3)'
>                     |)
>       """.stripMargin)
>     util.addTable("""
>                     |CREATE TABLE t2 (
>                     |  k VARBINARY,
>                     |  ingestion_time TIMESTAMP(3) METADATA FROM 'ts',
>                     |  a VARCHAR NOT NULL,
>                     |  f BOOLEAN NOT NULL,
>                     |  WATERMARK FOR `ingestion_time` AS `ingestion_time`,
>                     |  PRIMARY KEY (`a`) NOT ENFORCED
>                     |) WITH (
>                     | 'connector' = 'values',
>                     | 'readable-metadata' = 'ts:TIMESTAMP(3)',
>                     | 'changelog-mode' = 'I,UA,D'
>                     |)
>       """.stripMargin)
>     val sql =
>       """
>         |SELECT t1.a, t1.b, t2.f
>         |FROM t1 INNER JOIN t2 FOR SYSTEM_TIME AS OF t1.ingestion_time
>         | ON t1.a = t2.a WHERE t2.f = true
>         |""".stripMargin
>     util.verifyRelPlan(sql, ExplainDetail.CHANGELOG_MODE)
>   }
> {code}
> the generated plan is incorrect for now:
> optimize result: 
> Calc(select=[a, b, f])
> +- TemporalJoin(joinType=[InnerJoin], where=[AND(=(a, a0), __TEMPORAL_JOIN_CONDITION(ingestion_time, ingestion_time0, __TEMPORAL_JOIN_CONDITION_PRIMARY_KEY(a0), __TEMPORAL_JOIN_LEFT_KEY(a), __TEMPORAL_JOIN_RIGHT_KEY(a0)))], select=[ingestion_time, a, b, ingestion_time0, a0, f])
>    :- Exchange(distribution=[hash[a]])
>    :  +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
>    :     +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, b])
>    :        +- TableSourceScan(table=[[default_catalog, default_database, t1]], fields=[a, b, ingestion_time])
>    +- Exchange(distribution=[hash[a]])
>       +- Calc(select=[ingestion_time, a, f], where=[f])
>          +- {color:red}ChangelogNormalize(key=[ingestion_time]){color}
>             +- Exchange(distribution=[hash[a]])
>                +- WatermarkAssigner(rowtime=[ingestion_time], watermark=[ingestion_time])
>                   +- Calc(select=[CAST(ingestion_time AS TIMESTAMP(3) *ROWTIME*) AS ingestion_time, a, f])
>                      +- TableSourceScan(table=[[default_catalog, default_database, t2, project=[a, f], metadata=[ts]]], fields=[a, f, ingestion_time])



--
This message was sent by Atlassian Jira
(v8.20.10#820010)