Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2021/10/12 06:39:00 UTC

[jira] [Commented] (FLINK-24050) Support primary keys on metadata columns

    [ https://issues.apache.org/jira/browse/FLINK-24050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427475#comment-17427475 ] 

Jark Wu commented on FLINK-24050:
---------------------------------

I have another use case that needs this feature:

For example, there are sharded MySQL tables {{user_01}}, {{user_02}}, ..., {{user_99}}, and they all use the snowflake algorithm to generate a globally unique ID as the PK of each table. The user would like to read the sharded tables as one table and load it into an OLAP system with a primary key of table name and ID. The Flink SQL could look like the following:

{code}
CREATE TABLE mysql_users (
  table_name STRING NOT NULL METADATA,
  id BIGINT NOT NULL,
  user_name STRING,
  address STRING,
  PRIMARY KEY (table_name, id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'database-name' = 'mydb',
  'table-name' = 'user_.*',
  'username' = 'xxx',
  'password' = 'yyy' 
); 
{code}
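
To sketch how the merged table would then be loaded downstream, here is one possible shape for the sink side. It assumes a hypothetical JDBC-backed sink table {{olap_users}}. The sink name, URL, target table name and credentials are placeholders, not part of the original use case, and the source DDL for {{mysql_users}} is only accepted once primary keys on metadata columns are supported.

{code:sql}
-- Hypothetical sink: the table name, URL and credentials are placeholders.
CREATE TABLE olap_users (
  table_name STRING NOT NULL,
  id BIGINT NOT NULL,
  user_name STRING,
  address STRING,
  -- Same composite key as the source, so changes are written as upserts.
  PRIMARY KEY (table_name, id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/olap',
  'table-name' = 'users',
  'username' = 'xxx',
  'password' = 'yyy'
);

-- Merge all shards into the single keyed sink table.
INSERT INTO olap_users
SELECT table_name, id, user_name, address
FROM mysql_users;
{code}

Declaring the same key on both sides lets the planner carry the upsert key through the pipeline instead of requiring it to be re-derived in the query.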

> Support primary keys on metadata columns
> ----------------------------------------
>
>                 Key: FLINK-24050
>                 URL: https://issues.apache.org/jira/browse/FLINK-24050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Ingo Bürk
>            Priority: Major
>
> Currently, primary keys are required to consist solely of physical columns. However, there might be scenarios where the actual payload/records do not contain a suitable primary key, but a unique identifier is available through metadata. In this case it would make sense to define the primary key on such a metadata column:
> {code:java}
> CREATE TABLE T (
>   uid STRING METADATA,
>   content STRING,
>   PRIMARY KEY (uid) NOT ENFORCED
> ) WITH (…)
> {code}
> A simple example for this would be IMAP: there is nothing unique about any single email as a record, but each email in a specific folder on an IMAP server has a unique UID (I'm excluding some irrelevant technical details here).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)