Posted to issues@flink.apache.org by "Hang Ruan (Jira)" <ji...@apache.org> on 2022/05/13 08:21:00 UTC

[jira] [Commented] (FLINK-24050) Support primary keys on metadata columns

    [ https://issues.apache.org/jira/browse/FLINK-24050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536499#comment-17536499 ] 

Hang Ruan commented on FLINK-24050:
-----------------------------------

I am interested in this issue and would like to help improve this part, but there are still some details to discuss.

I think there should be no limitation for source tables. But when the table is used as a sink table, we need to discuss how to handle a metadata primary key, which may or may not be virtual.
 * Virtual metadata: Virtual metadata cannot be persisted in the target storage. As a result, the value read back from the table may differ from the value we had when writing the row.
 * Writable metadata (not virtual): There should be no limitation for writable metadata.

IMO, the strategy for metadata primary keys should be like this:
 * There should be no limitation on using metadata columns as primary keys for a source table;
 * For a sink table:
 ** Using virtual metadata as primary keys is meaningless. If virtual metadata is used in this way, we need to either ignore the primary key and warn the user, or reject it:
 *** If the primary key only contains virtual metadata, just ignore this primary key.
 *** If the primary key contains virtual metadata and other columns, throw a validation exception.
 ** Using writable metadata as primary keys is allowed. The behavior when writing this metadata to the target storage depends on the connector type.
 *** Take upsert-kafka tables as an example. An upsert-kafka table writes its primary key to the key of the Kafka record. If the upsert-kafka connector supports using metadata as primary keys, whether the metadata is written to the key depends on the upsert-kafka connector's implementation.
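
To make the sink-side rules above concrete, here is a minimal sketch in Java. All names (MetadataPrimaryKeyCheck, ColumnKind, Decision, checkSinkPrimaryKey) are hypothetical and not part of Flink's API. It only models the decision itself; a real implementation would sit in the planner's validation and would log the warning or throw a validation exception instead of returning an enum.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed sink-table rule; none of these names exist in Flink.
class MetadataPrimaryKeyCheck {

    // How a column is declared in the table schema (assumed classification).
    enum ColumnKind { PHYSICAL, WRITABLE_METADATA, VIRTUAL_METADATA }

    // Outcome of validating a primary key on a sink table.
    enum Decision { KEEP, IGNORE_WITH_WARNING, REJECT }

    // Applies the rule: no virtual metadata -> keep the key,
    // only virtual metadata -> ignore the key and warn,
    // virtual metadata mixed with other columns -> reject.
    static Decision checkSinkPrimaryKey(List<String> primaryKey, Map<String, ColumnKind> columns) {
        long virtualCount = primaryKey.stream()
                .filter(name -> columns.get(name) == ColumnKind.VIRTUAL_METADATA)
                .count();
        if (virtualCount == 0) {
            return Decision.KEEP;
        }
        if (virtualCount == primaryKey.size()) {
            return Decision.IGNORE_WITH_WARNING;
        }
        return Decision.REJECT;
    }

    public static void main(String[] args) {
        Map<String, ColumnKind> columns = Map.of(
                "uid", ColumnKind.VIRTUAL_METADATA,
                "ts", ColumnKind.WRITABLE_METADATA,
                "content", ColumnKind.PHYSICAL);
        System.out.println(checkSinkPrimaryKey(List.of("ts"), columns));             // KEEP
        System.out.println(checkSinkPrimaryKey(List.of("uid"), columns));            // IGNORE_WITH_WARNING
        System.out.println(checkSinkPrimaryKey(List.of("uid", "content"), columns)); // REJECT
    }
}
```

Source tables would skip this check entirely, since the proposal places no limitation on them. What a connector does with a kept key that contains writable metadata is left to the connector.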

> Support primary keys on metadata columns
> ----------------------------------------
>
>                 Key: FLINK-24050
>                 URL: https://issues.apache.org/jira/browse/FLINK-24050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Ingo Bürk
>            Priority: Major
>
> Currently, primary keys are required to consist solely of physical columns. However, there might be scenarios where the actual payload/records do not contain a suitable primary key, but a unique identifier is available through metadata. In this case it would make sense to define the primary key on such a metadata column:
> {code:java}
> CREATE TABLE T (
>   uid STRING METADATA,
>   content STRING,
>   PRIMARY KEY (uid) NOT ENFORCED
> ) WITH (…)
> {code}
> A simple example for this would be IMAP: there is nothing unique about any single email as a record, but each email in a specific folder on an IMAP server has a unique UID (I'm excluding some irrelevant technical details here).
> See FLINK-24512 for another (probably better) use case.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)