Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2022/06/30 06:44:00 UTC

[jira] [Closed] (FLINK-28244) Introduce changelog file for DataFile

     [ https://issues.apache.org/jira/browse/FLINK-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee closed FLINK-28244.
--------------------------------
    Resolution: Fixed

master: 83a321fa116f9dc520d7ea936ce6103e1cb45869

> Introduce changelog file for DataFile
> -------------------------------------
>
>                 Key: FLINK-28244
>                 URL: https://issues.apache.org/jira/browse/FLINK-28244
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table Store
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: table-store-0.2.0
>
>
> When using TableStore to support stream consumption, there are two requirements:
>  * Downstream consumers receive every changelog record.
>  * Records are consumed in the same order in which they were written.
> For an append-only table, both are easy to meet.
> For a primary key table, however, the files are all sorted and de-duplicated by primary key, so the requirements above cannot be met by reading the data files.
> We can write an additional ChangelogFile whenever a DataFile is flushed, and let the stream reader read it directly.
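
To make the problem concrete, here is a minimal sketch of why a sorted, de-duplicated file cannot serve a streaming reader while a separate changelog file can. The class `ChangelogVsDataFileSketch`, its `Change` type, and both methods are hypothetical names chosen for illustration and are not part of Table Store; the "keep the latest change per key" merge is likewise an assumed simplification of what a primary key table does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical illustration only; not the Table Store implementation. */
final class ChangelogVsDataFileSketch {

    /** One input record: a change kind (e.g. "+I", "+U"), a primary key and a value. */
    static final class Change {
        final String kind;
        final int pk;
        final String value;

        Change(String kind, int pk, String value) {
            this.kind = kind;
            this.pk = pk;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + "(" + pk + "," + value + ")";
        }
    }

    /**
     * Assumed model of a primary key data file: only the latest change per key
     * survives, and the result is ordered by key rather than by arrival.
     */
    static List<Change> toDataFile(List<Change> input) {
        TreeMap<Integer, Change> latestPerKey = new TreeMap<>();
        for (Change change : input) {
            latestPerKey.put(change.pk, change); // a later change overwrites an earlier one
        }
        return new ArrayList<>(latestPerKey.values());
    }

    /** Assumed model of a changelog file: every change, in input order. */
    static List<Change> toChangelogFile(List<Change> input) {
        return new ArrayList<>(input);
    }
}
```

With the input `+I(2,a)`, `+I(1,b)`, `+U(2,c)`, the data-file view keeps only two records and starts with key 1, whereas the changelog view keeps all three in arrival order, which is what a downstream stream consumer needs.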
> We can modify DataFileMeta:
> {code:java}
> class DataFileMeta {
>     String fileName;
>     .....
>     // Names of the extra files, such as changelog_file, primary_key_index_file, secondary_index_file, etc.
>     List<String> extraFiles;
> }
> {code}
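
As a rough sketch of how a reader might use the proposed `extraFiles` field, the class below carries the two fields named in the description and looks up the changelog file among the extra files. Everything beyond `fileName` and `extraFiles` is an assumption made for illustration: the class name `DataFileMetaSketch`, the `changelogFile()` helper, and in particular the `changelog-` file-name prefix, which the issue does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; the real DataFileMeta has further fields elided in the issue. */
final class DataFileMetaSketch {

    // Assumed naming convention, used only so the example can tell files apart.
    static final String CHANGELOG_PREFIX = "changelog-";

    private final String fileName;
    private final List<String> extraFiles;

    DataFileMetaSketch(String fileName, List<String> extraFiles) {
        this.fileName = fileName;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.extraFiles = Collections.unmodifiableList(new ArrayList<>(extraFiles));
    }

    String fileName() {
        return fileName;
    }

    List<String> extraFiles() {
        return extraFiles;
    }

    /** Returns the first extra file that follows the assumed changelog naming convention. */
    Optional<String> changelogFile() {
        return extraFiles.stream()
                .filter(name -> name.startsWith(CHANGELOG_PREFIX))
                .findFirst();
    }
}
```

Keeping the extra files as a plain list of names leaves room for index files later, at the cost of needing some convention (here, a prefix) to tell the file kinds apart.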



--
This message was sent by Atlassian Jira
(v8.20.10#820010)