Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2022/06/27 03:34:00 UTC

[jira] [Updated] (FLINK-28244) Introduce changelog file for DataFile

     [ https://issues.apache.org/jira/browse/FLINK-28244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee updated FLINK-28244:
---------------------------------
    Description: 
When using TableStore to support stream consumption, there are two requirements:
 * Downstream gets all changelogs
 * The order of stream consumption is the order of input

For an append-only table, it is easy to meet both.

But for a primary key table, the files are all sorted and de-duplicated by pk, which makes it impossible to meet the above expectations.

We can output an additional ChangelogFile when the DataFile is flushed, and the streaming read consumes it directly.
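
The idea can be illustrated with a minimal sketch. It is not the Table Store writer: ChangelogFlushSketch, Change and FlushResult are hypothetical names, and keeping only the latest record per key stands in for the real merge logic. It only shows why the data file alone loses changes and input order, while a changelog written at the same flush keeps both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names are Table Store classes. */
final class ChangelogFlushSketch {

    /** One change record: a kind such as "+I" or "+U", a primary key and a value. */
    static final class Change {
        final String kind;
        final int pk;
        final String value;

        Change(String kind, int pk, String value) {
            this.kind = kind;
            this.pk = pk;
            this.value = value;
        }
    }

    /** Contents produced by one flush. */
    static final class FlushResult {
        final List<Change> dataFile;      // sorted and de-duplicated by pk
        final List<Change> changelogFile; // every change, in input order

        FlushResult(List<Change> dataFile, List<Change> changelogFile) {
            this.dataFile = dataFile;
            this.changelogFile = changelogFile;
        }
    }

    private final List<Change> buffer = new ArrayList<>();

    void write(Change change) {
        buffer.add(change);
    }

    FlushResult flush() {
        // Data file: a later change to the same pk overwrites the earlier one,
        // and the TreeMap yields the survivors in pk order (simplified merge).
        TreeMap<Integer, Change> merged = new TreeMap<>();
        for (Change change : buffer) {
            merged.put(change.pk, change);
        }
        List<Change> dataFile = new ArrayList<>(merged.values());

        // Changelog file: a plain copy of the buffer, so nothing is dropped
        // and the input order is preserved.
        List<Change> changelogFile = new ArrayList<>(buffer);

        buffer.clear();
        return new FlushResult(dataFile, changelogFile);
    }
}
```

Writing +I(2,b), +I(1,a), +U(2,c) and then flushing gives a data file with two records ordered by pk, while the changelog file still holds all three changes in the order they arrived.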

We can modify DataFileMeta:
{code:java}
class DataFileMeta {
    String fileName;
    .....
    // Names of the extra files that belong to this data file, such as
    // changelog_file, primary_key_index_file, secondary_index_file, etc.
    List<String> extraFiles;
}
{code}
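
As a rough illustration of how such metadata might be used, the sketch below keeps the two fields from the snippet above and adds one helper. DataFileMetaSketch and allFiles() are hypothetical and do not come from the issue; the other fields elided above are left out. The file names in the usage note are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the proposed metadata; not the actual Table Store class. */
final class DataFileMetaSketch {
    private final String fileName;
    // Names of extra files written alongside the data file, e.g. a changelog file.
    private final List<String> extraFiles;

    DataFileMetaSketch(String fileName, List<String> extraFiles) {
        this.fileName = fileName;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.extraFiles = Collections.unmodifiableList(new ArrayList<>(extraFiles));
    }

    String fileName() {
        return fileName;
    }

    List<String> extraFiles() {
        return extraFiles;
    }

    /** Every file tied to this entry: the data file first, then its extra files. */
    List<String> allFiles() {
        List<String> all = new ArrayList<>();
        all.add(fileName);
        all.addAll(extraFiles);
        return all;
    }
}
```

For example, an entry created with the assumed names "data-0" and "changelog-0" would return both from allFiles(), which is the kind of listing a cleanup step would need so that extra files are not left behind when the data file is deleted.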

  was:
When using TableStore to support stream consumption, there are two requirements:
 * Downstream gets all changelogs
 * The order of stream consumption is the order of input

For an append-only table, it is easy to meet both.

But for a primary key table, the files are all sorted and de-duplicated by pk, which makes it impossible to meet the above expectations.

We can output an additional ChangelogFile when the DataFile is flushed, and the streaming read consumes it directly.

We can modify DataFileMeta:
{code:java}
class DataFileMeta {
    String fileName;
    .....
    // Suffixes of the extra files that belong to this data file, such as
    // changelog_file, primary_key_index_file, secondary_index_file, etc.
    List<String> extraFiles;
}
{code}


> Introduce changelog file for DataFile
> -------------------------------------
>
>                 Key: FLINK-28244
>                 URL: https://issues.apache.org/jira/browse/FLINK-28244
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table Store
>            Reporter: Jingsong Lee
>            Priority: Major
>             Fix For: table-store-0.2.0



--
This message was sent by Atlassian Jira
(v8.20.7#820007)