Posted to issues@flink.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/05/17 10:32:00 UTC

[jira] [Updated] (FLINK-27626) Introduce pre-aggregated merge to table store

     [ https://issues.apache.org/jira/browse/FLINK-27626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-27626:
-----------------------------------
    Labels: pull-request-available  (was: )

> Introduce pre-aggregated merge to table store
> ---------------------------------------------
>
>                 Key: FLINK-27626
>                 URL: https://issues.apache.org/jira/browse/FLINK-27626
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table Store
>            Reporter: Jingsong Lee
>            Priority: Major
>              Labels: pull-request-available
>
> We can introduce richer merge strategies. One already exists: PartialUpdateMergeFunction, which fills in non-NULL fields when merging. We can add more powerful strategies, such as support for pre-aggregated merges.
> Usage 1:
> CREATE TABLE T (
>     pk STRING PRIMARY KEY NOT ENFORCED,
>     sum_field1 BIGINT,
>     sum_field2 BIGINT
> ) WITH (
>      'merge-engine' = 'aggregation',
>      'sum_field1.aggregate-function' = 'sum',
>      'sum_field2.aggregate-function' = 'sum'
> );
> INSERT INTO T VALUES ('pk1', 1, 1);
> INSERT INTO T VALUES ('pk1', 1, 1);
> SELECT * FROM T;
> => output 'pk1', 2, 2
> Usage 2:
> CREATE MATERIALIZED VIEW T
> with (
>     'merge-engine' = 'aggregation'
> ) AS SELECT
>     pk,
>     SUM(field1) AS sum_field1,
>     SUM(field2) AS sum_field2
> FROM source_t
> GROUP BY pk ;
> This will start a streaming job that consumes the source data and writes it incrementally to T. The synchronization job itself has no state, since the aggregation is performed by the merge engine of T.
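
To illustrate the merge semantics described in Usage 1, here is a minimal sketch in plain Java. It is not the Table Store implementation or its API: the class SumMergeSketch and its methods are hypothetical names, and an in-memory map stands in for the table. It only shows the idea that rows sharing a primary key are merged by summing each aggregated field.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Flink Table Store. */
public class SumMergeSketch {
    // Merged row per primary key: pk -> {sum_field1, sum_field2}
    private final Map<String, long[]> rows = new LinkedHashMap<>();

    /**
     * Inserts a row. If the primary key already exists, the stored row
     * and the new row are merged by summing field by field.
     * Assumes every row for a key has the same number of fields.
     */
    public void insert(String pk, long... fields) {
        rows.merge(pk, fields.clone(), (oldRow, newRow) -> {
            long[] merged = new long[oldRow.length];
            for (int i = 0; i < merged.length; i++) {
                merged[i] = oldRow[i] + newRow[i];
            }
            return merged;
        });
    }

    /** Returns the merged row for the key, or null if the key is absent. */
    public long[] get(String pk) {
        return rows.get(pk);
    }

    public static void main(String[] args) {
        SumMergeSketch t = new SumMergeSketch();
        t.insert("pk1", 1, 1);
        t.insert("pk1", 1, 1);
        // Mirrors the Usage 1 result: 'pk1', 2, 2
        System.out.println("pk1 " + Arrays.toString(t.get("pk1")));
    }
}
```

Because the merge happens where the rows are stored, a writer in this sketch only appends rows and keeps no running totals of its own, which matches the point that the synchronization job has no state.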



--
This message was sent by Atlassian Jira
(v8.20.7#820007)