Posted to dev@flink.apache.org by "ZhuoYu Chen (Jira)" <ji...@apache.org> on 2021/12/21 04:09:00 UTC

[jira] [Created] (FLINK-25397) grouped_execution

ZhuoYu Chen created FLINK-25397:
-----------------------------------

             Summary: grouped_execution
                 Key: FLINK-25397
                 URL: https://issues.apache.org/jira/browse/FLINK-25397
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Legacy Planner, Table SQL / Planner, Table SQL / Runtime
    Affects Versions: 1.15.0
            Reporter: ZhuoYu Chen


Proposal: support grouped (bucket-wise) execution. Suppose two tables, orders and orders_item, are bucketed on the same field (orderid) into the same number of buckets. Rows with the same order id then land in buckets with the same bucket id in both tables, so a join on order id, and any aggregation on it, can be computed independently for each bucket.
This has two advantages:
1. As soon as a bucket has been processed, the memory it occupies can be released, so memory consumption can be bounded by limiting the number of buckets processed in parallel.
2. It avoids much of the shuffling, because the data is already co-partitioned on the join key.
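
To illustrate the idea, here is a minimal Python sketch of a bucket-by-bucket join and aggregation. It is not Flink code and does not reflect any existing Flink API. The bucket count, the modulo bucketing function, the row layouts and all function names are assumptions made for this example only. The buckets are processed one at a time here; a real engine could process a bounded number of them in parallel.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count; must be identical for both tables


def bucket_of(order_id, num_buckets=NUM_BUCKETS):
    # Assumption: integer order ids, assigned to buckets by modulo.
    return order_id % num_buckets


def partition(rows, num_buckets=NUM_BUCKETS):
    # Split rows into buckets by the order id stored in column 0.
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    return buckets


def join_and_sum_bucket(order_rows, item_rows):
    # Inner-join one bucket of orders with the matching bucket of
    # orders_item and sum the item amounts per order id.
    order_ids = {order_id for order_id, _customer in order_rows}
    totals = defaultdict(float)
    for order_id, amount in item_rows:
        if order_id in order_ids:
            totals[order_id] += amount
    return dict(totals)


def grouped_execution(orders, orders_item, num_buckets=NUM_BUCKETS):
    order_buckets = partition(orders, num_buckets)
    item_buckets = partition(orders_item, num_buckets)
    result = {}
    for b in range(num_buckets):
        # Bucket b of one table only ever matches bucket b of the other,
        # so no data has to move between buckets.
        result.update(join_and_sum_bucket(order_buckets[b], item_buckets[b]))
        # Drop the references so this bucket's memory can be reclaimed
        # before the next bucket is processed.
        order_buckets[b] = None
        item_buckets[b] = None
    return result


# Hypothetical sample data: (orderid, customer) and (orderid, amount).
orders = [(1, "alice"), (2, "bob"), (5, "carol")]
orders_item = [(1, 10.0), (1, 5.0), (2, 7.5), (5, 1.0), (6, 99.0)]
totals = grouped_execution(orders, orders_item)
```

In this sketch the result does not depend on the number of buckets. The bucket count only determines how much data is held for one join step at a time.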



--
This message was sent by Atlassian Jira
(v8.20.1#820001)