Posted to commits@hudi.apache.org by "sherhomhuang (Jira)" <ji...@apache.org> on 2022/06/19 12:48:00 UTC

[jira] [Closed] (HUDI-4280) Support more parallelisms in flink when writing data to less bucket num but more than one partition path.

     [ https://issues.apache.org/jira/browse/HUDI-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sherhomhuang closed HUDI-4280.
------------------------------
    Resolution: Fixed

This was addressed in HUDI-4101.

> Support more parallelisms in flink when writing data to less bucket num but more than one partition path.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4280
>                 URL: https://issues.apache.org/jira/browse/HUDI-4280
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: flink
>            Reporter: sherhomhuang
>            Assignee: sherhomhuang
>            Priority: Major
>             Fix For: 0.12.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Support higher write parallelism in Flink when a table has few buckets but more than one partition path.
> *Existing shortcoming:*
>      Suppose a table is configured with only _*N*_ buckets but holds a large amount of historical data spread across _*M*_ partition paths (_*M >> N*_). When importing that historical data, write speed is limited, because the parallelism cannot be set greater than _*N*_ under the algorithm in the class _BucketIndexPartitioner_.
> *Improvement:*
>     Optimize the partitioner to support a parallelism of up to _*M * N*_ when importing into a table with _*N*_ buckets.
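
The idea above can be illustrated with a minimal sketch. It is not the Hudi implementation: the class and method names below are hypothetical, and the actual change made under HUDI-4101 may differ. The sketch assumes the baseline routes a record by its bucket id alone, and that the improved routing offsets the bucket id by a non-negative hash of the partition path, so that the same bucket in different partition paths can land on different subtasks.

```java
// Hypothetical sketch, not Hudi code: all names here are illustrative assumptions.
final class BucketPartitionSketch {

    private BucketPartitionSketch() {
    }

    // Assumed baseline: route by bucket id only. With N buckets, at most N
    // distinct subtasks ever receive data, whatever the parallelism is.
    static int bucketOnly(int bucketId, int parallelism) {
        return bucketId % parallelism;
    }

    // Assumed improvement: offset the bucket id by a non-negative hash of the
    // partition path. Records with the same (partition path, bucket) always go
    // to the same subtask, but different paths can use different subtasks.
    static int pathAndBucket(String partitionPath, int bucketId, int parallelism) {
        int pathOffset = (partitionPath.hashCode() & Integer.MAX_VALUE) % parallelism;
        return (pathOffset + bucketId) % parallelism;
    }

    // Counts how many distinct subtasks receive data for the given partition
    // paths and bucket count, under either routing scheme.
    static int distinctTasks(String[] partitionPaths, int bucketNum,
                             int parallelism, boolean usePartitionPath) {
        boolean[] used = new boolean[parallelism];
        int count = 0;
        for (String path : partitionPaths) {
            for (int bucket = 0; bucket < bucketNum; bucket++) {
                int task = usePartitionPath
                        ? pathAndBucket(path, bucket, parallelism)
                        : bucketOnly(bucket, parallelism);
                if (!used[task]) {
                    used[task] = true;
                    count++;
                }
            }
        }
        return count;
    }
}
```

Under these assumptions, the number of busy subtasks is bounded by the smaller of the configured parallelism and M * N, and how evenly the work spreads depends on how the partition path hashes fall. Writes to a given bucket of a given partition path still go through a single subtask, which is the property a bucket index writer relies on.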



--
This message was sent by Atlassian Jira
(v8.20.7#820007)