Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2022/03/07 06:37:00 UTC

[jira] [Resolved] (HUDI-3069) compact improve

     [ https://issues.apache.org/jira/browse/HUDI-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen resolved HUDI-3069.
------------------------------

> compact improve
> ---------------
>
>                 Key: HUDI-3069
>                 URL: https://issues.apache.org/jira/browse/HUDI-3069
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: scx
>            Priority: Major
>              Labels: performance, pull-request-available
>             Fix For: 0.11.0
>
>
> I found that when the compaction plan is generated, the delta log files under each file group are arranged in ascending (natural) order of instant time. In most cases the latest data is in the latest delta log file, so we sort the files in descending order of instant time instead. This largely avoids rewriting data during compaction and shortens compaction time.
> In addition, when reading the delta log files, we compare each record already in the ExternalSpillableMap with the incoming delta log record. If the old record is selected, there is no need to write it back into the ExternalSpillableMap. Such rewrites waste a lot of resources once data has been spilled to disk.
>  

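A minimal Python sketch of the first idea, under stated assumptions: `DeltaLogFile` and `order_for_compaction` are hypothetical names, not Hudi APIs, and instant times are assumed to be fixed-width timestamp strings (such as "20220301000000"), which sort correctly as plain strings. It only shows reordering one file group's delta log files newest-first before they are read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeltaLogFile:
    """Hypothetical stand-in for one delta log file of a file group."""
    name: str
    instant_time: str  # assumed fixed-width timestamp, e.g. "20220301000000"


def order_for_compaction(log_files: List[DeltaLogFile]) -> List[DeltaLogFile]:
    """Return the log files newest-first (descending instant time).

    Fixed-width numeric timestamps compare correctly as strings, so a
    plain sort on instant_time is enough for this sketch.
    """
    return sorted(log_files, key=lambda f: f.instant_time, reverse=True)


files = [
    DeltaLogFile("log.1", "20220301000000"),
    DeltaLogFile("log.3", "20220305000000"),
    DeltaLogFile("log.2", "20220303000000"),
]
print([f.name for f in order_for_compaction(files)])  # ['log.3', 'log.2', 'log.1']
```

Reversing the read order only leaves the merged result unchanged when the merge picks a winner by an ordering field rather than by arrival order; this sketch assumes that condition holds.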

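A second sketch, equally hypothetical, of the merge step described in the second paragraph. A plain dict subclass (`CountingMap`) stands in for Hudi's ExternalSpillableMap and counts writes. `pre_combine` is an assumed rule that keeps the record with the larger ordering value, with ties keeping the record already in the map. None of these names are Hudi APIs. The sketch shows two things: nothing is written back when the existing record wins, and reading newest-first makes the existing record win more often.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record: (ordering_value, payload).
Record = Tuple[int, str]


class CountingMap(dict):
    """Dict standing in for a spillable map; counts every write (put)."""

    def __init__(self) -> None:
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value) -> None:
        self.writes += 1
        super().__setitem__(key, value)


def pre_combine(new: Record, old: Record) -> Record:
    """Assumed rule: higher ordering value wins; ties keep the old record."""
    return new if new[0] > old[0] else old


def merge(log_blocks: Iterable[Dict[str, Record]]) -> CountingMap:
    merged = CountingMap()
    for block in log_blocks:
        for key, new_record in block.items():
            old_record = merged.get(key)
            if old_record is None:
                merged[key] = new_record
                continue
            combined = pre_combine(new_record, old_record)
            # Write back only when the incoming record wins; if the old
            # record is kept, the map already holds it, so skip the put.
            if combined is not old_record:
                merged[key] = combined
    return merged


oldest_first = [
    {"k1": (1, "v1"), "k2": (1, "v1")},
    {"k1": (2, "v2"), "k2": (2, "v2")},
    {"k1": (3, "v3")},
]
newest_first = list(reversed(oldest_first))

a = merge(oldest_first)
b = merge(newest_first)
print(dict(a) == dict(b))  # True
print(a.writes, b.writes)  # 5 2
```

In this toy example both orders produce the same merged records, but oldest-first performs 5 map writes and newest-first only 2. The description notes that such writes become expensive once the map has spilled to disk.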

--
This message was sent by Atlassian Jira
(v8.20.1#820001)