Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/02/16 07:13:00 UTC

[jira] [Updated] (HUDI-4917) Optimized the way to get HoodieBaseFile of loadColumnRangesFromFiles of Bloom Index

     [ https://issues.apache.org/jira/browse/HUDI-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-4917:
----------------------------
    Fix Version/s: 0.13.0
                       (was: 0.13.1)

> Optimized the way to get HoodieBaseFile of loadColumnRangesFromFiles of Bloom Index
> -----------------------------------------------------------------------------------
>
>                 Key: HUDI-4917
>                 URL: https://issues.apache.org/jira/browse/HUDI-4917
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: index
>            Reporter: Chuang Lee
>            Assignee: Chuang Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> When the Bloom Index runs loadColumnRangesFromFiles during the tagLocation process, each hoodieBaseFile is currently obtained by sending a request to the Driver. When the data volume is large and the parallelism is high, these requests become a network bottleneck and make tagLocation very slow.
> However, the hoodieBaseFile can be obtained directly through HoodieIndexUtils.getLatestBaseFilesForAllPartitions() in loadColumnRangesFromFiles(), so the extra Driver requests are unnecessary and removing them can improve the tagLocation performance of the Bloom Index.
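
The difference between the two approaches can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hudi code: BaseFile, DriverFileView, lookup, list_latest, and both ranges_* functions are hypothetical names, and the "requests" counter merely simulates round trips to the Driver. It assumes the listing step already returns full file handles, so passing those handles on to the tasks makes the per-file lookups redundant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseFile:
    """Hypothetical stand-in for a base file handle (not Hudi's HoodieBaseFile)."""
    partition: str
    file_id: str
    min_key: str
    max_key: str


class DriverFileView:
    """Hypothetical driver-side file listing that counts simulated requests."""

    def __init__(self, files):
        self._files = {(f.partition, f.file_id): f for f in files}
        self.requests = 0

    def lookup(self, partition, file_id):
        # One simulated network round trip per file.
        self.requests += 1
        return self._files[(partition, file_id)]

    def list_latest(self, partitions):
        # One simulated round trip for the whole listing.
        self.requests += 1
        wanted = set(partitions)
        return [f for f in self._files.values() if f.partition in wanted]


def ranges_via_lookup(view, partitions):
    """Before: tasks get only (partition, file_id) and re-fetch every file."""
    ids = [(f.partition, f.file_id) for f in view.list_latest(partitions)]
    ranges = []
    for partition, file_id in ids:
        f = view.lookup(partition, file_id)  # extra per-file request
        ranges.append((f.file_id, f.min_key, f.max_key))
    return ranges


def ranges_via_handles(view, partitions):
    """After: tasks get the already-resolved file handles directly."""
    return [(f.file_id, f.min_key, f.max_key) for f in view.list_latest(partitions)]


# Made-up sample data: four files in one partition.
files = [BaseFile("2023/02/15", f"f{i}", f"k{i}0", f"k{i}9") for i in range(4)]
before_view, after_view = DriverFileView(files), DriverFileView(files)
before = ranges_via_lookup(before_view, ["2023/02/15"])
after = ranges_via_handles(after_view, ["2023/02/15"])
```

Both functions return the same key ranges. In this model the first issues one request per file on top of the listing, while the second issues only the listing request, so the gap grows with the number of files.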



--
This message was sent by Atlassian Jira
(v8.20.10#820010)