Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/03/27 15:04:00 UTC

[jira] [Assigned] (HUDI-5611) Revisit metadata-table-based file listing calls and use batch lookup instead

     [ https://issues.apache.org/jira/browse/HUDI-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo reassigned HUDI-5611:
-------------------------------

    Assignee: Raymond Xu

> Revisit metadata-table-based file listing calls and use batch lookup instead
> ----------------------------------------------------------------------------
>
>                 Key: HUDI-5611
>                 URL: https://issues.apache.org/jira/browse/HUDI-5611
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: Ethan Guo
>            Assignee: Raymond Xu
>            Priority: Critical
>             Fix For: 0.13.1
>
>
> We discovered a performance issue with savepoint when the metadata table is enabled. When the number of partitions is large, the savepoint operation scans the metadata table once per partition, which is unnecessary and leads to a large number of S3 requests. The solution is to batch the list calls for all partitions (HUDI-5485).
>
> We need to revisit the other metadata-table-based file listing calls in a similar fashion and replace them with batch lookups where needed.
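
The difference between per-partition listing and a batch lookup can be illustrated with a minimal sketch. This is not Hudi's API: the class `FakeMetadataTable`, its methods `list_files` and `list_files_batch`, and the two `savepoint_*` helpers are hypothetical names invented for illustration, and the `scan_count` counter is only an assumed stand-in for the cost of a metadata table scan (and the S3 requests behind it).

```python
class FakeMetadataTable:
    """Hypothetical stand-in for a metadata table; not Hudi's actual API."""

    def __init__(self, files_by_partition):
        self._files = dict(files_by_partition)
        # Assumed cost model: every lookup call triggers one table scan.
        self.scan_count = 0

    def list_files(self, partition):
        # Single-partition lookup: one scan per call.
        self.scan_count += 1
        return list(self._files.get(partition, []))

    def list_files_batch(self, partitions):
        # Batch lookup: one scan serves every requested partition.
        self.scan_count += 1
        return {p: list(self._files.get(p, [])) for p in partitions}


def savepoint_per_partition(table, partitions):
    # Pattern described in the issue: scans grow linearly with partitions.
    return {p: table.list_files(p) for p in partitions}


def savepoint_batched(table, partitions):
    # Batched pattern: a single scan regardless of the partition count.
    return table.list_files_batch(partitions)


if __name__ == "__main__":
    layout = {"2023/03/01": ["f1.parquet"], "2023/03/02": ["f2.parquet", "f3.parquet"]}
    partitions = list(layout)

    slow = FakeMetadataTable(layout)
    savepoint_per_partition(slow, partitions)

    fast = FakeMetadataTable(layout)
    savepoint_batched(fast, partitions)

    print("per-partition scans:", slow.scan_count)
    print("batched scans:", fast.scan_count)
```

Under this assumed cost model, both helpers return the same file listing, but the per-partition variant performs one scan for each partition while the batched variant performs one scan in total.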



--
This message was sent by Atlassian Jira
(v8.20.10#820010)