Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/09/14 16:09:00 UTC

[jira] [Assigned] (HUDI-4812) Lazy partition listing and file groups fetching in Spark Query

     [ https://issues.apache.org/jira/browse/HUDI-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin reassigned HUDI-4812:
-------------------------------------

    Assignee: Yuwei Xiao

> Lazy partition listing and file groups fetching in Spark Query
> --------------------------------------------------------------
>
>                 Key: HUDI-4812
>                 URL: https://issues.apache.org/jira/browse/HUDI-4812
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Yuwei Xiao
>            Assignee: Yuwei Xiao
>            Priority: Blocker
>
> In the current Spark query implementation, the FileIndex refreshes and loads all file groups into its cache in order to serve subsequent queries.
>
> For a large table with many partitions, this can add significant overhead during initialization. Meanwhile, the query itself may come with a partition filter, in which case loading the file groups of every partition is unnecessary.
>
> To optimize this, the whole refresh logic will become lazy, so that the actual work is carried out only after the partition filter has been applied.
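
The idea described above can be illustrated with a minimal sketch. This is not Hudi's actual FileIndex code; the class `LazyFileIndex`, the `list_file_groups` callback, and the partition names are all hypothetical, and the sketch assumes the partition names are already known cheaply. It only shows the general pattern: nothing is listed at construction time, and file groups are fetched and cached only for partitions that survive the filter.

```python
from typing import Callable, Dict, List, Optional


class LazyFileIndex:
    """Hypothetical stand-in for a file index; not Hudi's actual API."""

    def __init__(self, partitions: List[str],
                 list_file_groups: Callable[[str], List[str]]) -> None:
        # Assumption: partition names are known up front; no file listing happens here.
        self._partitions = partitions
        self._list_file_groups = list_file_groups
        self._cache: Dict[str, List[str]] = {}
        self.listing_calls = 0  # counts how many partitions were actually listed

    def file_groups(
        self, partition_filter: Optional[Callable[[str], bool]] = None
    ) -> Dict[str, List[str]]:
        # Apply the partition filter first, then list only the surviving partitions.
        selected = [p for p in self._partitions
                    if partition_filter is None or partition_filter(p)]
        for partition in selected:
            if partition not in self._cache:
                self.listing_calls += 1
                self._cache[partition] = self._list_file_groups(partition)
        return {p: self._cache[p] for p in selected}


def fake_list(partition: str) -> List[str]:
    # Hypothetical lister standing in for an expensive storage listing.
    return [f"{partition}/fg-{i}" for i in range(2)]


index = LazyFileIndex(["dt=2022-09-12", "dt=2022-09-13", "dt=2022-09-14"], fake_list)
result = index.file_groups(lambda p: p == "dt=2022-09-14")
```

In this sketch, a query filtered to one partition triggers a single listing call instead of three, and a repeated query with the same filter is served from the cache.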



--
This message was sent by Atlassian Jira
(v8.20.10#820010)