Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/03/26 02:06:00 UTC
[jira] [Created] (HUDI-3717) Avoid double-listing w/in BaseHoodieTableFileIndex
Alexey Kudinkin created HUDI-3717:
-------------------------------------
Summary: Avoid double-listing w/in BaseHoodieTableFileIndex
Key: HUDI-3717
URL: https://issues.apache.org/jira/browse/HUDI-3717
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Attachments: Screen Shot 2022-03-25 at 7.05.09 PM.png, Screen Shot 2022-03-25 at 7.05.43 PM.png
Currently, `BaseHoodieTableFileIndex::loadPartitionPathFiles` essentially does file-listing twice:
* Once when `getAllQueryPartitionPaths` is invoked
* A second time when `getFilesInPartitions` is invoked
While this will not result in double-listing of the files on the FS (b/c of `FileStatusCache`, if any), it does lead to the Metadata Table (MT) being queried twice:
!Screen Shot 2022-03-25 at 7.05.09 PM.png!
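To make the pattern concrete, here is a minimal, self-contained sketch of the behavior described above. It is not Hudi code: `CountingMetadataSource`, `FileIndexSketch`, `loadTwice`, and `loadOnce` are hypothetical names, and the in-memory map only simulates the Metadata Table. It assumes that resolving partitions and fetching files each cost one MT query, and shows one possible way a single query could serve both steps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an MT-backed listing source; NOT Hudi's actual API.
class CountingMetadataSource {
    private final Map<String, List<String>> filesByPartition = new LinkedHashMap<>();
    int queryCount = 0; // number of simulated Metadata Table queries

    CountingMetadataSource(Map<String, List<String>> filesByPartition) {
        this.filesByPartition.putAll(filesByPartition);
    }

    // Simulates one MT query that resolves the partition paths.
    List<String> listPartitions() {
        queryCount++;
        return new ArrayList<>(filesByPartition.keySet());
    }

    // Simulates one MT query that fetches the files of the given partitions.
    Map<String, List<String>> listFiles(List<String> partitions) {
        queryCount++;
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String partition : partitions) {
            result.put(partition, filesByPartition.get(partition));
        }
        return result;
    }

    // Simulates one MT query that returns partitions together with their files.
    Map<String, List<String>> listPartitionsWithFiles() {
        queryCount++;
        return new LinkedHashMap<>(filesByPartition);
    }
}

class FileIndexSketch {
    // Mirrors the reported pattern: partitions first, then files (two MT queries).
    static Map<String, List<String>> loadTwice(CountingMetadataSource source) {
        List<String> partitions = source.listPartitions();
        return source.listFiles(partitions);
    }

    // Assumed alternative: derive partitions and files from a single MT query.
    static Map<String, List<String>> loadOnce(CountingMetadataSource source) {
        return source.listPartitionsWithFiles();
    }
}
```

Both methods return the same partition-to-files mapping; the only difference in this sketch is `queryCount`, which ends at 2 for `loadTwice` and 1 for `loadOnce`.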
--
This message was sent by Atlassian Jira
(v8.20.1#820001)