Posted to commits@hudi.apache.org by "Alexey Kudinkin (Jira)" <ji...@apache.org> on 2022/03/26 02:06:00 UTC

[jira] [Created] (HUDI-3717) Avoid double-listing w/in BaseHoodieTableFileIndex

Alexey Kudinkin created HUDI-3717:
-------------------------------------

             Summary: Avoid double-listing w/in BaseHoodieTableFileIndex
                 Key: HUDI-3717
                 URL: https://issues.apache.org/jira/browse/HUDI-3717
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin
         Attachments: Screen Shot 2022-03-25 at 7.05.09 PM.png, Screen Shot 2022-03-25 at 7.05.43 PM.png

Currently, `BaseHoodieTableFileIndex::loadPartitionPathFiles` essentially does file-listing twice:
 * Once when `getAllQueryPartitionPaths` is invoked
 * A second time when `getFilesInPartitions` is invoked
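The access pattern above can be modelled with a toy sketch that simply counts metadata-table round-trips. Everything here is hypothetical: `FakeMetadataTable`, `get_all_partition_paths`, `get_files_in_partitions`, `get_all_files`, `load_two_pass` and `load_one_pass` are illustrative stand-ins, not Hudi's API. It only shows that resolving partitions and then files as two separate lookups costs two queries, whereas deriving both from one lookup costs one.

```python
from typing import Dict, List


class FakeMetadataTable:
    """Hypothetical stand-in for the metadata table (MT); counts every query."""

    def __init__(self, files_by_partition: Dict[str, List[str]]):
        self._files = files_by_partition
        self.queries = 0

    def get_all_partition_paths(self) -> List[str]:
        self.queries += 1  # one MT round-trip
        return sorted(self._files)

    def get_files_in_partitions(self, partitions: List[str]) -> Dict[str, List[str]]:
        self.queries += 1  # another MT round-trip
        return {p: list(self._files[p]) for p in partitions}

    def get_all_files(self) -> Dict[str, List[str]]:
        self.queries += 1  # hypothetical combined lookup: a single round-trip
        return {p: list(fs) for p, fs in self._files.items()}


def load_two_pass(mt: FakeMetadataTable) -> Dict[str, List[str]]:
    # Mirrors the pattern described above: partition paths first, then files.
    partitions = mt.get_all_partition_paths()
    return mt.get_files_in_partitions(partitions)


def load_one_pass(mt: FakeMetadataTable) -> Dict[str, List[str]]:
    # One lookup; the partition paths are just the keys of the result.
    return mt.get_all_files()


if __name__ == "__main__":
    layout = {"2022/03/25": ["a.parquet"], "2022/03/26": ["b.parquet", "c.parquet"]}
    two, one = FakeMetadataTable(layout), FakeMetadataTable(layout)
    assert load_two_pass(two) == load_one_pass(one) == layout
    print(two.queries, one.queries)  # 2 1
```

Both variants return the same mapping; only the number of MT queries differs. A trade-off of a single combined lookup is that it fetches files for every partition, so any partition pruning would have to be applied to the result afterwards.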

While this will not result in the files being listed twice on the FS (because of `FileStatusCache`, if one is present), it does lead to the MT (metadata table) being queried twice:

!Screen Shot 2022-03-25 at 7.05.09 PM.png!
