Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/09 04:05:52 UTC

[GitHub] [hudi] garyli1019 commented on a change in pull request #1938: [HUDI-920] Support Incremental query for MOR table

garyli1019 commented on a change in pull request #1938:
URL: https://github.com/apache/hudi/pull/1938#discussion_r554287262



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
##########
@@ -470,4 +471,45 @@ private static HoodieBaseFile refreshFileStatus(Configuration conf, HoodieBaseFi
     }
   }
 
+  /**
+   * List affected file status based on given commits.
+   * @param basePath
+   * @param commitsToCheck
+   * @param timeline
+   * @return HashMap<partitionPath, HashMap<fileName, FileStatus>>
+   * @throws IOException
+   */
+  public static HashMap<String, HashMap<String, FileStatus>> listStatusForAffectedPartitions(
+      Path basePath, List<HoodieInstant> commitsToCheck, HoodieTimeline timeline) throws IOException {
+    // Extract files touched by these commits.
+    // TODO This might need to be done in parallel like listStatus parallelism ?

Review comment:
       Are you referring to RFC-15, which has not landed yet? The current implementation of `HoodieParquetInputFormat` lists all files in the affected partitions and then filters them afterwards.
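
       To make the list-then-filter flow concrete, here is a minimal sketch using plain Java collections instead of the real Hudi/Hadoop types. `FileEntry`, `listThenFilter`, and the idea of matching on a per-file commit time are hypothetical names and assumptions for illustration only; this is not the actual `HoodieParquetInputFormat` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ListThenFilterSketch {

  // Hypothetical stand-in for a listed file; not a Hudi or Hadoop class.
  static final class FileEntry {
    final String partitionPath;
    final String fileName;
    final String commitTime;

    FileEntry(String partitionPath, String fileName, String commitTime) {
      this.partitionPath = partitionPath;
      this.fileName = fileName;
      this.commitTime = commitTime;
    }
  }

  // Returns partitionPath -> (fileName -> entry), mirroring the nested map
  // shape in the method under review.
  static Map<String, Map<String, FileEntry>> listThenFilter(
      List<FileEntry> allFiles, Set<String> affectedPartitions, Set<String> commitsToCheck) {
    // Step 1: "list" every file that lives in an affected partition.
    List<FileEntry> listed = new ArrayList<>();
    for (FileEntry file : allFiles) {
      if (affectedPartitions.contains(file.partitionPath)) {
        listed.add(file);
      }
    }
    // Step 2: filter the listing down to files written by the given commits.
    // Assumption: each file carries the commit time that produced it.
    Map<String, Map<String, FileEntry>> result = new HashMap<>();
    for (FileEntry file : listed) {
      if (commitsToCheck.contains(file.commitTime)) {
        result.computeIfAbsent(file.partitionPath, k -> new HashMap<>())
            .put(file.fileName, file);
      }
    }
    return result;
  }
}
```

       In this sketch the cost of step 1 grows with the total number of files in the affected partitions, not with the number of files the commits actually touched.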




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org