Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/08 22:59:15 UTC

[GitHub] [hudi] prashantwason commented on a change in pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

prashantwason commented on a change in pull request #2417:
URL: https://github.com/apache/hudi/pull/2417#discussion_r554237450



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java
##########
@@ -49,12 +60,48 @@ public FileSystemBackedTableMetadata(SerializableConfiguration conf, String data
 
   @Override
   public List<String> getAllPartitionPaths() throws IOException {
-    FileSystem fs = new Path(datasetBasePath).getFileSystem(hadoopConf.get());
     if (assumeDatePartitioning) {
+      FileSystem fs = new Path(datasetBasePath).getFileSystem(hadoopConf.get());
       return FSUtils.getAllPartitionFoldersThreeLevelsDown(fs, datasetBasePath);
-    } else {
-      return FSUtils.getAllFoldersWithPartitionMetaFile(fs, datasetBasePath);
     }
+
+    List<Path> pathsToList = new LinkedList<>();
+    pathsToList.add(new Path(datasetBasePath));
+    List<String> partitionPaths = new ArrayList<>();
+
+    // TODO: Get the parallelism from HoodieWriteConfig
+    final int fileListingParallelism = 1500;

Review comment:
       This value is only the maximum parallelism. A Math.min call further down caps the parallelism that is actually used.
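
       A minimal sketch of that capping pattern, for illustration only. The class name, the helper effectiveParallelism, and the choice of the number of paths left to list as the second argument to Math.min are assumptions, not code taken from the PR; only the 1500 upper bound comes from the diff above.

```java
import java.util.Arrays;
import java.util.List;

class ListingParallelism {
    // Upper bound on parallelism; 1500 is the value shown in the diff above.
    static final int MAX_FILE_LISTING_PARALLELISM = 1500;

    // Hypothetical helper: never use more parallel tasks than there are
    // paths to list, and never exceed the configured maximum.
    static int effectiveParallelism(int maxParallelism, int numPathsToList) {
        return Math.min(maxParallelism, numPathsToList);
    }

    public static void main(String[] args) {
        // Hypothetical directories that still need to be listed.
        List<String> pathsToList = Arrays.asList("/base/2021/01/07", "/base/2021/01/08");
        int parallelism = effectiveParallelism(MAX_FILE_LISTING_PARALLELISM, pathsToList.size());
        System.out.println("Listing " + pathsToList.size() + " paths with parallelism " + parallelism);
    }
}
```

       With only a few paths pending, the effective parallelism equals the path count; the 1500 ceiling matters only once more than 1500 paths are queued.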



