Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/03 21:53:15 UTC

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

zhedoubushishi commented on a change in pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#discussion_r569775940



##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java
##########
@@ -84,6 +86,39 @@ public static String getTablePath(FileSystem fs, Path[] userProvidedPaths) throw
     throw new TableNotFoundException("Unable to find a hudi table for the user provided paths.");
   }
 
+  public static Option<String> getOnePartitionPath(FileSystem fs, Path tablePath) throws IOException {
+    // When the table is not partitioned
+    if (HoodiePartitionMetadata.hasPartitionMetadata(fs, tablePath)) {
+      return Option.of(tablePath.toString());
+    }
+    FileStatus[] statuses = fs.listStatus(tablePath);
+    for (FileStatus status : statuses) {
+      if (status.isDirectory()) {
+        if (HoodiePartitionMetadata.hasPartitionMetadata(fs, status.getPath())) {
+          return Option.of(status.getPath().toString());
+        } else {
+          Option<String> partitionPath = getOnePartitionPath(fs, status.getPath());
+          if (partitionPath.isPresent()) {
+            return partitionPath;

Review comment:
       Yeah, I agree it would be better to use ```HoodieTableMetadata``` to avoid ```fs.listStatus```. But what about tables that don't have the metadata feature enabled? Will it take a very long time if it's a table with many partitions?
   
   Also, ```hoodie_partition_metadata``` stores a parameter called ```partitionDepth```. Could we take advantage of this?
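   
   To make the ```partitionDepth``` idea concrete, here is a rough sketch and not a proposal for the final code. It assumes the partition metadata file is a Java properties file named ```.hoodie_partition_metadata``` with a ```partitionDepth``` key, and it uses ```java.nio``` instead of the Hadoop ```FileSystem``` so it runs on its own. The class and method names (```PartitionDepthSketch```, ```readPartitionDepth```, ```buildDataGlob```) are made up for illustration. Once one partition has been found, its depth would be enough to build the data glob without listing any further directories:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class PartitionDepthSketch {

  // Assumed file name and key; check them against HoodiePartitionMetadata.
  static final String METAFILE = ".hoodie_partition_metadata";
  static final String DEPTH_KEY = "partitionDepth";

  // Reads partitionDepth from the metadata file inside a single partition directory.
  static int readPartitionDepth(Path partitionDir) throws IOException {
    Properties props = new Properties();
    try (InputStream in = Files.newInputStream(partitionDir.resolve(METAFILE))) {
      props.load(in);
    }
    String depth = props.getProperty(DEPTH_KEY);
    if (depth == null) {
      throw new IOException("No " + DEPTH_KEY + " in " + partitionDir.resolve(METAFILE));
    }
    return Integer.parseInt(depth.trim());
  }

  // Appends one "/*" per partition level; depth 0 returns the table path itself.
  static String buildDataGlob(String tablePath, int depth) {
    if (depth < 0) {
      throw new IllegalArgumentException("depth must be >= 0, got " + depth);
    }
    String base = tablePath;
    // Drop trailing slashes so the glob does not contain "//".
    while (base.length() > 1 && base.endsWith("/")) {
      base = base.substring(0, base.length() - 1);
    }
    StringBuilder glob = new StringBuilder(base);
    for (int i = 0; i < depth; i++) {
      glob.append("/*");
    }
    return glob.toString();
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway table layout with one partition three levels deep.
    Path table = Files.createTempDirectory("hudi_table_sketch");
    Path partition = Files.createDirectories(table.resolve("2021/02/03"));
    Files.write(partition.resolve(METAFILE),
        "partitionDepth=3\n".getBytes(StandardCharsets.UTF_8));

    int depth = readPartitionDepth(partition);
    System.out.println(buildDataGlob(table.toString(), depth));
  }
}
```

   This still needs one partition to be located first, so it would complement ```getOnePartitionPath``` rather than replace it. It also assumes every partition in the table has the same depth.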




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org