Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/12 06:20:32 UTC

[GitHub] [hudi] bvaradar edited a comment on issue #2423: Performance Issues due to significant Parallel Create-Dir being issued to Azure ADLS_V2

bvaradar edited a comment on issue #2423:
URL: https://github.com/apache/hudi/issues/2423#issuecomment-758433327


   Hudi does not synchronize on partition path creation. Instead, each executor task that is about to write a parquet file ensures the directory path exists by issuing an fs.mkdirs call. Added a JIRA to track this: https://issues.apache.org/jira/browse/HUDI-1523
   
   If mkdirs is a costly API, can you try this patch? It trades the mkdirs call for a getFileStatus() call (through fs.exists) whenever the partition directory already exists:
   ```
   diff --git a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
   index d148b1b8..11b3cb49 100644
   --- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
   +++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
   @@ -105,7 +105,9 @@ public abstract class HoodieWriteHandle<T extends HoodieRecordPayload> extends H
      public Path makeNewPath(String partitionPath) {
        Path path = FSUtils.getPartitionPath(config.getBasePath(), partitionPath);
        try {
   -      fs.mkdirs(path); // create a new partition as needed.
   +      if (!fs.exists(path)) {
   +        fs.mkdirs(path); // create a new partition as needed.
   +      }
        } catch (IOException e) {
          throw new HoodieIOException("Failed to make dir " + path, e);
        }
   ```
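
   To get a feel for the effect of the patch without a cluster, here is a minimal standalone sketch of the same check-then-create pattern. It is not Hudi code: it uses java.nio.file on a local temp directory as a stand-in for the Hadoop FileSystem, and the class name PartitionDirSketch and the createCalls counter are hypothetical names added only for illustration. The only point it makes is that repeated calls for an existing partition issue a single create.

   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.concurrent.atomic.AtomicInteger;

   // Hypothetical stand-in for the patched makeNewPath(); local filesystem only.
   public class PartitionDirSketch {
     // Counts how many times the (assumed costly) create call is actually issued.
     static final AtomicInteger createCalls = new AtomicInteger();

     static Path makeNewPath(Path basePath, String partitionPath) {
       Path path = basePath.resolve(partitionPath);
       try {
         // Cheap existence check first; only create when the directory is missing.
         if (!Files.isDirectory(path)) {
           createCalls.incrementAndGet();
           // createDirectories does not fail if the directory already exists,
           // so losing a race with another writer between check and create is harmless.
           Files.createDirectories(path);
         }
       } catch (IOException e) {
         throw new UncheckedIOException("Failed to make dir " + path, e);
       }
       return path;
     }

     public static void main(String[] args) throws IOException {
       Path base = Files.createTempDirectory("partition-sketch");
       // Simulate five write handles targeting the same partition, one after another.
       for (int i = 0; i < 5; i++) {
         makeNewPath(base, "2021/01/12");
       }
       System.out.println("create calls issued: " + createCalls.get());
     }
   }
   ```

   Note that when many tasks hit a brand-new partition at the same moment, several of them can still see the directory as missing and each issue a create, so the saving comes mainly from partitions that already exist.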

