You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/20 14:25:45 UTC

[GitHub] [hive] pvargacl opened a new pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

pvargacl opened a new pull request #1893:
URL: https://github.com/apache/hive/pull/1893


   
   ### What changes were proposed in this pull request?
   Improve FileSystem usage in Hive::loadPartitionInternal to improve performance n S3
   
   
   ### Why are the changes needed?
   Performance improvement
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Current unit tests + local performance measurements.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r563846857



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
##########
@@ -117,6 +117,10 @@ public static long createTestFileId(
     }
     return result;
   }
+  public static List<Path> listPath(final FileSystem fs, final Path path, final PathFilter filter,
+      final boolean recursive) throws IOException {
+    return listLocatedFileStatus(fs, path, filter, recursive).stream().map(FileStatus::getPath).collect(Collectors.toList());

Review comment:
       formatting: put map/collect on new lines




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561040097



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       It was used in the original insertOverWrite patch, but later it was removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561009721



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
     }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+    private final Path basePath;
+
+    public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+      super(writeId, stmtId);
+      this.basePath = basePath;
+    }
+    @Override
+    public boolean accept(Path path) {
+      do {
+        if (super.accept(path)) {
+          return true;
+        }
+        path = path.getParent();
+      } while (!path.equals(basePath));

Review comment:
       Should we have null check for path here so we won't get an infinite loop?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561076023



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
     }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+    private final Path basePath;
+
+    public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+      super(writeId, stmtId);
+      this.basePath = basePath;
+    }
+    @Override
+    public boolean accept(Path path) {
+      do {
+        if (super.accept(path)) {
+          return true;
+        }
+        path = path.getParent();
+      } while (!path.equals(basePath));

Review comment:
       ok




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561046305



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
     }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+    private final Path basePath;
+
+    public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+      super(writeId, stmtId);
+      this.basePath = basePath;
+    }
+    @Override
+    public boolean accept(Path path) {
+      do {
+        if (super.accept(path)) {
+          return true;
+        }
+        path = path.getParent();
+      } while (!path.equals(basePath));

Review comment:
       It will fail with a nullpointer if you call it with an unrelated path, I will add the nullcheck




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r563533532



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
##########
@@ -117,6 +117,10 @@ public static long createTestFileId(
     }
     return result;
   }
+  public static List<Path> listPath(final FileSystem fs, final Path path, final PathFilter filter,

Review comment:
       Thanks @steveloughran for taking a look at this. I will try to take advantage of the async prefetch feature in later PRs, it seems promising, but it would need bigger code change. I will checkout the IOStats downstream, I have seen it is already available there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561009721



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
     }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+    private final Path basePath;
+
+    public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+      super(writeId, stmtId);
+      this.basePath = basePath;
+    }
+    @Override
+    public boolean accept(Path path) {
+      do {
+        if (super.accept(path)) {
+          return true;
+        }
+        path = path.getParent();
+      } while (!path.equals(basePath));

Review comment:
       Is it possible that `path.equals(basePath)` won't ever be true? or path becomes null?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561023487



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       insertOverwrite flag was never used? :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r563533532



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
##########
@@ -117,6 +117,10 @@ public static long createTestFileId(
     }
     return result;
   }
+  public static List<Path> listPath(final FileSystem fs, final Path path, final PathFilter filter,

Review comment:
       Thanks @steveloughran for taking a look at this. I will try to take advantage of the async prefetch feature in later PRs, it seems promising, but it would need bigger code change. I will checkout the IOStats downstream, I have seen it is already available there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] steveloughran commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
steveloughran commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r562749017



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
##########
@@ -117,6 +117,10 @@ public static long createTestFileId(
     }
     return result;
   }
+  public static List<Path> listPath(final FileSystem fs, final Path path, final PathFilter filter,

Review comment:
       1. if you could process the results with any computation per record, you will get full benefits of the async page fetch offered by s3a and (soon) abfs; at 600ms a list for 200 records on s3, that's potentially 3ms/record saving
   2. If `listLocatedFileStatus()` logged @ debug the toString() value of the iterator, the S3A FS iterator will print out its IOStats, including #of S3 list requests and  min/mean/max durations. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       Can we return List'<'Path'>' to avoid extra map(FileStatus::getPath) in the method usages?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       Can we return List<String> to avoid extra map(FileStatus::getPath) in the method usages?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ merged pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ merged pull request #1893:
URL: https://github.com/apache/hive/pull/1893


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r563846857



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
##########
@@ -117,6 +117,10 @@ public static long createTestFileId(
     }
     return result;
   }
+  public static List<Path> listPath(final FileSystem fs, final Path path, final PathFilter filter,
+      final boolean recursive) throws IOException {
+    return listLocatedFileStatus(fs, path, filter, recursive).stream().map(FileStatus::getPath).collect(Collectors.toList());

Review comment:
       formatting: put map/collect on new lines




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       Can we return List'<'Path'>' to avoid extra map(FileStatus::getPath) and collect in the method usages?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561026453



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       Can we return List'<'String'>' to avoid extra map(FileStatus::getPath) in the method usages?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561077658



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       gotcha!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] deniskuzZ commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
deniskuzZ commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561009721



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2671,6 +2671,28 @@ public boolean accept(Path path) {
     }
   }
 
+  /**
+   * Full recursive PathFilter version of IdPathFilter (filtering files for a given writeId and stmtId).
+   * This can be used by recursive filelisting, when we want to match the delta / base pattern on the bucketFiles.
+   */
+  public static class IdFullPathFiler extends IdPathFilter {
+    private final Path basePath;
+
+    public IdFullPathFiler(long writeId, int stmtId, Path basePath) {
+      super(writeId, stmtId);
+      this.basePath = basePath;
+    }
+    @Override
+    public boolean accept(Path path) {
+      do {
+        if (super.accept(path)) {
+          return true;
+        }
+        path = path.getParent();
+      } while (!path.equals(basePath));

Review comment:
       Is it possible that `path.equals(basePath)` won't ever be true?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #1893: HIVE-24669: Improve FileSystem usage in Hive::loadPartitionInternal

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #1893:
URL: https://github.com/apache/hive/pull/1893#discussion_r561044735



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -2620,18 +2622,18 @@ public static void listFilesInsideAcidDirectory(Path acidDir, FileSystem srcFs,
     }
   }
 
-  private void listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId, boolean isInsertOverwrite,
-      List<Path> newFiles) throws HiveException {
+  private List<FileStatus> listFilesCreatedByQuery(Path loadPath, long writeId, int stmtId) throws HiveException {

Review comment:
       No, we need the filestatuses, so we can reuse them later for populating the quickstats. In a follow up patch I will try to do the other way around and collect FileStatus everywhere in loadPartition instead of Path, because writeNotification could also benefit having the filestatuses ready. But it is not that trivial change in the non direct insert cases.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org