Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/21 03:48:17 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #2845: [HUDI-1723] Fix path selector listing files with the same mod date

xushiyan commented on a change in pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#discussion_r636615564



##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
##########
@@ -219,13 +221,19 @@ public static void createBaseFile(String basePath, String partitionPath, String
 
   public static void createBaseFile(String basePath, String partitionPath, String instantTime, String fileId, long length)
       throws Exception {
+    createBaseFile(basePath, partitionPath, instantTime, fileId, length, Instant.now().toEpochMilli());
+  }
+
+  public static void createBaseFile(String basePath, String partitionPath, String instantTime, String fileId, long length, long lastModificationTimeMilli)
+      throws Exception {
     Path parentPath = Paths.get(basePath, partitionPath);
     Files.createDirectories(parentPath);
     Path baseFilePath = parentPath.resolve(baseFileName(instantTime, fileId));
     if (Files.notExists(baseFilePath)) {
       Files.createFile(baseFilePath);
     }
     new RandomAccessFile(baseFilePath.toFile(), "rw").setLength(length);
+    Files.setLastModifiedTime(baseFilePath, FileTime.fromMillis(lastModificationTimeMilli));

Review comment:
       @nsivabalan the problem comes from the mod time being the same for multiple input files. I uploaded a screenshot to the JIRA ticket and am also posting it here for easy illustration.
   
   ![Screen Shot 2021-03-26 at 1 42 42 AM](https://user-images.githubusercontent.com/2701446/119078833-cde7d180-b9ab-11eb-9d0f-32625dc30b3c.png)
   
   DFSPathSelector reads the last modification time and saves it as the checkpoint, which is then compared against the next batch of input files. It's not about files being mutated; the input files are append-only, so last mod time _is_ create time. The test setup sets the last mod time explicitly so that it is the same across files; otherwise, delays from code execution while creating them could make the timestamps differ.
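
To make the failure mode concrete, here is a minimal, self-contained sketch of how a checkpoint based on mod time can miss a file when several files share the same mod time. It is not the actual DFSPathSelector code. The `InputFile` class, the strictly-greater-than comparison, and the `maxFiles` batch limit are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ModTimeCheckpointSketch {

  // Hypothetical stand-in for an input file: just a name and a last modification time.
  static final class InputFile {
    final String name;
    final long modTimeMillis;

    InputFile(String name, long modTimeMillis) {
      this.name = name;
      this.modTimeMillis = modTimeMillis;
    }
  }

  // Assumed selection rule (not Hudi's actual code): take files whose mod time is
  // strictly greater than the checkpoint, in order, stopping once maxFiles are taken.
  static List<InputFile> selectBatch(List<InputFile> filesSortedByModTime, long checkpoint, int maxFiles) {
    List<InputFile> batch = new ArrayList<>();
    for (InputFile f : filesSortedByModTime) {
      if (f.modTimeMillis > checkpoint && batch.size() < maxFiles) {
        batch.add(f);
      }
    }
    return batch;
  }

  // Runs successive batches until nothing is left and returns the names that were picked up.
  static List<String> consumeAll(List<InputFile> filesSortedByModTime, int maxFiles) {
    List<String> seen = new ArrayList<>();
    long checkpoint = Long.MIN_VALUE;
    while (true) {
      List<InputFile> batch = selectBatch(filesSortedByModTime, checkpoint, maxFiles);
      if (batch.isEmpty()) {
        return seen;
      }
      for (InputFile f : batch) {
        seen.add(f.name);
      }
      // The checkpoint becomes the mod time of the last file in the batch.
      checkpoint = batch.get(batch.size() - 1).modTimeMillis;
    }
  }

  public static void main(String[] args) {
    List<InputFile> files = Arrays.asList(
        new InputFile("a", 1000L),
        new InputFile("b", 1000L), // same mod time as "a"
        new InputFile("c", 2000L));

    // With a batch limit of 1, "b" shares the checkpoint value left by "a" and is never read.
    System.out.println(consumeAll(files, 1)); // prints [a, c]

    // With a limit large enough to take both same-mod-time files together, nothing is lost.
    System.out.println(consumeAll(files, 3)); // prints [a, b, c]
  }
}
```

Under these assumptions, a file is skipped only when a batch boundary falls between two files with an identical mod time. This is why the test needs to pin the mod time rather than rely on file creation timing.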




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org