Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/01 00:44:44 UTC

[GitHub] [hudi] yuzhaojing commented on a change in pull request #3363: [HUDI-2247] Filter file where length less than parquet MAGIC length

yuzhaojing commented on a change in pull request #3363:
URL: https://github.com/apache/hudi/pull/3363#discussion_r680428506



##########
File path: hudi-flink/src/main/java/org/apache/hudi/sink/partitioner/profile/WriteProfiles.java
##########
@@ -131,7 +133,7 @@ public static void clean(String path) {
         })
         // filter out crushed files
         .filter(Objects::nonNull)
-        .filter(fileStatus -> fileStatus.getLen() > 0)
+        .filter(fileStatus -> fileStatus.getLen() > MAGIC.length)
         .collect(Collectors.toList());

Review comment:
       This change only filters out parquet files whose footer has not been written. Log files are still filtered by fileSize > 0, because we can't detect a problem in a log file from its size alone.
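
To make the two thresholds concrete, here is a minimal standalone sketch, not the actual WriteProfiles code. `FileInfo` is a hypothetical stand-in for Hadoop's `FileStatus`, `MAGIC` is redefined locally as the 4-byte parquet magic "PAR1", and telling file types apart by the `.parquet` suffix is an assumption made only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

final class FileLengthFilter {
    // Parquet files begin and end with the 4-byte magic "PAR1".
    // Redefined here so the sketch has no parquet dependency.
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical stand-in for org.apache.hadoop.fs.FileStatus.
    static final class FileInfo {
        final String name;
        final long len;

        FileInfo(String name, long len) {
            this.name = name;
            this.len = len;
        }
    }

    // Assumption: anything without the ".parquet" suffix is treated as a log file.
    static boolean isUsable(FileInfo file) {
        if (file.name.endsWith(".parquet")) {
            // A parquet file no longer than the magic cannot contain a footer.
            return file.len > MAGIC.length;
        }
        // For log files, size only tells us whether the file is empty.
        return file.len > 0;
    }

    // Returns the names of the files that pass the length check, in input order.
    static List<String> usableNames(List<FileInfo> files) {
        return files.stream()
            .filter(FileLengthFilter::isUsable)
            .map(file -> file.name)
            .collect(Collectors.toList());
    }
}
```

Note that a parquet file of exactly `MAGIC.length` bytes is dropped as well, since the comparison is strict.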




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org