You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/19 19:43:34 UTC

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6830: [HUDI-2118] Skip checking corrupt log blocks for transactional write file systems

alexeykudinkin commented on code in PR #6830:
URL: https://github.com/apache/hudi/pull/6830#discussion_r999867005


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -160,12 +163,17 @@ private HoodieLogBlock readBlock() throws IOException {
       return createCorruptBlock(blockStartPos);
     }
 
-    // We may have had a crash which could have written this block partially
-    // Skip blockSize in the stream and we should either find a sync marker (start of the next
-    // block) or EOF. If we did not find either of it, then this block is a corrupted block.
-    boolean isCorrupted = isBlockCorrupted(blockSize);
-    if (isCorrupted) {
-      return createCorruptBlock(blockStartPos);
+    // Block is possibly corrupted only when write is not transactional
+    // the corrupted check could take 100's msec for larger file sizes, skipping the check for fs with transactional write
+    // see https://issues.apache.org/jira/browse/HUDI-2118
+    if (!StorageSchemes.isWriteTransactional(fs.getScheme())) {

Review Comment:
   I'd suggest we fold this check into the `isBlockCorrupted` check itself:
   
   - At the end of the day, this part of the same check
   - We limit the scope of the change to just `isBlockCorrupted` method



##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -769,6 +772,36 @@ public void testAppendAndReadOnCorruptedLog() throws IOException, URISyntaxExcep
     reader.close();
   }
 
+  @Test
+  public void testSkipCorruptedCheck() throws Exception {
+    // normal case: if the block is corrupted, we should be able to read back a corrupted block
+    Reader reader1 = createCorruptedFile("test-fileid1");
+    HoodieLogBlock block = reader1.next();
+    assertEquals(HoodieLogBlockType.CORRUPT_BLOCK, block.getBlockType(), "The read block should be a corrupt block");
+    reader1.close();
+
+    // mock case: we want to verify that isBlockCorrupted() check is skipped for transactional write fs
+    // so here created a corrupted block for a transactional write fs(which would never happen) to verify the skip logic
+    // a better way to verify it is to mock the private method isBlockCorrupted(), but it is not available in current Mockito framework
+    // and the attempt to onboard powermock with private method mocking capability has failed

Review Comment:
   We can omit the part regarding powermock, and just elaborate that we're doing this b/c we can't mock private method



##########
hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java:
##########
@@ -97,4 +104,12 @@ public static boolean isAppendSupported(String scheme) {
     }
     return Arrays.stream(StorageSchemes.values()).anyMatch(s -> s.supportsAppend() && s.scheme.equals(scheme));
   }
+
+  public static boolean isWriteTransactional(String scheme) {
+    if (!isSchemeSupported(scheme)) {
+      throw new IllegalArgumentException("Unsupported scheme :" + scheme);
+    }
+
+    return Arrays.stream(StorageSchemes.values()).anyMatch(s -> s.isWriteTransactional() && s.scheme.equals(scheme));

Review Comment:
   If we know that scheme is supported we can just do `Scheme.valueOf`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org