Posted to commits@hudi.apache.org by "danny0405 (via GitHub)" <gi...@apache.org> on 2023/04/21 04:10:09 UTC

[GitHub] [hudi] danny0405 commented on a diff in pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

danny0405 commented on code in PR #8526:
URL: https://github.com/apache/hudi/pull/8526#discussion_r1173284388


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -152,98 +153,107 @@ private void addShutDownHook() {
   // TODO : convert content and block length to long by using ByteBuffer, raw byte [] allows
   // for max of Integer size
   private HoodieLogBlock readBlock() throws IOException {
-    int blockSize;
-    long blockStartPos = inputStream.getPos();
-    try {
-      // 1 Read the total size of the block
-      blockSize = (int) inputStream.readLong();
-    } catch (EOFException | CorruptedLogFileException e) {
-      // An exception reading any of the above indicates a corrupt block
-      // Create a corrupt block by finding the next MAGIC marker or EOF
-      return createCorruptBlock(blockStartPos);
-    }
-
-    // We may have had a crash which could have written this block partially
-    // Skip blockSize in the stream and we should either find a sync marker (start of the next
-    // block) or EOF. If we did not find either of it, then this block is a corrupted block.
-    boolean isCorrupted = isBlockCorrupted(blockSize);
-    if (isCorrupted) {
-      return createCorruptBlock(blockStartPos);
-    }
-
-    // 2. Read the version for this log format
-    HoodieLogFormat.LogFormatVersion nextBlockVersion = readVersion();
+    long blockStartPos = 0;
+    long blockSize = 0;

Review Comment:
   I guess lines 169 ~ 172 are what we want to fix. Can we enclose just that code snippet in the `try catch` block?
   It does not seem right to also include the data block construction from lines 212 ~ 246 in the `try catch`.
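
To illustrate the scoping concern in general terms, here is a minimal, self-contained sketch. It is not the code from this PR: `NarrowTryCatchSketch`, `Block`, and this `readBlock` signature are hypothetical stand-ins, and it assumes the lines worth guarding are the block-size read. Only that read sits inside the `try`, so a failure during block construction propagates instead of being silently reported as a corrupt block.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration only; none of these names are Hudi types.
class NarrowTryCatchSketch {

  // Minimal stand-in for a parsed log block.
  static final class Block {
    final boolean corrupt;
    final byte[] content;

    Block(boolean corrupt, byte[] content) {
      this.corrupt = corrupt;
      this.content = content;
    }
  }

  static Block readBlock(DataInputStream in) throws IOException {
    long blockSize;
    try {
      // Only the size read is guarded: a short read here is treated
      // as a partially written (corrupt) block.
      blockSize = in.readLong();
    } catch (EOFException e) {
      return new Block(true, new byte[0]);
    }

    // Block construction stays outside the try/catch, so errors raised
    // here are not mistaken for the corruption case handled above.
    if (blockSize < 0 || blockSize > Integer.MAX_VALUE) {
      throw new IOException("Unsupported block size: " + blockSize);
    }
    byte[] content = new byte[(int) blockSize];
    in.readFully(content); // throws EOFException if the body is truncated
    return new Block(false, content);
  }
}
```

With this shape, a stream that ends inside the size field yields a corrupt block, while a stream that ends inside the body surfaces an `EOFException` to the caller. Whether the second case should also map to a corrupt block is a separate decision that can be made explicitly rather than as a side effect of a wide `try`.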


