You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/18 06:17:38 UTC

[GitHub] [hudi] suryaprasanna opened a new pull request, #5341: [HUDI-3580] [UBER] Add support for compacted log blocks

suryaprasanna opened a new pull request, #5341:
URL: https://github.com/apache/hudi/pull/5341

   ## What is the purpose of the pull request
   
   *This pull request adds support for compacted log blocks and also handle various rollback scenarios when there are some valid data blocks between rollback block and their corresponding target block. With RFC-48 a new action type called logcompaction is introduced, this change helps to iterate the log blocks within the log files.*
   
   ## Brief change log
   
     - *For data and delete blocks introduce new field called isCompactedLogBlock*
     - *Change the iteration logic in AbstractHoodieLogRecordReader to support handling various multiwriter scenarios.*
     - *Fix issue when rollback blocks are farther from their target blocks*
     - *Include test case to verify the new isCompactedLogBlock check*
   
   ## Verify this pull request
   
   *This pull request change is already covered by existing tests, and also added new test case to verify the changes*.
     - *Added TestHoodieLogFormat.testAvroLogRecordReaderWithMixedInsertsCorruptsRollbackAndMergedLogBlock to verify the change.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1124287415

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149",
       "triggerID" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4e7a31686083748bb0f2a36f60dbfb963fab9708",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8586",
       "triggerID" : "4e7a31686083748bb0f2a36f60dbfb963fab9708",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 58319027a8475b52f374a4934ca68e8026f6a838 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149) 
   * 4e7a31686083748bb0f2a36f60dbfb963fab9708 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8586) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r872969982


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java:
##########
@@ -210,7 +210,7 @@ protected Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String c
     header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, metaClient.getActiveTimeline().lastInstant().get().getTimestamp());
     header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
     header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
-        String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_PREVIOUS_BLOCK.ordinal()));
+        String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));

Review Comment:
   thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1124247014

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149",
       "triggerID" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4e7a31686083748bb0f2a36f60dbfb963fab9708",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4e7a31686083748bb0f2a36f60dbfb963fab9708",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 58319027a8475b52f374a4934ca68e8026f6a838 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149) 
   * 4e7a31686083748bb0f2a36f60dbfb963fab9708 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3580] [UBER] Add support for compacted log blocks

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1101135445

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2691cda7a183b61b9820c1d4cfe6d3570232746f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] suryaprasanna commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
suryaprasanna commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r871581733


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -232,7 +251,7 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
             && !HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime
         )) {
           // hit a block with instant time greater than should be processed, stop processing further
-          break;
+          continue;

Review Comment:
   Thanks for catching this. It is a mistake I am removing this.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -93,35 +89,26 @@ public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSc
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
                              boolean readBlockLazily, boolean reverseReader) throws IOException {
-    this(fs, logFile, readerSchema, bufferSize, readBlockLazily, reverseReader, false,
-        HoodieRecord.RECORD_KEY_METADATA_FIELD);
+    this(fs, logFile, readerSchema, bufferSize, false, HoodieRecord.RECORD_KEY_METADATA_FIELD);
   }
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
-                             boolean readBlockLazily, boolean reverseReader, boolean enableRecordLookups,
-                             String keyField) throws IOException {
-    this(fs, logFile, readerSchema, bufferSize, readBlockLazily, reverseReader, enableRecordLookups, keyField, InternalSchema.getEmptyInternalSchema());
+                             boolean enableRecordLookups, String keyField) throws IOException {
+    this(fs, logFile, readerSchema, bufferSize, enableRecordLookups, keyField, InternalSchema.getEmptyInternalSchema());
   }
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
-                             boolean readBlockLazily, boolean reverseReader, boolean enableRecordLookups,
-                             String keyField, InternalSchema internalSchema) throws IOException {
+                             boolean enableRecordLookups, String keyField, InternalSchema internalSchema) throws IOException {
     this.hadoopConf = fs.getConf();
     // NOTE: We repackage {@code HoodieLogFile} here to make sure that the provided path
     //       is prefixed with an appropriate scheme given that we're not propagating the FS
     //       further
     this.logFile = new HoodieLogFile(FSUtils.makeQualified(fs, logFile.getPath()), logFile.getFileSize());
     this.inputStream = getFSDataInputStream(fs, this.logFile, bufferSize);
     this.readerSchema = readerSchema;
-    this.readBlockLazily = readBlockLazily;
-    this.reverseReader = reverseReader;
     this.enableRecordLookups = enableRecordLookups;
     this.keyField = keyField;
     this.internalSchema = internalSchema == null ? InternalSchema.getEmptyInternalSchema() : internalSchema;
-    if (this.reverseReader) {

Review Comment:
   Please let me know, what do you think?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java:
##########
@@ -210,7 +210,7 @@ protected Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String c
     header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, metaClient.getActiveTimeline().lastInstant().get().getTimestamp());
     header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
     header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
-        String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_PREVIOUS_BLOCK.ordinal()));
+        String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));

Review Comment:
   Yes, this is backward compatible, we are saving ordinal int value in the disk.



##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -414,7 +411,7 @@ public void testHugeLogFileWrite() throws IOException, URISyntaxException, Inter
     header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, getSimpleSchema().toString());
     byte[] dataBlockContentBytes = getDataBlock(DEFAULT_DATA_BLOCK_TYPE, records, header).getContentBytes();
     HoodieLogBlock.HoodieLogBlockContentLocation logBlockContentLoc = new HoodieLogBlock.HoodieLogBlockContentLocation(new Configuration(), null, 0, dataBlockContentBytes.length, 0);
-    HoodieDataBlock reusableDataBlock = new HoodieAvroDataBlock(null, Option.ofNullable(dataBlockContentBytes), false,
+    HoodieDataBlock reusableDataBlock = new HoodieAvroDataBlock(null, Option.ofNullable(dataBlockContentBytes),

Review Comment:
   Sure, I will add them.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -93,35 +89,26 @@ public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSc
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
                              boolean readBlockLazily, boolean reverseReader) throws IOException {
-    this(fs, logFile, readerSchema, bufferSize, readBlockLazily, reverseReader, false,
-        HoodieRecord.RECORD_KEY_METADATA_FIELD);
+    this(fs, logFile, readerSchema, bufferSize, false, HoodieRecord.RECORD_KEY_METADATA_FIELD);
   }
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
-                             boolean readBlockLazily, boolean reverseReader, boolean enableRecordLookups,
-                             String keyField) throws IOException {
-    this(fs, logFile, readerSchema, bufferSize, readBlockLazily, reverseReader, enableRecordLookups, keyField, InternalSchema.getEmptyInternalSchema());
+                             boolean enableRecordLookups, String keyField) throws IOException {
+    this(fs, logFile, readerSchema, bufferSize, enableRecordLookups, keyField, InternalSchema.getEmptyInternalSchema());
   }
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
-                             boolean readBlockLazily, boolean reverseReader, boolean enableRecordLookups,
-                             String keyField, InternalSchema internalSchema) throws IOException {
+                             boolean enableRecordLookups, String keyField, InternalSchema internalSchema) throws IOException {
     this.hadoopConf = fs.getConf();
     // NOTE: We repackage {@code HoodieLogFile} here to make sure that the provided path
     //       is prefixed with an appropriate scheme given that we're not propagating the FS
     //       further
     this.logFile = new HoodieLogFile(FSUtils.makeQualified(fs, logFile.getPath()), logFile.getFileSize());
     this.inputStream = getFSDataInputStream(fs, this.logFile, bufferSize);
     this.readerSchema = readerSchema;
-    this.readBlockLazily = readBlockLazily;
-    this.reverseReader = reverseReader;
     this.enableRecordLookups = enableRecordLookups;
     this.keyField = keyField;
     this.internalSchema = internalSchema == null ? InternalSchema.getEmptyInternalSchema() : internalSchema;
-    if (this.reverseReader) {

Review Comment:
   My understanding is that when iterating in reverse order there is an issue when we encounter corrupt block. We cannot jump across the corrupt block since we dont have the block size stored at the end for them. So, we end up ignoring all the blocks older than the corrupt block. 
   That is a reason for removing the reverseReader lookup, since it cannot be handled.
   It becomes more complicated when introducing log compaction. There we need to move the compacted blocks to a different slot. So, it is not straight forward traversal. So, removing this logic to reduce the complexity involved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r869734606


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java:
##########
@@ -210,7 +210,7 @@ protected Map<HoodieLogBlock.HeaderMetadataType, String> generateHeader(String c
     header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, metaClient.getActiveTimeline().lastInstant().get().getTimestamp());
     header.put(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME, commit);
     header.put(HoodieLogBlock.HeaderMetadataType.COMMAND_BLOCK_TYPE,
-        String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_PREVIOUS_BLOCK.ordinal()));
+        String.valueOf(HoodieCommandBlock.HoodieCommandBlockTypeEnum.ROLLBACK_BLOCK.ordinal()));

Review Comment:
   is this backwards compatible? for instance, can this read a log block which has ROLLBACK_PREVIOUS_BLOCK when written to disk ? 



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -218,7 +221,45 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
           logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
           readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);
 
+      /**
+       * Traversal of log blocks from log files can be done in two directions.
+       * 1. Forward traversal
+       * 2. Reverse traversal
+       * For example:   BaseFile, LogFile1(LogBlock11,LogBlock12,LogBlock13), LofFile2(LogBlock21,LogBlock22,LogBlock23)
+       *    Forward traversal look like,
+       *        LogBlock11, LogBlock12, LogBlock13, LogBlock21, LogBlock22, LogBlock23
+       *    If we are considering reverse traversal including log blocks,
+       *        LogBlock23, LogBlock22, LogBlock21, LogBlock13, LogBlock12, LogBlock11
+       * Here, reverse traversal also traverses blocks in reverse order of creation.
+       *
+       * 1. Forward traversal
+       *  Forward traversal is easy to do in single writer mode. Where the rollback block is right after the effected data blocks.
+       *  With multiwriter mode the blocks can be out of sync. An example scenario.
+       *  B1, B2, B3, B4, R1(B3), B5
+       *  In this case, rollback block R1 is invalidating the B3 which is not the previous block.
+       *  This becomes more complicated if we have compacted blocks, which are data blocks created using log compaction.
+       *  TODO: Include support for log compacted blocks. https://issues.apache.org/jira/browse/HUDI-3580
+       *
+       *  To solve this do traversal twice.

Review Comment:
   I assume we will employ two traversals only when in need. i.e. when minor compactions are enabled. If not, can we avoid it and fallback to old behavior ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1102884125

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2691cda7a183b61b9820c1d4cfe6d3570232746f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092) 
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1112798014

   @alexeykudinkin : can you review this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] suryaprasanna commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
suryaprasanna commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r871852934


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -218,7 +221,45 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
           logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
           readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);
 
+      /**
+       * Traversal of log blocks from log files can be done in two directions.
+       * 1. Forward traversal
+       * 2. Reverse traversal
+       * For example:   BaseFile, LogFile1(LogBlock11,LogBlock12,LogBlock13), LofFile2(LogBlock21,LogBlock22,LogBlock23)
+       *    Forward traversal look like,
+       *        LogBlock11, LogBlock12, LogBlock13, LogBlock21, LogBlock22, LogBlock23
+       *    If we are considering reverse traversal including log blocks,
+       *        LogBlock23, LogBlock22, LogBlock21, LogBlock13, LogBlock12, LogBlock11
+       * Here, reverse traversal also traverses blocks in reverse order of creation.
+       *
+       * 1. Forward traversal
+       *  Forward traversal is easy to do in single writer mode. Where the rollback block is right after the effected data blocks.
+       *  With multiwriter mode the blocks can be out of sync. An example scenario.
+       *  B1, B2, B3, B4, R1(B3), B5
+       *  In this case, rollback block R1 is invalidating the B3 which is not the previous block.
+       *  This becomes more complicated if we have compacted blocks, which are data blocks created using log compaction.
+       *  TODO: Include support for log compacted blocks. https://issues.apache.org/jira/browse/HUDI-3580
+       *
+       *  To solve this do traversal twice.

Review Comment:
   Two traversals is needed to support the multiwriter scenarios where we can have rollback way away from the original block it is targeting. With minor compaction it becomes more tricky since we can have compacted blocks comprising of other compacted blocks. So, tackling the multiwriter scenarios with this PR first.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
prashantwason commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r869718009


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -245,97 +286,53 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
             continue;
           }
         }
+        if (logBlock.getBlockType().equals(CORRUPT_BLOCK)) {
+          LOG.info("Found a corrupt block in " + logFile.getPath());
+          totalCorruptBlocks.incrementAndGet();
+          continue;
+        }
+
+        // Rollback blocks contain information of instants that are failed, collect them in a set..

Review Comment:
   This comments seems more relevant to where the rollback block is being handled later. 



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -245,97 +286,53 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
             continue;
           }
         }
+        if (logBlock.getBlockType().equals(CORRUPT_BLOCK)) {

Review Comment:
   Can we handle this too in the switch that follows? Having a common way to handle the various block types is easier to understand as per code flow.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -218,7 +221,45 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
           logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
           readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);
 

Review Comment:
   Lets also remove the reverseReader as it is no longer supported.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -218,7 +221,45 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
           logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
           readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);

Review Comment:
   Lets also remove the readBlocksLazily argument as it now required to be always true. 



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -245,97 +286,53 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
             continue;
           }
         }
+        if (logBlock.getBlockType().equals(CORRUPT_BLOCK)) {
+          LOG.info("Found a corrupt block in " + logFile.getPath());
+          totalCorruptBlocks.incrementAndGet();
+          continue;
+        }
+
+        // Rollback blocks contain information of instants that are failed, collect them in a set..
         switch (logBlock.getBlockType()) {
           case HFILE_DATA_BLOCK:
           case AVRO_DATA_BLOCK:
           case PARQUET_DATA_BLOCK:
-            LOG.info("Reading a data block from file " + logFile.getPath() + " at instant "
-                + logBlock.getLogBlockHeader().get(INSTANT_TIME));
-            if (isNewInstantBlock(logBlock) && !readBlocksLazily) {
-              // If this is an avro data block belonging to a different commit/instant,
-              // then merge the last blocks and records into the main result
-              processQueuedBlocksForInstant(currentInstantLogBlocks, scannedLogFiles.size(), keySpecOpt);
-            }
-            // store the current block
-            currentInstantLogBlocks.push(logBlock);
-            break;
           case DELETE_BLOCK:
-            LOG.info("Reading a delete block from file " + logFile.getPath());
-            if (isNewInstantBlock(logBlock) && !readBlocksLazily) {
-              // If this is a delete data block belonging to a different commit/instant,
-              // then merge the last blocks and records into the main result
-              processQueuedBlocksForInstant(currentInstantLogBlocks, scannedLogFiles.size(), keySpecOpt);
-            }
-            // store deletes so can be rolled back
-            currentInstantLogBlocks.push(logBlock);
+            dataAndDeleteBlocks.add(logBlock);
             break;
           case COMMAND_BLOCK:
-            // Consider the following scenario
-            // (Time 0, C1, Task T1) -> Running
-            // (Time 1, C1, Task T1) -> Failed (Wrote either a corrupt block or a correct
-            // DataBlock (B1) with commitTime C1
-            // (Time 2, C1, Task T1.2) -> Running (Task T1 was retried and the attempt number is 2)
-            // (Time 3, C1, Task T1.2) -> Finished (Wrote a correct DataBlock B2)
-            // Now a logFile L1 can have 2 correct Datablocks (B1 and B2) which are the same.
-            // Say, commit C1 eventually failed and a rollback is triggered.
-            // Rollback will write only 1 rollback block (R1) since it assumes one block is
-            // written per ingestion batch for a file but in reality we need to rollback (B1 & B2)
-            // The following code ensures the same rollback block (R1) is used to rollback
-            // both B1 & B2
             LOG.info("Reading a command block from file " + logFile.getPath());
             // This is a command block - take appropriate action based on the command
             HoodieCommandBlock commandBlock = (HoodieCommandBlock) logBlock;
-            String targetInstantForCommandBlock =
-                logBlock.getLogBlockHeader().get(HoodieLogBlock.HeaderMetadataType.TARGET_INSTANT_TIME);
-            switch (commandBlock.getType()) { // there can be different types of command blocks
-              case ROLLBACK_PREVIOUS_BLOCK:
-                // Rollback the last read log block
-                // Get commit time from last record block, compare with targetCommitTime,
-                // rollback only if equal, this is required in scenarios of invalid/extra
-                // rollback blocks written due to failures during the rollback operation itself
-                // and ensures the same rollback block (R1) is used to rollback both B1 & B2 with
-                // same instant_time
-                int numBlocksRolledBack = 0;
-                totalRollbacks.incrementAndGet();
-                while (!currentInstantLogBlocks.isEmpty()) {
-                  HoodieLogBlock lastBlock = currentInstantLogBlocks.peek();
-                  // handle corrupt blocks separately since they may not have metadata
-                  if (lastBlock.getBlockType() == CORRUPT_BLOCK) {
-                    LOG.info("Rolling back the last corrupted log block read in " + logFile.getPath());
-                    currentInstantLogBlocks.pop();
-                    numBlocksRolledBack++;
-                  } else if (targetInstantForCommandBlock.contentEquals(lastBlock.getLogBlockHeader().get(INSTANT_TIME))) {
-                    // rollback last data block or delete block
-                    LOG.info("Rolling back the last log block read in " + logFile.getPath());
-                    currentInstantLogBlocks.pop();
-                    numBlocksRolledBack++;
-                  } else if (!targetInstantForCommandBlock
-                      .contentEquals(currentInstantLogBlocks.peek().getLogBlockHeader().get(INSTANT_TIME))) {
-                    // invalid or extra rollback block
-                    LOG.warn("TargetInstantTime " + targetInstantForCommandBlock
-                        + " invalid or extra rollback command block in " + logFile.getPath());
-                    break;
-                  } else {
-                    // this should not happen ideally
-                    LOG.warn("Unable to apply rollback command block in " + logFile.getPath());
-                  }
-                }
-                LOG.info("Number of applied rollback blocks " + numBlocksRolledBack);
-                break;
-              default:
-                throw new UnsupportedOperationException("Command type not yet supported.");
+            if (commandBlock.getType().equals(ROLLBACK_BLOCK)) {
+              totalRollbacks.incrementAndGet();
+              String targetInstantForCommandBlock =
+                  logBlock.getLogBlockHeader().get(TARGET_INSTANT_TIME);
+              targetRollbackInstants.add(targetInstantForCommandBlock);
+            } else {
+              throw new UnsupportedOperationException("Command type not yet supported.");
             }
             break;
-          case CORRUPT_BLOCK:
-            LOG.info("Found a corrupt block in " + logFile.getPath());
-            totalCorruptBlocks.incrementAndGet();
-            // If there is a corrupt block - we will assume that this was the next data block
-            currentInstantLogBlocks.push(logBlock);
-            break;
           default:
-            throw new UnsupportedOperationException("Block type not supported yet");
+            throw new UnsupportedOperationException("Block type not yet supported.");
         }
       }
+
+      int numBlocksRolledBack = 0;
+      // This is a reverse traversal on the collected data blocks.

Review Comment:
   collected data and delete blocks.
   
   How is this reverse traversal? Isnt the for-loop a forward traversal?



##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -839,20 +839,24 @@ public void testAvroLogRecordReaderWithRollbackTombstone(ExternalSpillableMap.Di
     writer.appendBlock(dataBlock);
 
     // Write 2
+    header = new HashMap<>();

Review Comment:
   header.clear() also works instead of allocating a new hashmap each time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] prashantwason commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
prashantwason commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r869716469


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -218,7 +221,45 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
           logFilePaths.stream().map(logFile -> new HoodieLogFile(new Path(logFile))).collect(Collectors.toList()),
           readerSchema, readBlocksLazily, reverseReader, bufferSize, enableRecordLookups, keyField, internalSchema);
 
+      /**
+       * Traversal of log blocks from log files can be done in two directions.

Review Comment:
   Please simplify this comment and provide only the current logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] suryaprasanna commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
suryaprasanna commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1125474050

   > hey Surya, thanks for the patch. Wondering, for single writer scenario, do we think we can retain old behavior. only for multi-writer and minor log compactions, we might have to take the new route.
   
   Does that mean we need to pass multiwriter enabled flag to AbstractHoodieLogRecordReader and using that flag toggle the logic between one traversal and two traversals?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3580] [UBER] Add support for compacted log blocks

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1101183925

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2691cda7a183b61b9820c1d4cfe6d3570232746f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] suryaprasanna closed pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
suryaprasanna closed pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader
URL: https://github.com/apache/hudi/pull/5341


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1102887111

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2691cda7a183b61b9820c1d4cfe6d3570232746f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092) 
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 8cdf174009f9ccd8ce9c9c28026de4f12bced78f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1103183021

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149",
       "triggerID" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 8cdf174009f9ccd8ce9c9c28026de4f12bced78f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143) 
   * 58319027a8475b52f374a4934ca68e8026f6a838 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3580] [UBER] Add support for compacted log blocks

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1101137860

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2691cda7a183b61b9820c1d4cfe6d3570232746f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1124298361

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149",
       "triggerID" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4e7a31686083748bb0f2a36f60dbfb963fab9708",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8586",
       "triggerID" : "4e7a31686083748bb0f2a36f60dbfb963fab9708",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 4e7a31686083748bb0f2a36f60dbfb963fab9708 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8586) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] suryaprasanna commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
suryaprasanna commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1124247187

   @prashantwason Addressed the review comments. Removed readBlocksLazily and reverseReader flags from AbstractHoodieLogRecordReader and log file reader classes.
   Although, I have not removed following configs COMPACTION_LAZY_BLOCK_READ_ENABLE, COMPACTION_REVERSE_LOG_READ_ENABLE from HoodieWriteConfig object.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on a diff in pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5341:
URL: https://github.com/apache/hudi/pull/5341#discussion_r870823725


##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java:
##########
@@ -232,7 +251,7 @@ protected synchronized void scanInternal(Option<KeySpec> keySpecOpt) {
             && !HoodieTimeline.compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), HoodieTimeline.LESSER_THAN_OR_EQUALS, this.latestInstantTime
         )) {
           // hit a block with instant time greater than should be processed, stop processing further
-          break;
+          continue;

Review Comment:
   why continue. blocks greater than latest known instant time can be skipped altogether right? 



##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -414,7 +411,7 @@ public void testHugeLogFileWrite() throws IOException, URISyntaxException, Inter
     header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, getSimpleSchema().toString());
     byte[] dataBlockContentBytes = getDataBlock(DEFAULT_DATA_BLOCK_TYPE, records, header).getContentBytes();
     HoodieLogBlock.HoodieLogBlockContentLocation logBlockContentLoc = new HoodieLogBlock.HoodieLogBlockContentLocation(new Configuration(), null, 0, dataBlockContentBytes.length, 0);
-    HoodieDataBlock reusableDataBlock = new HoodieAvroDataBlock(null, Option.ofNullable(dataBlockContentBytes), false,
+    HoodieDataBlock reusableDataBlock = new HoodieAvroDataBlock(null, Option.ofNullable(dataBlockContentBytes),

Review Comment:
   do we have tests for test out the multi-writer scenario. i.e rollback log block is appended after few other valid log blocks? if not, can we add one. 



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java:
##########
@@ -93,35 +89,26 @@ public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSc
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
                              boolean readBlockLazily, boolean reverseReader) throws IOException {
-    this(fs, logFile, readerSchema, bufferSize, readBlockLazily, reverseReader, false,
-        HoodieRecord.RECORD_KEY_METADATA_FIELD);
+    this(fs, logFile, readerSchema, bufferSize, false, HoodieRecord.RECORD_KEY_METADATA_FIELD);
   }
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
-                             boolean readBlockLazily, boolean reverseReader, boolean enableRecordLookups,
-                             String keyField) throws IOException {
-    this(fs, logFile, readerSchema, bufferSize, readBlockLazily, reverseReader, enableRecordLookups, keyField, InternalSchema.getEmptyInternalSchema());
+                             boolean enableRecordLookups, String keyField) throws IOException {
+    this(fs, logFile, readerSchema, bufferSize, enableRecordLookups, keyField, InternalSchema.getEmptyInternalSchema());
   }
 
   public HoodieLogFileReader(FileSystem fs, HoodieLogFile logFile, Schema readerSchema, int bufferSize,
-                             boolean readBlockLazily, boolean reverseReader, boolean enableRecordLookups,
-                             String keyField, InternalSchema internalSchema) throws IOException {
+                             boolean enableRecordLookups, String keyField, InternalSchema internalSchema) throws IOException {
     this.hadoopConf = fs.getConf();
     // NOTE: We repackage {@code HoodieLogFile} here to make sure that the provided path
     //       is prefixed with an appropriate scheme given that we're not propagating the FS
     //       further
     this.logFile = new HoodieLogFile(FSUtils.makeQualified(fs, logFile.getPath()), logFile.getFileSize());
     this.inputStream = getFSDataInputStream(fs, this.logFile, bufferSize);
     this.readerSchema = readerSchema;
-    this.readBlockLazily = readBlockLazily;
-    this.reverseReader = reverseReader;
     this.enableRecordLookups = enableRecordLookups;
     this.keyField = keyField;
     this.internalSchema = internalSchema == null ? InternalSchema.getEmptyInternalSchema() : internalSchema;
-    if (this.reverseReader) {

Review Comment:
   why we are removing the reverse reader ? can you help me understand



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1254376340

   Closing in favor of https://github.com/apache/hudi/pull/5958
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1102890266

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2691cda7a183b61b9820c1d4cfe6d3570232746f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092) 
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 8cdf174009f9ccd8ce9c9c28026de4f12bced78f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1102957654

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 8cdf174009f9ccd8ce9c9c28026de4f12bced78f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1103108341

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 8cdf174009f9ccd8ce9c9c28026de4f12bced78f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143) 
   * 58319027a8475b52f374a4934ca68e8026f6a838 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5341: [HUDI-3919] [UBER] Support out of order rollback blocks in AbstractHoodieLogRecordReader

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5341:
URL: https://github.com/apache/hudi/pull/5341#issuecomment-1103253132

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8092",
       "triggerID" : "2691cda7a183b61b9820c1d4cfe6d3570232746f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8143",
       "triggerID" : "8cdf174009f9ccd8ce9c9c28026de4f12bced78f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149",
       "triggerID" : "58319027a8475b52f374a4934ca68e8026f6a838",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e7b633cceda6fedb7ffb36b06aae0ce7d22a8bcf UNKNOWN
   * 58319027a8475b52f374a4934ca68e8026f6a838 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8149) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org