Posted to commits@hudi.apache.org by "suryaprasanna (via GitHub)" <gi...@apache.org> on 2023/01/23 00:25:09 UTC

[GitHub] [hudi] suryaprasanna opened a new pull request, #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

suryaprasanna opened a new pull request, #7729:
URL: https://github.com/apache/hudi/pull/7729

   Summary:
   On the metadata table, a deltacommit timestamp is the same as the corresponding main table commit's timestamp, so if a metadata sync fails, the retry reuses the same timestamp. This change fixes the case where log blocks are treated as valid if their corresponding rollback has already been visited.
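
A minimal sketch of one possible reading of the summary above, not the code from this PR. It assumes the reader walks log blocks from newest to oldest, so a rollback block is seen before the blocks it targets, while a retry written after the rollback under the same timestamp has already been accepted. `RollbackScanSketch`, `Block`, and `validDataBlocks` are hypothetical names, and the block model is far simpler than Hudi's real log format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a log file; these are not Hudi's block classes.
final class RollbackScanSketch {

  enum Kind { DATA, ROLLBACK }

  static final class Block {
    final Kind kind;
    final String instant;        // instant time that wrote this block
    final String targetInstant;  // ROLLBACK only: the instant being rolled back
    final String label;          // name used to identify the block in the result

    private Block(Kind kind, String instant, String targetInstant, String label) {
      this.kind = kind;
      this.instant = instant;
      this.targetInstant = targetInstant;
      this.label = label;
    }

    static Block data(String instant, String label) {
      return new Block(Kind.DATA, instant, null, label);
    }

    static Block rollback(String instant, String targetInstant) {
      return new Block(Kind.ROLLBACK, instant, targetInstant, "rollback-of-" + targetInstant);
    }
  }

  // Assumption: blocks are visited newest first. A data block is dropped only if a
  // rollback targeting its instant was already visited, i.e. was written after it.
  static List<String> validDataBlocks(List<Block> blocksInWriteOrder) {
    Set<String> rolledBackInstants = new HashSet<>();
    Deque<String> valid = new ArrayDeque<>();
    for (int i = blocksInWriteOrder.size() - 1; i >= 0; i--) {
      Block block = blocksInWriteOrder.get(i);
      if (block.kind == Kind.ROLLBACK) {
        rolledBackInstants.add(block.targetInstant);
      } else if (!rolledBackInstants.contains(block.instant)) {
        // addFirst restores write order in the returned list
        valid.addFirst(block.label);
      }
    }
    return new ArrayList<>(valid);
  }
}
```

With blocks written in the order data at `100`, data at `101` (failed attempt), rollback of `101`, data at `101` (retry), this sketch keeps the first and last data blocks and drops the failed attempt.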
   
   Reviewers: balajee, O955 Project Hoodie Project Reviewer: Add blocking reviewers, PHID-PROJ-pxfpotkfgkanblb3detq!
   
   Reviewed By: balajee, O955 Project Hoodie Project Reviewer: Add blocking reviewers
   
   JIRA Issues: HUDI-1868
   
   Differential Revision: https://code.uberinternal.com/D8110297
   
   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1399659333

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4750735d6ecebcdcfd785764ca29dc8fe3550261 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1399686395

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4750735d6ecebcdcfd785764ca29dc8fe3550261 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545) 
   * 6b802f24fc5d0fcfd031800f4322febfd0ec2943 UNKNOWN
   


[GitHub] [hudi] suryaprasanna commented on a diff in pull request #7729: [HUDI-6356] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "suryaprasanna (via GitHub)" <gi...@apache.org>.
suryaprasanna commented on code in PR #7729:
URL: https://github.com/apache/hudi/pull/7729#discussion_r1226050070


##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -1391,14 +1404,128 @@ public void testAvroLogRecordReaderWithDeleteAndRollback(ExternalSpillableMap.Di
         throw new UncheckedIOException(io);
       }
     });
-    assertEquals(100, readKeys.size(), "Stream collect should return 100 records, since 2nd block is rolled back");
-    assertEquals(50, newEmptyPayloads.size(), "Stream collect should return all 50 records with empty payloads");
-    List<String> firstBlockRecords =
-        copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
-            .collect(Collectors.toList());
-    Collections.sort(firstBlockRecords);
+    if (useScanv2) {
+      assertEquals(100, readKeys.size(), "Stream collect should return 100 records, since 2nd block is rolled back");
+      assertEquals(50, newEmptyPayloads.size(), "Stream collect should return all 50 records with empty payloads");
+      List<String> firstBlockRecords =
+          copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
+              .collect(Collectors.toList());
+      Collections.sort(firstBlockRecords);
+      Collections.sort(readKeys);
+      assertEquals(firstBlockRecords, readKeys, "CompositeAvroLogReader should return 150 records from 2 versions");
+    } else {
+      assertEquals(200, readKeys.size(), "Stream collect should return all 200 records, since 2nd block that is being rolled back is not next to rollback block.");
+      assertEquals(50, newEmptyPayloads.size(), "Stream collect should returns empty records, since 2nd block that is being rolled back is not next to rollback block.");
+      List<String> firstBlockRecords =
+          copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
+              .collect(Collectors.toList());
+    }
+  }
+
+  @ParameterizedTest
+  @MethodSource("testArguments")
+  public void testAvroLogRecordReaderWithCommitBeforeAndAfterRollback(ExternalSpillableMap.DiskMapType diskMapType,
+                                                           boolean isCompressionEnabled,
+                                                           boolean readBlocksLazily,
+                                                           boolean useScanv2)
+      throws IOException, URISyntaxException, InterruptedException {
+    Schema schema = HoodieAvroUtils.addMetadataFields(getSimpleSchema());
+    // Set a small threshold so that every block is a new version
+    String fileId = "test-fileid111";
+    Writer writer =
+        HoodieLogFormat.newWriterBuilder().onParentPath(partitionPath).withFileExtension(HoodieLogFile.DELTA_EXTENSION)
+            .withFileId(fileId).overBaseCommit("100").withFs(fs).build();
+
+    // Write 1 -> 100 records are written
+    SchemaTestUtil testUtil = new SchemaTestUtil();

Review Comment:
   Created the following ticket to address the refactoring:
   https://issues.apache.org/jira/browse/HUDI-6357





[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1586616060

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6b802f24fc5d0fcfd031800f4322febfd0ec2943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546) 
   * 7e87df2f2bcd6ec91089e1b4dd837c432ee3d082 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755) 
   * e700f362f2953bc814da5e87bb9aa5ba9f50ccb5 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1586527975

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6b802f24fc5d0fcfd031800f4322febfd0ec2943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546) 
   * 7e87df2f2bcd6ec91089e1b4dd837c432ee3d082 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1586966677

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758",
       "triggerID" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e700f362f2953bc814da5e87bb9aa5ba9f50ccb5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1399799260

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6b802f24fc5d0fcfd031800f4322febfd0ec2943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1586622241

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758",
       "triggerID" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7e87df2f2bcd6ec91089e1b4dd837c432ee3d082 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755) 
   * e700f362f2953bc814da5e87bb9aa5ba9f50ccb5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758) 
   


[GitHub] [hudi] suryaprasanna commented on a diff in pull request #7729: [HUDI-6356] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "suryaprasanna (via GitHub)" <gi...@apache.org>.
suryaprasanna commented on code in PR #7729:
URL: https://github.com/apache/hudi/pull/7729#discussion_r1226050070


##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -1391,14 +1404,128 @@ public void testAvroLogRecordReaderWithDeleteAndRollback(ExternalSpillableMap.Di
         throw new UncheckedIOException(io);
       }
     });
-    assertEquals(100, readKeys.size(), "Stream collect should return 100 records, since 2nd block is rolled back");
-    assertEquals(50, newEmptyPayloads.size(), "Stream collect should return all 50 records with empty payloads");
-    List<String> firstBlockRecords =
-        copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
-            .collect(Collectors.toList());
-    Collections.sort(firstBlockRecords);
+    if (useScanv2) {
+      assertEquals(100, readKeys.size(), "Stream collect should return 100 records, since 2nd block is rolled back");
+      assertEquals(50, newEmptyPayloads.size(), "Stream collect should return all 50 records with empty payloads");
+      List<String> firstBlockRecords =
+          copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
+              .collect(Collectors.toList());
+      Collections.sort(firstBlockRecords);
+      Collections.sort(readKeys);
+      assertEquals(firstBlockRecords, readKeys, "CompositeAvroLogReader should return 150 records from 2 versions");
+    } else {
+      assertEquals(200, readKeys.size(), "Stream collect should return all 200 records, since 2nd block that is being rolled back is not next to rollback block.");
+      assertEquals(50, newEmptyPayloads.size(), "Stream collect should returns empty records, since 2nd block that is being rolled back is not next to rollback block.");
+      List<String> firstBlockRecords =
+          copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
+              .collect(Collectors.toList());
+    }
+  }
+
+  @ParameterizedTest
+  @MethodSource("testArguments")
+  public void testAvroLogRecordReaderWithCommitBeforeAndAfterRollback(ExternalSpillableMap.DiskMapType diskMapType,
+                                                           boolean isCompressionEnabled,
+                                                           boolean readBlocksLazily,
+                                                           boolean useScanv2)
+      throws IOException, URISyntaxException, InterruptedException {
+    Schema schema = HoodieAvroUtils.addMetadataFields(getSimpleSchema());
+    // Set a small threshold so that every block is a new version
+    String fileId = "test-fileid111";
+    Writer writer =
+        HoodieLogFormat.newWriterBuilder().onParentPath(partitionPath).withFileExtension(HoodieLogFile.DELTA_EXTENSION)
+            .withFileId(fileId).overBaseCommit("100").withFs(fs).build();
+
+    // Write 1 -> 100 records are written
+    SchemaTestUtil testUtil = new SchemaTestUtil();

Review Comment:
   The class is huge and refactoring it would take some time, so for now I created the following ticket to track the refactoring. We want this change in the 0.14 release.
   https://issues.apache.org/jira/browse/HUDI-6357
   We can follow up on the refactoring after the release.





[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1586524891

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6b802f24fc5d0fcfd031800f4322febfd0ec2943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546) 
   * 7e87df2f2bcd6ec91089e1b4dd837c432ee3d082 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1588818914

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758",
       "triggerID" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de1f8df9bdd1bc5953eee46e7947c55341ee5f14",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17794",
       "triggerID" : "de1f8df9bdd1bc5953eee46e7947c55341ee5f14",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de1f8df9bdd1bc5953eee46e7947c55341ee5f14 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17794) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1399690557

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4750735d6ecebcdcfd785764ca29dc8fe3550261 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545) 
   * 6b802f24fc5d0fcfd031800f4322febfd0ec2943 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1399683428

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4750735d6ecebcdcfd785764ca29dc8fe3550261 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7729: [UBER] Enhance rollback logic in AbstractHoodieLogRecordReader

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7729:
URL: https://github.com/apache/hudi/pull/7729#discussion_r1090150345


##########
hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java:
##########
@@ -1391,14 +1404,128 @@ public void testAvroLogRecordReaderWithDeleteAndRollback(ExternalSpillableMap.Di
         throw new UncheckedIOException(io);
       }
     });
-    assertEquals(100, readKeys.size(), "Stream collect should return 100 records, since 2nd block is rolled back");
-    assertEquals(50, newEmptyPayloads.size(), "Stream collect should return all 50 records with empty payloads");
-    List<String> firstBlockRecords =
-        copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
-            .collect(Collectors.toList());
-    Collections.sort(firstBlockRecords);
+    if (useScanv2) {
+      assertEquals(100, readKeys.size(), "Stream collect should return 100 records, since 2nd block is rolled back");
+      assertEquals(50, newEmptyPayloads.size(), "Stream collect should return all 50 records with empty payloads");
+      List<String> firstBlockRecords =
+          copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
+              .collect(Collectors.toList());
+      Collections.sort(firstBlockRecords);
+      Collections.sort(readKeys);
+      assertEquals(firstBlockRecords, readKeys, "CompositeAvroLogReader should return 150 records from 2 versions");
+    } else {
+      assertEquals(200, readKeys.size(), "Stream collect should return all 200 records, since 2nd block that is being rolled back is not next to rollback block.");
+      assertEquals(50, newEmptyPayloads.size(), "Stream collect should return empty records, since 2nd block that is being rolled back is not next to rollback block.");
+      List<String> firstBlockRecords =
+          copyOfRecords1.stream().map(s -> ((GenericRecord) s).get(HoodieRecord.RECORD_KEY_METADATA_FIELD).toString())
+              .collect(Collectors.toList());
+    }
+  }
+
+  @ParameterizedTest
+  @MethodSource("testArguments")
+  public void testAvroLogRecordReaderWithCommitBeforeAndAfterRollback(ExternalSpillableMap.DiskMapType diskMapType,
+                                                           boolean isCompressionEnabled,
+                                                           boolean readBlocksLazily,
+                                                           boolean useScanv2)
+      throws IOException, URISyntaxException, InterruptedException {
+    Schema schema = HoodieAvroUtils.addMetadataFields(getSimpleSchema());
+    // Set a small threshold so that every block is a new version
+    String fileId = "test-fileid111";
+    Writer writer =
+        HoodieLogFormat.newWriterBuilder().onParentPath(partitionPath).withFileExtension(HoodieLogFile.DELTA_EXTENSION)
+            .withFileId(fileId).overBaseCommit("100").withFs(fs).build();
+
+    // Write 1 -> 100 records are written
+    SchemaTestUtil testUtil = new SchemaTestUtil();

Review Comment:
   I see we are duplicating the test code. Can you check whether these blocks can be moved into private methods and reused across tests?
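
As an illustration of the refactoring requested above, here is a minimal sketch of a shared helper for the "collect record keys, sort, compare" pattern that recurs in the diff. It is not code from the PR: the class name `RecordKeyTestHelper`, both method names, and the use of a plain `Map` in place of Avro's `GenericRecord` are assumptions made so the example is self-contained, and `RECORD_KEY_FIELD` is a local stand-in for `HoodieRecord.RECORD_KEY_METADATA_FIELD`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: gathers the repeated
// "extract keys, sort, compare" steps into reusable methods.
class RecordKeyTestHelper {

  // Local stand-in for HoodieRecord.RECORD_KEY_METADATA_FIELD (assumption).
  static final String RECORD_KEY_FIELD = "_hoodie_record_key";

  // Returns the record keys of the given records in sorted order.
  // A Map stands in for GenericRecord to keep the sketch dependency-free.
  static List<String> sortedRecordKeys(List<Map<String, Object>> records) {
    return records.stream()
        .map(r -> r.get(RECORD_KEY_FIELD).toString())
        .sorted()
        .collect(Collectors.toList());
  }

  // Fails if the keys read back differ from the keys of the expected records.
  // Both sides are compared in sorted order, so read order does not matter.
  static void assertSameKeys(List<Map<String, Object>> expectedRecords, List<String> readKeys) {
    List<String> actual = new ArrayList<>(readKeys);
    Collections.sort(actual);
    List<String> expected = sortedRecordKeys(expectedRecords);
    if (!expected.equals(actual)) {
      throw new AssertionError("Expected keys " + expected + " but read " + actual);
    }
  }

  public static void main(String[] args) {
    List<Map<String, Object>> written = List.of(
        Map.<String, Object>of(RECORD_KEY_FIELD, "key-2"),
        Map.<String, Object>of(RECORD_KEY_FIELD, "key-1"));
    // Keys read back in a different order still match after sorting.
    assertSameKeys(written, List.of("key-1", "key-2"));
    System.out.println(sortedRecordKeys(written));
  }
}
```

With a helper of this shape, each test could replace its inline stream, `Collections.sort`, and `assertEquals` sequence with a single call, leaving only the scenario-specific block writes in the test body.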





[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1588587244

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758",
       "triggerID" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de1f8df9bdd1bc5953eee46e7947c55341ee5f14",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17794",
       "triggerID" : "de1f8df9bdd1bc5953eee46e7947c55341ee5f14",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e700f362f2953bc814da5e87bb9aa5ba9f50ccb5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758) 
   * de1f8df9bdd1bc5953eee46e7947c55341ee5f14 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17794) 
   




[GitHub] [hudi] nsivabalan merged pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan merged PR #7729:
URL: https://github.com/apache/hudi/pull/7729




[GitHub] [hudi] hudi-bot commented on pull request #7729: [HUDI-6356] Fix rollback logic on AbstractHoodieLogRecordReader when roll backed blocks are created with same instant

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7729:
URL: https://github.com/apache/hudi/pull/7729#issuecomment-1588581587

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14545",
       "triggerID" : "4750735d6ecebcdcfd785764ca29dc8fe3550261",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14546",
       "triggerID" : "6b802f24fc5d0fcfd031800f4322febfd0ec2943",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17755",
       "triggerID" : "7e87df2f2bcd6ec91089e1b4dd837c432ee3d082",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758",
       "triggerID" : "e700f362f2953bc814da5e87bb9aa5ba9f50ccb5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "de1f8df9bdd1bc5953eee46e7947c55341ee5f14",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de1f8df9bdd1bc5953eee46e7947c55341ee5f14",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * e700f362f2953bc814da5e87bb9aa5ba9f50ccb5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17758) 
   * de1f8df9bdd1bc5953eee46e7947c55341ee5f14 UNKNOWN
   

