You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/17 22:54:13 UTC

[GitHub] [hudi] rmahindra123 opened a new pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

rmahindra123 opened a new pull request #4021:
URL: https://github.com/apache/hudi/pull/4021


   ## What is the purpose of the pull request
   
   [HUDI-2671] When there are two sink workers running, there can be a case where one participant joins after the coordinator starts a first commit, which needs to be rolled back later since the other participant does not receive the START_COMMIT message for the transaction.  In this case, later on in a new commit, `writeRecords()` can miss records because `ongoingTransactionInfo.getLastWrittenKafkaOffset()` is behind the record offsets in the buffer.  This causes missing records in the target Hudi table.
   
   This PR fixes the issue by not accepting kafka offsets that are above the current expected value, and resets the kafka offset when such situation happens. The corresponding tests are also added.
   
   [HUDI-2672] When there is no data in kafka, the coordinator keeps starting a new commit, that leads to the existing commit to be rolled back, creating a bunch of rollback commits, which are also not archived. This PR fixes this issue by resuing existing commit when all write statuses are empty.
   
   
   ## Brief change log
   - Fixed an issue with missing records due to participant accepting kafka offsets higher than last written offset.
   - Fixed issue where coordinator keeps creating rollback commits when no data is present in kafka.
   - 
   ## Verify this pull request
   - added and improved the unit tests for participant's handling of kafka offset.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#discussion_r756185388



##########
File path: hudi-kafka-connect/src/main/java/org/apache/hudi/connect/transaction/ConnectTransactionParticipant.java
##########
@@ -215,13 +215,17 @@ private void writeRecords() {
         try {
           SinkRecord record = buffer.peek();
           if (record != null
-              && record.kafkaOffset() >= ongoingTransactionInfo.getLastWrittenKafkaOffset()) {
+              && record.kafkaOffset() == ongoingTransactionInfo.getExpectedKafkaOffset()) {

Review comment:
       This assumes that the records come in order in terms of the Kafka offset from the buffer.  Is it possible that the records may come out of order?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-978044264


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   * b793782a558dc4ba07d2e888ad54724e02958080 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-977399908


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-972220657


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#discussion_r756172606



##########
File path: hudi-kafka-connect/src/main/java/org/apache/hudi/connect/transaction/ConnectTransactionParticipant.java
##########
@@ -201,9 +201,9 @@ private void handleEndCommit(ControlMessage message) {
   }
 
   private void handleAckCommit(ControlMessage message) {
-    // Update lastKafkCommitedOffset locally.
-    if (ongoingTransactionInfo != null && committedKafkaOffset < ongoingTransactionInfo.getLastWrittenKafkaOffset()) {
-      committedKafkaOffset = ongoingTransactionInfo.getLastWrittenKafkaOffset();
+    // // Update committedKafkaOffset that tracks the last committed kafka offset locally.

Review comment:
       Nit: remove additional `//` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-978046557


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3687",
       "triggerID" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   * b793782a558dc4ba07d2e888ad54724e02958080 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3687) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-977399908


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-978083147


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3687",
       "triggerID" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b793782a558dc4ba07d2e888ad54724e02958080 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3687) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-972220657


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-978046557


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3687",
       "triggerID" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   * b793782a558dc4ba07d2e888ad54724e02958080 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3687) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-977398810


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   * 05656d60186e81d77cbe031408ded27ee4667240 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-977398810


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   * 05656d60186e81d77cbe031408ded27ee4667240 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua merged pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
yihua merged pull request #4021:
URL: https://github.com/apache/hudi/pull/4021


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rmahindra123 commented on a change in pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
rmahindra123 commented on a change in pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#discussion_r756250522



##########
File path: hudi-kafka-connect/src/main/java/org/apache/hudi/connect/transaction/ConnectTransactionParticipant.java
##########
@@ -215,13 +215,17 @@ private void writeRecords() {
         try {
           SinkRecord record = buffer.peek();
           if (record != null
-              && record.kafkaOffset() >= ongoingTransactionInfo.getLastWrittenKafkaOffset()) {
+              && record.kafkaOffset() == ongoingTransactionInfo.getExpectedKafkaOffset()) {
             ongoingTransactionInfo.getWriter().writeRecord(record);
-            ongoingTransactionInfo.setLastWrittenKafkaOffset(record.kafkaOffset() + 1);
-          } else if (record != null && record.kafkaOffset() < committedKafkaOffset) {
-            LOG.warn(String.format("Received a kafka record with offset %s prior to last committed offset %s for partition %s",
-                record.kafkaOffset(), ongoingTransactionInfo.getLastWrittenKafkaOffset(),
-                partition));
+            ongoingTransactionInfo.setExpectedKafkaOffset(record.kafkaOffset() + 1);
+          } else if (record != null && record.kafkaOffset() > ongoingTransactionInfo.getExpectedKafkaOffset()) {
+            LOG.warn(String.format("Received a kafka record with offset %s above the next expected kafka offset %s for partition %s, "
+                    + "hence resetting the kafka offset to %s",
+                record.kafkaOffset(),
+                ongoingTransactionInfo.getExpectedKafkaOffset(),
+                partition,
+                ongoingTransactionInfo.getExpectedKafkaOffset()));
+            context.offset(partition, ongoingTransactionInfo.getExpectedKafkaOffset());
           }

Review comment:
       Good point, I also added some more log messages.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-972328448


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-977453744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#discussion_r756197802



##########
File path: hudi-kafka-connect/src/main/java/org/apache/hudi/connect/transaction/ConnectTransactionParticipant.java
##########
@@ -215,13 +215,17 @@ private void writeRecords() {
         try {
           SinkRecord record = buffer.peek();
           if (record != null
-              && record.kafkaOffset() >= ongoingTransactionInfo.getLastWrittenKafkaOffset()) {
+              && record.kafkaOffset() == ongoingTransactionInfo.getExpectedKafkaOffset()) {
             ongoingTransactionInfo.getWriter().writeRecord(record);
-            ongoingTransactionInfo.setLastWrittenKafkaOffset(record.kafkaOffset() + 1);
-          } else if (record != null && record.kafkaOffset() < committedKafkaOffset) {
-            LOG.warn(String.format("Received a kafka record with offset %s prior to last committed offset %s for partition %s",
-                record.kafkaOffset(), ongoingTransactionInfo.getLastWrittenKafkaOffset(),
-                partition));
+            ongoingTransactionInfo.setExpectedKafkaOffset(record.kafkaOffset() + 1);
+          } else if (record != null && record.kafkaOffset() > ongoingTransactionInfo.getExpectedKafkaOffset()) {
+            LOG.warn(String.format("Received a kafka record with offset %s above the next expected kafka offset %s for partition %s, "
+                    + "hence resetting the kafka offset to %s",
+                record.kafkaOffset(),
+                ongoingTransactionInfo.getExpectedKafkaOffset(),
+                partition,
+                ongoingTransactionInfo.getExpectedKafkaOffset()));
+            context.offset(partition, ongoingTransactionInfo.getExpectedKafkaOffset());
           }

Review comment:
       Shall we still log a warning message if the Kafka record offset is prior to the offset?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-978044264


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b793782a558dc4ba07d2e888ad54724e02958080",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   * b793782a558dc4ba07d2e888ad54724e02958080 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-972213242


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-972213242


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-972328448


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a13e5fa2eb54b637154cefd113f6e18831353ec Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rmahindra123 commented on a change in pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
rmahindra123 commented on a change in pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#discussion_r756242828



##########
File path: hudi-kafka-connect/src/main/java/org/apache/hudi/connect/transaction/ConnectTransactionParticipant.java
##########
@@ -215,13 +215,17 @@ private void writeRecords() {
         try {
           SinkRecord record = buffer.peek();
           if (record != null
-              && record.kafkaOffset() >= ongoingTransactionInfo.getLastWrittenKafkaOffset()) {
+              && record.kafkaOffset() == ongoingTransactionInfo.getExpectedKafkaOffset()) {

Review comment:
       They will always come in order. But its possible that kafka may send them out of order in case we do not commit an offset for sometime. For instance, it will send 12,13,14,15 ... but then realize that the consumer has not committed an offset (and the last committed offset was 5), it will start sending 5,6,7 .. but henceforth it will be in order. Thats why we have an excepted offset, and if we do not receive an  expected offset, we force offset reset, and kafka will start sending messages from before. This ensures no data loss/duplication, but yeah there may be scope for some optimization, but trying to be conservative for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot removed a comment on pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#issuecomment-977453744


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3446",
       "triggerID" : "8a13e5fa2eb54b637154cefd113f6e18831353ec",
       "triggerType" : "PUSH"
     }, {
       "hash" : "05656d60186e81d77cbe031408ded27ee4667240",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657",
       "triggerID" : "05656d60186e81d77cbe031408ded27ee4667240",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 05656d60186e81d77cbe031408ded27ee4667240 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3657) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #4021: [HUDI-2671] Fix kafka offset handling in Kafka Connect protocol

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4021:
URL: https://github.com/apache/hudi/pull/4021#discussion_r756257484



##########
File path: hudi-kafka-connect/src/main/java/org/apache/hudi/connect/transaction/ConnectTransactionParticipant.java
##########
@@ -215,13 +215,17 @@ private void writeRecords() {
         try {
           SinkRecord record = buffer.peek();
           if (record != null
-              && record.kafkaOffset() >= ongoingTransactionInfo.getLastWrittenKafkaOffset()) {
+              && record.kafkaOffset() == ongoingTransactionInfo.getExpectedKafkaOffset()) {

Review comment:
       Got it.  Basically, the offsets of the records are sth like 12,13,14,15,5,6,7,8,9,10,11,12,13,14,15 after the resetting so this logic skips the first group of records.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org