Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/19 08:52:58 UTC

[GitHub] [hudi] dongkelun opened a new pull request, #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

dongkelun opened a new pull request, #5633:
URL: https://github.com/apache/hudi/pull/5633

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1131463885

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6a597b9b11a472aee506719708f2915e309d6d6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132419341

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] dongkelun commented on a diff in pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on code in PR #5633:
URL: https://github.com/apache/hudi/pull/5633#discussion_r884259014


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -605,15 +605,13 @@ private Pair<Option<String>, JavaRDD<WriteStatus>> writeToSink(JavaRDD<HoodieRec
     long totalErrorRecords = writeStatusRDD.mapToDouble(WriteStatus::getTotalErrorRecords).sum().longValue();
     long totalRecords = writeStatusRDD.mapToDouble(WriteStatus::getTotalRecords).sum().longValue();
     boolean hasErrors = totalErrorRecords > 0;
-    long hiveSyncTimeMs = 0;
-    long metaSyncTimeMs = 0;
     if (!hasErrors || cfg.commitOnErrors) {
       HashMap<String, String> checkpointCommitMetadata = new HashMap<>();
       if (checkpointStr != null) {
         checkpointCommitMetadata.put(CHECKPOINT_KEY, checkpointStr);
-      }
-      if (cfg.checkpoint != null) {
-        checkpointCommitMetadata.put(CHECKPOINT_RESET_KEY, cfg.checkpoint);
+        if (cfg.checkpoint != null) {
+          checkpointCommitMetadata.put(CHECKPOINT_RESET_KEY, cfg.checkpoint);
+        }

Review Comment:
   As in the description I just updated above: when the value of `deltastreamer.checkpoint.reset_key` is not null but `deltastreamer.checkpoint.key` is null, the logic of the method `getCheckpointToResume` throws this exception.
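
The failure mode described in this comment can be sketched in isolation. This is a simplified model and not the actual `DeltaSync` code: the class `CheckpointMetadataSketch` and its methods are hypothetical, and the `resume` rule is only an assumption about how `getCheckpointToResume` reacts to a previous commit that carries a reset key but no checkpoint key. The two `build...` methods mirror the before and after sides of the diff hunk above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of the commit-metadata handling discussed in this review.
// Class and method names are hypothetical; this is not Hudi's DeltaSync code.
class CheckpointMetadataSketch {
  static final String CHECKPOINT_KEY = "deltastreamer.checkpoint.key";
  static final String CHECKPOINT_RESET_KEY = "deltastreamer.checkpoint.reset_key";

  // Before the patch: the reset key is written even without a checkpoint.
  static Map<String, String> buildBeforePatch(String checkpointStr, String cfgCheckpoint) {
    Map<String, String> metadata = new HashMap<>();
    if (checkpointStr != null) {
      metadata.put(CHECKPOINT_KEY, checkpointStr);
    }
    if (cfgCheckpoint != null) {
      metadata.put(CHECKPOINT_RESET_KEY, cfgCheckpoint);
    }
    return metadata;
  }

  // After the patch: the reset key is written only together with a checkpoint.
  static Map<String, String> buildAfterPatch(String checkpointStr, String cfgCheckpoint) {
    Map<String, String> metadata = new HashMap<>();
    if (checkpointStr != null) {
      metadata.put(CHECKPOINT_KEY, checkpointStr);
      if (cfgCheckpoint != null) {
        metadata.put(CHECKPOINT_RESET_KEY, cfgCheckpoint);
      }
    }
    return metadata;
  }

  // Assumed resume rule: a reset key without a checkpoint key is treated as
  // an inconsistent previous commit and rejected.
  static Optional<String> resume(Map<String, String> previousCommitMetadata) {
    String checkpoint = previousCommitMetadata.get(CHECKPOINT_KEY);
    if (checkpoint != null) {
      return Optional.of(checkpoint);
    }
    if (previousCommitMetadata.containsKey(CHECKPOINT_RESET_KEY)) {
      throw new IllegalStateException("Unable to find previous checkpoint");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // A SQL-style source returns a null checkpoint while the user passed "earliest".
    System.out.println(resume(buildAfterPatch(null, "earliest")).isPresent());
    try {
      resume(buildBeforePatch(null, "earliest"));
    } catch (IllegalStateException e) {
      System.out.println("before patch: " + e.getMessage());
    }
  }
}
```

Under these assumptions, the patched writer leaves the metadata empty when the source returns no checkpoint, so the next round resumes without a checkpoint instead of failing.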





[GitHub] [hudi] nsivabalan commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1256807510

   I don't think we have a bug as such. Here is the context.
   SqlSource is different from other sources because it has no concept of a checkpoint.
   
   So, let's say we define the SQL as "select * from tbl1".
   
   We do syncOnce() and ingest into Hudi.
   
   The next time, we invoke the same SQL again, which is "select * from tbl1". There is nothing like a checkpoint to resume from the previous invocation of the SQL. So, just for SqlSource, the checkpoint that gets serialized into the commit metadata is always null. If we invoke syncOnce() again, Hudi tries to fetch the checkpoint from the commit metadata, which is null, and again we just invoke the SQL as is.
   
   Your test had an issue: we have to generate new data for the SQL test.
   
   I have enhanced the test for SqlSource to cover 2 syncOnce() calls.
   
   https://github.com/apache/hudi/pull/6781
   
   Let me know if it makes sense.
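
The two-round behavior described in this comment can be illustrated with a small stand-alone model. Everything here is hypothetical (the `SqlSourceSyncSketch` class, the `Batch` type, and the `fetch` method are not Hudi classes); it only shows a source that reruns the same query each round, ignores the checkpoint it is handed, and always reports a null checkpoint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a SQL-style source across two sync rounds.
class SqlSourceSyncSketch {
  // Result of one fetch: the rows read plus the checkpoint to store.
  static final class Batch {
    final List<String> rows;
    final String checkpoint; // always null for the SQL-style source below

    Batch(List<String> rows, String checkpoint) {
      this.rows = rows;
      this.checkpoint = checkpoint;
    }
  }

  // Runs the same "select * from tbl1" style read every time and ignores
  // the checkpoint from the previous round.
  static Batch fetch(List<String> table, Optional<String> lastCheckpoint) {
    return new Batch(new ArrayList<>(table), null);
  }

  public static void main(String[] args) {
    List<String> tbl1 = new ArrayList<>(Arrays.asList("r1", "r2"));
    Batch first = fetch(tbl1, Optional.empty());

    // New data has to arrive between rounds; otherwise the second round
    // simply re-reads the rows from the first one.
    tbl1.add("r3");
    Batch second = fetch(tbl1, Optional.ofNullable(first.checkpoint));

    System.out.println(first.rows.size() + " rows, then " + second.rows.size() + " rows");
  }
}
```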
   
   




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132392369

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6a597b9b11a472aee506719708f2915e309d6d6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764) 
   * 44f668b3c59b846c4838b1eada242430fb445d30 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132445143

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1132440625",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] dongkelun commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132422185

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1131570223

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6a597b9b11a472aee506719708f2915e309d6d6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132446920

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132440625",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] nsivabalan commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1164762957

   Sorry, I don't understand why you are setting "--checkpoint earliest" with your spark-submit job. You should not set any checkpoint value, if I am not wrong.
   Can you help me understand? "earliest/latest" is meant for auto reset for Kafka sources.
   




[GitHub] [hudi] dongkelun commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1165064847

   > sorry, I dont' understand why you are setting "--checkpoint earliest" w/ your spark-submit job. You should not set any checkpoint value if I am not wrong. can you help me understand. "earliest/latest" is meant for auto reset for kafka sources.
   
   First of all, you are absolutely correct. The reason I set a checkpoint value is that SqlSource in version 0.9.0 cannot extract data if the checkpoint is not set. The following log appears:
   ```text
   No new data, source checkpoint has not changed. Nothing to commit. Old checkpoint=(Optional.empty). New Checkpoint=(null) 
   ```
   So I tried setting the checkpoint to a meaningless value, and then I could extract the data, but this exception is thrown when I extract again.
   
   In the new version, the problem that data cannot be extracted has been solved by adding the parameter `--allow-commit-on-no-checkpoint-change`. However, if the user mistakenly sets a checkpoint that should not be set, this exception still occurs, so I think we should solve this problem and avoid the exception.
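
The skip decision behind that log line can be sketched as below. This is an assumption about the comparison, chosen to be consistent with the logged values (`Optional.empty` versus `null`), and not a copy of the Hudi code; the class `NoCheckpointChangeSketch`, the method `shouldSkip`, and the boolean parameter standing in for `--allow-commit-on-no-checkpoint-change` are hypothetical names.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical model of the "no new data" check; not the Hudi implementation.
class NoCheckpointChangeSketch {
  // Returns true when the sync round is skipped because the checkpoint did
  // not change and committing without a checkpoint change is not allowed.
  static boolean shouldSkip(Optional<String> oldCheckpoint,
                            String newCheckpoint,
                            boolean allowCommitOnNoCheckpointChange) {
    boolean unchanged = Objects.equals(newCheckpoint, oldCheckpoint.orElse(null));
    return unchanged && !allowCommitOnNoCheckpointChange;
  }

  public static void main(String[] args) {
    // No checkpoint configured, source returns null: the round is skipped.
    System.out.println(shouldSkip(Optional.empty(), null, false));
    // A meaningless checkpoint such as "earliest" makes the values differ.
    System.out.println(shouldSkip(Optional.of("earliest"), null, false));
    // With the flag enabled, the commit goes ahead without any checkpoint.
    System.out.println(shouldSkip(Optional.empty(), null, true));
  }
}
```

In this model, setting a placeholder checkpoint only works around the skip in the first round; it does nothing about the inconsistent commit metadata that the pull request targets.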
   




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132495734

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132440625",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8774",
       "triggerID" : "c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   * c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8774) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132394215

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6a597b9b11a472aee506719708f2915e309d6d6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764) 
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] dongkelun commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1257393547

   @nsivabalan Although SqlSource does not support checkpoints, it does not prevent the user from passing checkpoint parameters. If the user sets the checkpoint parameter to a non-null value such as 'earliest', an exception is thrown when sync is called twice, so we should modify the code logic to avoid this exception.




[GitHub] [hudi] nsivabalan commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1296305809

   I am not sure what a checkpoint refers to in the case of a SQL source. In the case of Kafka, it refers to an offset, and we honor that while polling for messages from Kafka. In the case of DFS-based sources, the checkpoint refers to the last modification time of files, so we filter based on that while polling for new data. But can you help me understand what a checkpoint means for SQL sources? We can allow configuring a checkpoint, but as of now we are not leveraging the checkpoint while querying from the SQL source. So, unless we fix that, I don't see much benefit in allowing users to configure a checkpoint.
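
The three meanings of a checkpoint contrasted in this comment can be put side by side in a toy model. The class `CheckpointMeaningSketch` and its methods are hypothetical and use in-memory collections in place of Kafka, a file system, and a SQL engine; the sketch only illustrates that the first two sources filter by their checkpoint while the SQL-style source, as described in the thread, does not use it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of what a checkpoint means for each kind of source.
class CheckpointMeaningSketch {
  // Kafka-style: the checkpoint is an offset; read from that offset onward.
  static List<String> pollFromOffset(List<String> messages, int offset) {
    return new ArrayList<>(messages.subList(offset, messages.size()));
  }

  // DFS-style: the checkpoint is a last modification time; keep only files
  // modified after it.
  static List<String> filesModifiedAfter(Map<String, Long> modTimeByFile, long lastModTime) {
    List<String> newer = new ArrayList<>();
    for (Map.Entry<String, Long> entry : modTimeByFile.entrySet()) {
      if (entry.getValue() > lastModTime) {
        newer.add(entry.getKey());
      }
    }
    return newer;
  }

  // SQL-style: the checkpoint is accepted but not used, so every call
  // returns the full query result.
  static List<String> runQuery(List<String> tableRows, String ignoredCheckpoint) {
    return new ArrayList<>(tableRows);
  }
}
```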
   




[GitHub] [hudi] dongkelun commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132440625

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132424009

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1131459863

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * f6a597b9b11a472aee506719708f2915e309d6d6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132422550

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132493825

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132440625",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 44f668b3c59b846c4838b1eada242430fb445d30 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771) 
   * c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1132570906

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8764",
       "triggerID" : "f6a597b9b11a472aee506719708f2915e309d6d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "triggerType" : "PUSH"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132422185",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "44f668b3c59b846c4838b1eada242430fb445d30",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8771",
       "triggerID" : "1132440625",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8774",
       "triggerID" : "c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c5e02fd468b5cc6f9e9d909620e3a7ea9274a3c2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8774) 
   




[GitHub] [hudi] xushiyan commented on a diff in pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #5633:
URL: https://github.com/apache/hudi/pull/5633#discussion_r884217847


##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java:
##########
@@ -1954,11 +1954,13 @@ public void testSqlSourceSource() throws Exception {
     String tableBasePath = dfsBasePath + "/test_sql_source_table" + testNum++;
     HoodieDeltaStreamer deltaStreamer =
         new HoodieDeltaStreamer(TestHelpers.makeConfig(
-            tableBasePath, WriteOperationType.INSERT, SqlSource.class.getName(),
+            tableBasePath, WriteOperationType.BULK_INSERT, SqlSource.class.getName(),
             Collections.emptyList(), PROPS_FILENAME_TEST_SQL_SOURCE, false,
-            false, 1000, false, null, null, "timestamp", null, true), jsc);
+            false, 1000, false, null, null, "timestamp", "earliest", true), jsc);
     deltaStreamer.sync();
     TestHelpers.assertRecordCount(SQL_SOURCE_NUM_RECORDS, tableBasePath, sqlContext);
+    deltaStreamer.sync();
+    TestHelpers.assertRecordCount(SQL_SOURCE_NUM_RECORDS * 2, tableBasePath, sqlContext);

Review Comment:
   I don't follow this change either. Why is the expected record count doubled?



##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
##########
@@ -605,15 +605,13 @@ private Pair<Option<String>, JavaRDD<WriteStatus>> writeToSink(JavaRDD<HoodieRec
     long totalErrorRecords = writeStatusRDD.mapToDouble(WriteStatus::getTotalErrorRecords).sum().longValue();
     long totalRecords = writeStatusRDD.mapToDouble(WriteStatus::getTotalRecords).sum().longValue();
     boolean hasErrors = totalErrorRecords > 0;
-    long hiveSyncTimeMs = 0;
-    long metaSyncTimeMs = 0;
     if (!hasErrors || cfg.commitOnErrors) {
       HashMap<String, String> checkpointCommitMetadata = new HashMap<>();
       if (checkpointStr != null) {
         checkpointCommitMetadata.put(CHECKPOINT_KEY, checkpointStr);
-      }
-      if (cfg.checkpoint != null) {
-        checkpointCommitMetadata.put(CHECKPOINT_RESET_KEY, cfg.checkpoint);
+        if (cfg.checkpoint != null) {
+          checkpointCommitMetadata.put(CHECKPOINT_RESET_KEY, cfg.checkpoint);
+        }

Review Comment:
   Can you explain the logic behind this change? Also, please update the PR description to clarify the problem and the change.
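
   For readers following the hunk above, here is a minimal sketch of the behavioural difference, assuming a simplified model of the commit-metadata map. The class and method names (`CheckpointMetadataSketch`, `before`, `after`) are hypothetical, and the `CHECKPOINT_KEY` string is an assumption; only `deltastreamer.checkpoint.reset_key` is quoted elsewhere in this thread. It is not the actual `DeltaSync` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two conditionals in the diff above.
final class CheckpointMetadataSketch {
  // Assumed key string, for illustration only.
  static final String CHECKPOINT_KEY = "deltastreamer.checkpoint.key";
  // Key name as quoted in this thread.
  static final String CHECKPOINT_RESET_KEY = "deltastreamer.checkpoint.reset_key";

  // Before the change: the reset key is written whenever cfg.checkpoint is set,
  // even if the source returned no checkpoint (checkpointStr == null).
  static Map<String, String> before(String checkpointStr, String cfgCheckpoint) {
    Map<String, String> metadata = new HashMap<>();
    if (checkpointStr != null) {
      metadata.put(CHECKPOINT_KEY, checkpointStr);
    }
    if (cfgCheckpoint != null) {
      metadata.put(CHECKPOINT_RESET_KEY, cfgCheckpoint);
    }
    return metadata;
  }

  // After the change: the reset key is written only together with a non-null checkpoint.
  static Map<String, String> after(String checkpointStr, String cfgCheckpoint) {
    Map<String, String> metadata = new HashMap<>();
    if (checkpointStr != null) {
      metadata.put(CHECKPOINT_KEY, checkpointStr);
      if (cfgCheckpoint != null) {
        metadata.put(CHECKPOINT_RESET_KEY, cfgCheckpoint);
      }
    }
    return metadata;
  }
}
```

   With a null source checkpoint (the case named in the PR title) and `cfg.checkpoint` set to `"earliest"`, `before(null, "earliest")` stores only the reset key, while `after(null, "earliest")` stores nothing.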





[GitHub] [hudi] dongkelun commented on a diff in pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on code in PR #5633:
URL: https://github.com/apache/hudi/pull/5633#discussion_r884259544


##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java:
##########
@@ -1954,11 +1954,13 @@ public void testSqlSourceSource() throws Exception {
     String tableBasePath = dfsBasePath + "/test_sql_source_table" + testNum++;
     HoodieDeltaStreamer deltaStreamer =
         new HoodieDeltaStreamer(TestHelpers.makeConfig(
-            tableBasePath, WriteOperationType.INSERT, SqlSource.class.getName(),
+            tableBasePath, WriteOperationType.BULK_INSERT, SqlSource.class.getName(),
             Collections.emptyList(), PROPS_FILENAME_TEST_SQL_SOURCE, false,
-            false, 1000, false, null, null, "timestamp", null, true), jsc);
+            false, 1000, false, null, null, "timestamp", "earliest", true), jsc);
     deltaStreamer.sync();
     TestHelpers.assertRecordCount(SQL_SOURCE_NUM_RECORDS, tableBasePath, sqlContext);
+    deltaStreamer.sync();
+    TestHelpers.assertRecordCount(SQL_SOURCE_NUM_RECORDS * 2, tableBasePath, sqlContext);

Review Comment:
   The first run does not throw an exception: it succeeds and saves the value of `deltastreamer.checkpoint.reset_key`. The exception is reproduced only on the second run, when the commitMetadata is not null. So we need to run the sync twice to verify that the exception is resolved.





[GitHub] [hudi] dongkelun commented on pull request #5633: [HUDI-4123] Fix the exception due to SqlSource return null checkpoint

Posted by GitBox <gi...@apache.org>.
dongkelun commented on PR #5633:
URL: https://github.com/apache/hudi/pull/5633#issuecomment-1296412015

   > Not sure what checkpoint refers to in the case of the SQL source. In the case of Kafka, it refers to the offset, and we honor that while polling for messages from Kafka. In the case of DFS-based sources, checkpoint refers to the last modification time of files, so we filter on that while polling for new data. But can you help me understand what checkpoint means for SQL sources? We can allow configuring a checkpoint, but as of now we are not leveraging the checkpoint while querying from the SQL source. So, unless we fix that, I don't see much benefit in allowing users to configure a checkpoint.
   
   Personally, I think there is no point in setting a checkpoint for `SqlSource`, because it has no meaning there.

