Posted to commits@hudi.apache.org by "kazdy (via GitHub)" <gi...@apache.org> on 2023/02/20 21:20:25 UTC

[GitHub] [hudi] kazdy opened a new pull request, #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

kazdy opened a new pull request, #7998:
URL: https://github.com/apache/hudi/pull/7998

   ### Change Logs
   
   Fix `shouldCombine` to handle the case where the write operation is UPSERT but COMBINE_BEFORE_UPSERT is false.
   Currently, Hudi always combines records on UPSERT, so the COMBINE_BEFORE_UPSERT option is not honored.
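   
   The intended decision can be illustrated with a minimal, standalone sketch. It is not the Hudi source: `WriteOp`, `ShouldCombineSketch`, and the fallback defaults are hypothetical stand-ins, the two config key strings are assumed to match the Hudi options, and only the two conditions visible in the diff hunk are modelled (any further condition in the real expression is left out).
   
   ```scala
   object ShouldCombineSketch {
     // Hypothetical stand-in for Hudi's WriteOperationType.
     sealed trait WriteOp
     case object Upsert extends WriteOp
     case object Insert extends WriteOp

     // Assumed key strings and defaults, used here for illustration only.
     val InsertDropDupsKey = "hoodie.datasource.write.insert.drop.duplicates"
     val CombineBeforeUpsertKey = "hoodie.combine.before.upsert"

     def shouldCombine(operation: WriteOp, parameters: Map[String, String]): Boolean = {
       val dropDups = parameters.getOrElse(InsertDropDupsKey, "false").toBoolean
       val combineBeforeUpsert = parameters.getOrElse(CombineBeforeUpsertKey, "true").toBoolean
       // Before the fix, any UPSERT implied combining; with the fix, the flag must also be enabled.
       dropDups || (operation == Upsert && combineBeforeUpsert)
     }
   }
   ```
   
   With this shape, an UPSERT with the flag explicitly set to `false` no longer combines, while an UPSERT that leaves the flag unset keeps the previous behaviour.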
   
   ### Impact
   
   Fixes the user-facing option COMBINE_BEFORE_UPSERT so that it is honored for UPSERT writes.
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118452920


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   > Thought this was a clean way to do it.
   
   We can do some config inference here, which is a good way to go, but as I said, for COW tables the in-memory combining is a must; you cannot break that rule.
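   
   One possible reading of this rule, as a rough sketch only: the effective setting is inferred from the table type, so a copy-on-write table always combines and other tables follow the user's flag. Everything here (`TableType`, `effectiveCombineBeforeUpsert`) is hypothetical and written for illustration; the thread does not show how the merged change implements it.
   
   ```scala
   object CombineInferenceSketch {
     // Hypothetical stand-in for Hudi's table types; not the real enum.
     sealed trait TableType
     case object CopyOnWrite extends TableType
     case object MergeOnRead extends TableType

     // Hypothetical helper: returns the combine-before-upsert value that would take effect.
     def effectiveCombineBeforeUpsert(tableType: TableType, requested: Boolean): Boolean =
       tableType match {
         case CopyOnWrite => true      // COW: combining is always applied, per the review comment
         case MergeOnRead => requested // otherwise honor the user's setting
       }
   }
   ```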





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542824483

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17002) 
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546433657

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * ce89b12639ebe78146afcd2f9c95d646226f1127 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025) 
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   * dfdd33316d71f2866b3052f45b3328e30678f1a3 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546501423

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   * dfdd33316d71f2866b3052f45b3328e30678f1a3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17044) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1547016098

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   * 93db1f02dc47c597f6ce7708e98ef943d50a1206 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17058) 
   




[GitHub] [hudi] bvaradar merged pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar merged PR #7998:
URL: https://github.com/apache/hudi/pull/7998




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546986111

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   * dfdd33316d71f2866b3052f45b3328e30678f1a3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17044) 
   * 93db1f02dc47c597f6ce7708e98ef943d50a1206 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17058) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438275242

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441751764

   There seems to be a bug with non-strict insert mode.
   With the Spark datasource, duplicates are inserted only in overwrite mode, or in append mode when data is written to the table for the first time. A second insert in append mode deduplicates the dataset as if it were running in upsert mode.
   
   ```
   # Assumes an active SparkSession named `spark` with the Hudi bundle available.
   from pyspark.sql.functions import expr
   
   # Hypothetical table location; the original comment does not define `path`.
   path = "/tmp/huditbl"
   
   opt_insert = {
       'hoodie.table.name': 'huditbl',
       'hoodie.datasource.write.recordkey.field': 'keyid',
       'hoodie.datasource.write.table.name': 'huditbl',
       'hoodie.datasource.write.operation': 'insert',
       'hoodie.sql.insert.mode': 'non-strict',
       'hoodie.upsert.shuffle.parallelism': 2,
       'hoodie.insert.shuffle.parallelism': 2,
       'hoodie.combine.before.upsert': 'false',
       'hoodie.combine.before.insert': 'false',
       'hoodie.datasource.write.insert.drop.duplicates': 'false'
   }
   
   df = spark.range(0, 10).toDF("keyid") \
     .withColumn("age", expr("keyid + 1000"))
   
   # First write: create the table in overwrite mode.
   df.write.format("hudi"). \
   options(**opt_insert). \
   mode("overwrite"). \
   save(path)
   
   spark.read.format("hudi").load(path).count() # returns 10
   
   df = df.union(df) # creates duplicates
   # Second write: append the duplicated dataset with the same insert options.
   df.write.format("hudi"). \
   options(**opt_insert). \
   mode("append"). \
   save(path)
   
   spark.read.format("hudi").load(path).count() # returns 10 but should return 20
   ```
   




[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118447028


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Does this mean that users should never set COMBINE_BEFORE_UPSERT to false for a CoW table? I thought this config was something users are allowed to disable deliberately. At the very least, I was surprised that it does not work as described in the docs.
   
   If combine before upsert is always required for CoW, how does Flink deduplicate records for tables with no precombine field?
   
   My thinking was that today:
   1. The user creates a table without a precombine field.
   2. The user wants to do an update.
   3. The update fails with an exception about the missing precombine field.
   
   To allow this flow instead:
   1. The user creates a table with no precombine field.
   2. Hudi internally knows that no precombine field was defined and disables COMBINE_BEFORE_UPSERT.
   3. Hudi does not try to precombine on upsert; it is the user's responsibility to deduplicate the incoming dataset.
   
   But for this to work, this change would need to be done first.
   
   I thought this was a clean way to do it.
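
As a rough illustration of the second flow described above, the decision could be modeled like this. It is only a sketch in plain Python, not Hudi code: the helper name `resolve_combine_before_upsert` and its parameters are hypothetical, and the only fact taken from Hudi is that `hoodie.combine.before.upsert` defaults to true.

```python
from typing import Optional


def resolve_combine_before_upsert(precombine_field: Optional[str],
                                  user_setting: Optional[bool] = None) -> bool:
    """Hypothetical helper: decide whether to precombine records on upsert.

    precombine_field: name of the table's precombine field, or None/"" if unset.
    user_setting: explicit value of hoodie.combine.before.upsert, or None if unset.
    """
    if not precombine_field:
        # No precombine field means there is nothing to order duplicates by,
        # so combining is disabled and deduplication is left to the user.
        return False
    # Otherwise honor the explicit setting, falling back to the default (true).
    return True if user_setting is None else user_setting
```

With such a rule, a table created without a precombine field would skip precombining on upsert instead of failing, while tables that define one would keep the current default.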





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1444290033

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389",
       "triggerID" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * cde8d4ffa1cae261731d94c2a0117ece6473a882 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1150039506


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Yeah, if uniqueness can be assured by the upstream data source, then the user should use the insert operation instead of the upsert operation, right? It is risky to loosen the restriction if users have no good way to learn about these details.
   
   We can loosen the restriction if we have a good way to tell users that there is a risk of data duplication.





[GitHub] [hudi] bvaradar commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1164539647


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&

Review Comment:
   Wouldn't the conditions here cause shouldCombine to be true even when
   
   operation = UPSERT
   COMBINE_BEFORE_UPSERT = false
   COMBINE_BEFORE_INSERT = true 
   
   ? 
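
The case raised above can be checked with a small truth-table model. This is a sketch in plain Python, not the Scala from the PR. It assumes that the part of the expression cut off in the diff is an unguarded COMBINE_BEFORE_INSERT check, which is what the question implies. The function names and the guarded variant are hypothetical.

```python
def should_combine(operation: str, drop_dups: bool,
                   combine_before_upsert: bool,
                   combine_before_insert: bool) -> bool:
    # Models the patched expression from the diff. The last clause is an
    # assumption about the part of the expression not shown in the hunk.
    return (drop_dups
            or (operation == "upsert" and combine_before_upsert)
            or combine_before_insert)


def should_combine_guarded(operation: str, drop_dups: bool,
                           combine_before_upsert: bool,
                           combine_before_insert: bool) -> bool:
    # One possible variant: the insert flag only applies to insert operations.
    return (drop_dups
            or (operation == "upsert" and combine_before_upsert)
            or (operation == "insert" and combine_before_insert))


# The scenario from the review comment.
case = dict(operation="upsert", drop_dups=False,
            combine_before_upsert=False, combine_before_insert=True)
print(should_combine(**case))          # True: the insert flag leaks into upsert
print(should_combine_guarded(**case))  # False
```

Under that assumption, the unguarded form still combines on upsert whenever COMBINE_BEFORE_INSERT is true. The guarded form would also change behavior for other operations such as bulk insert, so it is shown only to make the question concrete.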
   





[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1437597791

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1442053212

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c164a3991b8bd900b802fa8de8e85ccb54f6cb98 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349) 
   * e8e3240aff997075065eb01d9277b227ab2bdf73 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438471304

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438813585

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de68cc51637a324a4e711b74ad52092bb569fb52 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542832656

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389",
       "triggerID" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764",
       "triggerID" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1542814478",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17002",
       "triggerID" : "1542814478",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "a95196cb0c749c1e1e8fb245a2a58d429159d519",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a95196cb0c749c1e1e8fb245a2a58d429159d519",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17002) 
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a95196cb0c749c1e1e8fb245a2a58d429159d519 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438838951

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de68cc51637a324a4e711b74ad52092bb569fb52 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315) 
   * 13fafcd633926980f7e01117c8039138f14fa3f5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441098868

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 13fafcd633926980f7e01117c8039138f14fa3f5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318) 
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441127487

   ## CI report:
   
   * 13fafcd633926980f7e01117c8039138f14fa3f5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318) 
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c164a3991b8bd900b802fa8de8e85ccb54f6cb98 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1472983917

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764) 
   


[GitHub] [hudi] bvaradar commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546755590

   @kazdy: Can you remove the draft status if you think it is ready?




[GitHub] [hudi] bvaradar commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1152767014


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   @danny0405 and I synced on this. We think there is a valid case. For example: a setup with an upstream Hudi table A and a Hudi table B derived from table A. A job runs every night, scans all records from table A (which are guaranteed to be unique), and upserts them into table B (an upsert is needed because table B is not a log table; it is like a daily snapshot of table A). In this case, we need to upsert but should be able to take advantage of table A's uniqueness and avoid pre-combining.
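
For illustration, the two clauses visible in the hunk above can be modelled as a small standalone function. This is only a sketch under stated assumptions, not the code in `HoodieSparkSqlWriter`: the `WriteOp` type, the `shouldCombine` helper, the option key strings and the defaults are assumptions made for the example, and the real expression continues after the trailing `||`, which is not modelled here.

```scala
// Simplified, standalone model of the predicate in the hunk above.
// Every name in this object is hypothetical; none of them is a Hudi API.
object ShouldCombineSketch {
  sealed trait WriteOp
  case object Upsert extends WriteOp
  case object Insert extends WriteOp

  // Assumed option keys and default, used only for illustration.
  val InsertDropDupsKey = "hoodie.datasource.write.insert.drop.duplicates"
  val CombineBeforeUpsertKey = "hoodie.combine.before.upsert"
  val CombineBeforeUpsertDefault = "true"

  // Models only the first two clauses of the expression in the diff.
  // Unlike the diff, a missing drop-dups option is treated as "false" here.
  def shouldCombine(op: WriteOp, params: Map[String, String]): Boolean = {
    val dropDups = params.getOrElse(InsertDropDupsKey, "false").toBoolean
    val combineBeforeUpsert =
      params.getOrElse(CombineBeforeUpsertKey, CombineBeforeUpsertDefault).toBoolean
    // Before the change, the second clause was just `op == Upsert`.
    dropDups || (op == Upsert && combineBeforeUpsert)
  }
}
```

In the nightly snapshot scenario described above, the job would pass the option as `false`, so `shouldCombine(Upsert, Map(CombineBeforeUpsertKey -> "false"))` evaluates to `false` in this model and the pre-combine step is skipped.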



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -960,6 +960,86 @@ class TestHoodieSparkSqlWriter {
     assert(spark.read.format("hudi").load(tempBasePath).where("age >= 2000").count() == 10)
   }
 
+  /**
+   * Test upsert for CoW table without precombine field and combine before upsert disabled.
+   */
+  @Test
+  def testUpsertWithoutPrecombineFieldAndCombineBeforeUpsertDisabled(): Unit = {
+    val options = Map(DataSourceWriteOptions.TABLE_TYPE.key -> HoodieTableType.COPY_ON_WRITE.name(),

Review Comment:
   @kazdy: Can you also cover the MOR case? Even for MOR, we should let upsert skip pre-combine if the user expects the input batch to be unique and wants to skip this step.





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1544643609

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a95196cb0c749c1e1e8fb245a2a58d429159d519 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17003) 
   * ce89b12639ebe78146afcd2f9c95d646226f1127 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546429197

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * ce89b12639ebe78146afcd2f9c95d646226f1127 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025) 
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118428935


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Actually, only a MOR table can allow loosening the restriction for COMBINE_BEFORE_UPSERT. For a COW table, combining the in-memory dataset is a precondition for the uniqueness that deduplication guarantees. Flink does that in this line:
   https://github.com/apache/hudi/blob/58e9ce710b405e7653236989912eae49559cd011/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java#L382
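
As a reading aid, the rule stated in this comment could be sketched as follows. This is not the Flink code behind the link above; the `TableType` values and the `effectiveCombineBeforeUpsert` helper are hypothetical names chosen for the example, and the sketch only encodes the rule as worded here (COW always combines, MOR honors the user's setting).

```scala
// Hypothetical model of the rule described in the comment; not Hudi or Flink code.
object TableTypeRuleSketch {
  sealed trait TableType
  case object CopyOnWrite extends TableType
  case object MergeOnRead extends TableType

  // For COW the requested flag is ignored and combining is forced on;
  // for MOR the user's choice is passed through unchanged.
  def effectiveCombineBeforeUpsert(tableType: TableType, requested: Boolean): Boolean =
    tableType match {
      case CopyOnWrite => true
      case MergeOnRead => requested
    }
}
```

Under this model, disabling the option has an effect only for MOR tables, which is the restriction being proposed in this comment.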





[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118447028


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Does this mean that users should never set COMBINE_BEFORE_UPSERT for a CoW table? I thought this config was something that users are allowed to disable deliberately.
   
   If combine before upsert is always required for CoW, how does Flink deduplicate records for tables without a precombine field?
   
   My thinking was:
   1. The user creates a table without a precombine field.
   2. The user wants to do an update.
   3. The user can't, because an exception is thrown about the missing precombine field.
   
   To allow this flow:
   1. The user creates a table with no precombine field.
   2. Hudi internally knows that no precombine field was defined and disables COMBINE_BEFORE_UPSERT.
   3. Hudi does not try to precombine on upsert; it is the user's responsibility to deduplicate the incoming dataset.
   
   But for this to work, this change would need to be done first.
   
   I thought this was a clean way to do it.
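
Step 2 of the proposed flow could look roughly like the sketch below. It is an illustration only, not part of this PR: `TableDef`, `effectiveParams` and the option key string are assumptions made for the example, and the choice to leave an explicitly set option untouched is a design decision of the sketch, not something stated in the thread.

```scala
// Hypothetical sketch of "Hudi disables COMBINE_BEFORE_UPSERT when no
// precombine field is defined". None of these names is a Hudi API.
object PrecombineFlowSketch {
  // Only the optional precombine field of the table matters for this example.
  final case class TableDef(precombineField: Option[String])

  // Assumed option key, used only for illustration.
  val CombineBeforeUpsertKey = "hoodie.combine.before.upsert"

  // If the table has no precombine field and the user did not set the option,
  // default it to "false"; otherwise return the parameters unchanged.
  def effectiveParams(table: TableDef, userParams: Map[String, String]): Map[String, String] =
    if (table.precombineField.isEmpty && !userParams.contains(CombineBeforeUpsertKey))
      userParams + (CombineBeforeUpsertKey -> "false")
    else
      userParams
}
```

With parameters derived this way, the writer would only skip pre-combining if the upsert path also honors the option, which is why the change in this PR would need to land first.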





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1472861080

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * cde8d4ffa1cae261731d94c2a0117ece6473a882 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389) 
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1437681694

   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300) 
   


[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1113530490


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   This also makes Spark Hudi compatible with Flink Hudi: if COMBINE_BEFORE_UPSERT=false, users can upsert when no preCombine field is defined. Flink has COMBINE_BEFORE_UPSERT set to false by default.
   That's why it worked in Flink but not in Spark.





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438889874

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * de68cc51637a324a4e711b74ad52092bb569fb52 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315) 
   * 13fafcd633926980f7e01117c8039138f14fa3f5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1443746633

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * e8e3240aff997075065eb01d9277b227ab2bdf73 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365) 
   * cde8d4ffa1cae261731d94c2a0117ece6473a882 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438816953

   The GH Actions tests failed, but I don't see why; they passed before.
   Azure failed on TestHoodieDeltaStreamerWithMultiWriter, which looks unrelated and also passed on the first run. It looks like flaky tests are causing the failures.




[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438549551

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310) 
   * de68cc51637a324a4e711b74ad52092bb569fb52 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1437603038

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1442291366

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * e8e3240aff997075065eb01d9277b227ab2bdf73 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] bvaradar commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1149946901


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   In the case where the source is a Hudi table or some online table (with uniqueness of rows guaranteed), the source batch is guaranteed to be de-duplicated. In that case, isn't it safe to allow users to disable pre-combining regardless of whether the table is COW or MOR? @danny0405: let me know if I am missing something. Thanks.
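
The precondition described here (each record key appears at most once in the incoming batch) can be illustrated with a small check on plain Scala collections. This is a sketch only: `Record`, `hasUniqueKeys`, and the field names are hypothetical and are not part of Hudi.

```scala
object UniqueBatchSketch {
  // Hypothetical record shape, loosely following the keyid/age columns used in the PR's test.
  final case class Record(keyid: Long, age: Long)

  // True when no record key occurs more than once in the batch, which is the
  // situation in which skipping pre-combine would not leave duplicate keys behind.
  def hasUniqueKeys(batch: Seq[Record]): Boolean =
    batch.map(_.keyid).distinct.size == batch.size
}
```

A caller that cannot guarantee this property would have to de-duplicate the batch itself before upserting with pre-combine disabled.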





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546465206

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389",
       "triggerID" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764",
       "triggerID" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1542814478",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17002",
       "triggerID" : "1542814478",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "a95196cb0c749c1e1e8fb245a2a58d429159d519",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17003",
       "triggerID" : "a95196cb0c749c1e1e8fb245a2a58d429159d519",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ce89b12639ebe78146afcd2f9c95d646226f1127",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025",
       "triggerID" : "ce89b12639ebe78146afcd2f9c95d646226f1127",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1546422024",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "dfdd33316d71f2866b3052f45b3328e30678f1a3",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17044",
       "triggerID" : "dfdd33316d71f2866b3052f45b3328e30678f1a3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * ce89b12639ebe78146afcd2f9c95d646226f1127 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025) 
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   * dfdd33316d71f2866b3052f45b3328e30678f1a3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17044) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118447028


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Does this mean that users should never set COMBINE_BEFORE_UPSERT for a CoW table? I thought this config was something users are allowed to disable deliberately.
   
   If combine before upsert is always required for CoW, how does Flink deduplicate records for tables without a precombine field?
   
   My thinking was that, currently:
   1. The user creates a table without a precombine field.
   2. The user wants to do an update.
   3. The update fails with an exception about the missing precombine field.
   
   The goal is to allow this flow instead:
   1. The user creates a table with no precombine field.
   2. Hudi internally knows that no precombine field was defined and disables COMBINE_BEFORE_UPSERT.
   3. Hudi does not try to precombine on upsert; it is the user's responsibility to deduplicate the incoming dataset.
   
   I thought this was a clean way to do it.





[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118447028


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Does this mean that users should never set COMBINE_BEFORE_UPSERT for a CoW table? I thought this config was something users are allowed to disable deliberately. At least, I was surprised that it does not work as described in the docs.
   
   If combine before upsert is always required for CoW, how does Flink deduplicate records for tables without a precombine field? If that is the case, Hudi with Spark behaves differently.
   
   My thinking was that, currently:
   1. The user creates a table without a precombine field.
   2. The user wants to do an update.
   3. The update fails with an exception about the missing precombine field.
   
   The goal is to allow this flow instead:
   1. The user creates a table with no precombine field.
   2. Hudi internally knows that no precombine field was defined and disables COMBINE_BEFORE_UPSERT.
   3. Hudi does not try to precombine on upsert; it is the user's responsibility to deduplicate the incoming dataset.
   
   But for this to work, this change would need to be made first.
   
   I thought this was a clean way to do it because the error is thrown from here:
   https://github.com/apache/hudi/blob/cde8d4ffa1cae261731d94c2a0117ece6473a882/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L1096-L1098
   
   which comes after the condition on the `shouldCombine` val.
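
The proposed flow (no precombine field defined, so combining before upsert is switched off) could be sketched roughly as follows. This is a hypothetical helper, not code from this PR or from Hudi: the object and function names are made up, and the two config keys are assumed for the sketch.

```scala
object PrecombineDefaultsSketch {
  // Config keys are assumptions made for this sketch.
  val PrecombineFieldKey = "hoodie.datasource.write.precombine.field"
  val CombineBeforeUpsertKey = "hoodie.combine.before.upsert"

  // If the user defined no precombine field and did not set the combine option
  // explicitly, disable combine-before-upsert; otherwise leave the options untouched.
  def withCombineDefault(options: Map[String, String]): Map[String, String] = {
    val hasPrecombineField = options.get(PrecombineFieldKey).exists(_.trim.nonEmpty)
    if (hasPrecombineField || options.contains(CombineBeforeUpsertKey)) options
    else options + (CombineBeforeUpsertKey -> "false")
  }
}
```

An explicit user setting always wins in this sketch, so a user who wants combining on a table without a precombine field would still reach the existing error path.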





[GitHub] [hudi] danny0405 commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118429803


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -960,6 +960,87 @@ class TestHoodieSparkSqlWriter {
     assert(spark.read.format("hudi").load(tempBasePath).where("age >= 2000").count() == 10)
   }
 
+  /**
+   * Test upsert for CoW table without precombine field and combine before upsert disabled.
+   */
+  @Test
+  def testUpsertWithoutPrecombineFieldAndCombineBeforeUpsertDisabled(): Unit = {
+    val options = Map(DataSourceWriteOptions.TABLE_TYPE.key -> HoodieTableType.COPY_ON_WRITE.name(),
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "keyid",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
+      HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+      HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key -> "false",
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1"
+    )
+
+    val df = spark.range(0, 10).toDF("keyid")
+      .withColumn("age", expr("keyid + 1000"))
+    df.write.format("hudi")
+      .options(options.updated(DataSourceWriteOptions.OPERATION.key, "insert"))
+      .mode(SaveMode.Overwrite).save(tempBasePath)
+
+    // upsert same records again, should work
+    val df_update = spark.range(0, 10).toDF("keyid")
+      .withColumn("age", expr("keyid + 2000"))
+    df_update.write.format("hudi")
+      .options(options.updated(DataSourceWriteOptions.OPERATION.key, "upsert"))
+      .mode(SaveMode.Append).save(tempBasePath)
+    val df_result_1 = spark.read.format("hudi").load(tempBasePath).selectExpr("keyid", "age")
+    assert(df_result_1.count() == 10)
+    assert(df_result_1.where("age >= 2000").count() == 10)
+
+    // insert duplicated rows (overwrite because of bug, non-strict mode does not work with append)
+    val df_with_duplicates = df.union(df)
+    df_with_duplicates.write.format("hudi")
+      .options(options.updated(DataSourceWriteOptions.OPERATION.key, "insert"))
+      .mode(SaveMode.Overwrite).save(tempBasePath)
+    val df_result_2 = spark.read.format("hudi").load(tempBasePath).selectExpr("keyid", "age")
+    assert(df_result_2.count() == 20)
+    assert(df_result_2.distinct().count() == 10)
+    assert(df_result_2.where("age >= 1000 and age < 2000").count() == 20)
+
+    // upsert with duplicates, should update but not deduplicate
+    val df_with_duplicates_update = df_with_duplicates.withColumn("age", expr("keyid + 3000"))
+    df_with_duplicates_update.write.format("hudi")
+      .options(options.updated(DataSourceWriteOptions.OPERATION.key, "upsert"))
+      .mode(SaveMode.Append).save(tempBasePath)
+    val df_result_3 = spark.read.format("hudi").load(tempBasePath).selectExpr("keyid", "age")
+    assert(df_result_3.distinct().count() == 10)
+    assert(df_result_3.count() == 20)
+    assert(df_result_3.where("age >= 3000").count() == 20)
+  }
+
+  /**
+   * Test upsert for CoW table with combine before upsert disabled.
+   */

Review Comment:
   The comment says `CoW`, which is inconsistent with the actual test.





[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542839511

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300",
       "triggerID" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb1eb5fbe71a3c47e108e736d66c1f0d436a994",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310",
       "triggerID" : "1438235003",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315",
       "triggerID" : "de68cc51637a324a4e711b74ad52092bb569fb52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318",
       "triggerID" : "13fafcd633926980f7e01117c8039138f14fa3f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "27d61f01fb6709e3aaa08de9ace7738dbedffb24",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349",
       "triggerID" : "c164a3991b8bd900b802fa8de8e85ccb54f6cb98",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365",
       "triggerID" : "e8e3240aff997075065eb01d9277b227ab2bdf73",
       "triggerType" : "PUSH"
     }, {
       "hash" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389",
       "triggerID" : "cde8d4ffa1cae261731d94c2a0117ece6473a882",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764",
       "triggerID" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b572d737ef10724f71642084c0edf9a9a26540cc",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1542814478",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "c078f0d7a1a0efe7d8a0674d6f3aeff333febd04",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17002",
       "triggerID" : "1542814478",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "a95196cb0c749c1e1e8fb245a2a58d429159d519",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17003",
       "triggerID" : "a95196cb0c749c1e1e8fb245a2a58d429159d519",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17002) 
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a95196cb0c749c1e1e8fb245a2a58d429159d519 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17003) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546330086

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * ce89b12639ebe78146afcd2f9c95d646226f1127 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025) 
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   




[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1545726848

   @bvaradar CI is green. Could you please take another look? Thanks.




[GitHub] [hudi] bvaradar commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1193042902


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -956,6 +956,87 @@ class TestHoodieSparkSqlWriter {
     assert(df_result.where("age >= 2000").count() == 10)
   }
 
+  /**
+   * Test upsert for CoW table without precombine field and combine before upsert disabled.
+   */
+  @Test
+  def testUpsertWithoutPrecombineFieldAndCombineBeforeUpsertDisabled(): Unit = {
+    val options = Map(DataSourceWriteOptions.TABLE_TYPE.key -> HoodieTableType.COPY_ON_WRITE.name(),
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "keyid",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.NonpartitionedKeyGenerator",
+      HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+      HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key -> "false",
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1"
+    )
+
+    val df = spark.range(0, 10).toDF("keyid")
+      .withColumn("age", expr("keyid + 1000"))
+    df.write.format("hudi")
+      .options(options.updated(DataSourceWriteOptions.OPERATION.key, "insert"))
+      .mode(SaveMode.Overwrite).save(tempBasePath)
+
+    // upsert same records again, should work
+    val df_update = spark.range(0, 10).toDF("keyid")
+      .withColumn("age", expr("keyid + 2000"))
+    df_update.write.format("hudi")
+      .options(options.updated(DataSourceWriteOptions.OPERATION.key, "upsert"))
+      .mode(SaveMode.Append).save(tempBasePath)
+    val df_result_1 = spark.read.format("hudi").load(tempBasePath).selectExpr("keyid", "age")
+    assert(df_result_1.count() == 10)
+    assert(df_result_1.where("age >= 2000").count() == 10)
+
+    // insert duplicated rows (overwrite because of bug, non-strict mode does not work with append)

Review Comment:
   Shouldn't we be using SaveMode.Append here? Can you elaborate?





[GitHub] [hudi] kazdy commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438235003

   @hudi-bot run azure




[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1442298082

   Hi Hudi devs, I would appreciate a review, thanks!




[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542814478

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1544825092

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * ce89b12639ebe78146afcd2f9c95d646226f1127 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17025) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542964544

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a95196cb0c749c1e1e8fb245a2a58d429159d519 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17003) 
   




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1544561469

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a95196cb0c749c1e1e8fb245a2a58d429159d519 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17003) 
   * ce89b12639ebe78146afcd2f9c95d646226f1127 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1472784160

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * cde8d4ffa1cae261731d94c2a0117ece6473a882 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389) 
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441102791

   ## CI report:
   
   * 13fafcd633926980f7e01117c8039138f14fa3f5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318) 
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c164a3991b8bd900b802fa8de8e85ccb54f6cb98 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1442139735

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c164a3991b8bd900b802fa8de8e85ccb54f6cb98 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349) 
   * e8e3240aff997075065eb01d9277b227ab2bdf73 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1443833230

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * e8e3240aff997075065eb01d9277b227ab2bdf73 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15365) 
   * cde8d4ffa1cae261731d94c2a0117ece6473a882 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15389) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1441159107

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c164a3991b8bd900b802fa8de8e85ccb54f6cb98 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15349) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1439021158

   ## CI report:
   
   * 13fafcd633926980f7e01117c8039138f14fa3f5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15318) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: Do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is disabled

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1438562961

   ## CI report:
   
   * bbb1eb5fbe71a3c47e108e736d66c1f0d436a994 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15300) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15310) 
   * de68cc51637a324a4e711b74ad52092bb569fb52 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15315) 
   


[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1113530490


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   This also makes Spark Hudi consistent with Flink Hudi: if COMBINE_BEFORE_UPSERT=false, users can upsert when no preCombine field is defined. Flink has COMBINE_BEFORE_UPSERT set to false by default.
   That's why it worked in Flink but not in Spark.
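
For readers following the diff, the changed condition can be sketched as a pure function. This is only an illustration, not the Hudi source: `Op` is a hypothetical stand-in for `WriteOperationType`, the key strings and the "true" default for combine-before-upsert are assumptions about the Hudi configs, and only the two clauses visible in the diff are modeled (whatever follows the trailing `||` is left out). Unlike the original, the sketch also defaults the drop-duplicates flag to "false" when it is absent.

```scala
object ShouldCombineSketch {
  // Hypothetical stand-in for WriteOperationType; only the values needed here.
  sealed trait Op
  case object Upsert extends Op
  case object Insert extends Op

  // Assumed key strings for INSERT_DROP_DUPS and COMBINE_BEFORE_UPSERT.
  val InsertDropDups = "hoodie.datasource.write.insert.drop.duplicates"
  val CombineBeforeUpsert = "hoodie.combine.before.upsert"

  def shouldCombine(op: Op, params: Map[String, String]): Boolean = {
    // Assumption: drop-duplicates is off unless set explicitly.
    val dropDups = params.getOrElse(InsertDropDups, "false").toBoolean
    // Assumption: combine-before-upsert defaults to "true" on the Spark side.
    val combineOnUpsert = params.getOrElse(CombineBeforeUpsert, "true").toBoolean
    // Before the change, any UPSERT combined; now the flag must also be on.
    dropDups || (op == Upsert && combineOnUpsert)
  }
}
```

With these assumptions, an upsert with no options still combines, while an upsert with `hoodie.combine.before.upsert` set to `false` does not.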





[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1118447028


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Does this mean that users should never set COMBINE_BEFORE_UPSERT for a CoW table? I thought this config was something users are allowed to disable deliberately. At least, I was surprised that it does not work as described in the docs.
   
   If combine before upsert is always required for CoW, how does Flink deduplicate records for tables with no precombine field?
   
   My thinking was that today:
   1. The user creates a table without a precombine field.
   2. The user wants to do an update.
   3. The update fails with an exception about the missing precombine field.
   
   To allow this flow instead:
   1. The user creates a table with no precombine field.
   2. Hudi internally knows no precombine field was defined and disables COMBINE_BEFORE_UPSERT.
   3. Hudi does not try to precombine on upsert; it is the user's responsibility to deduplicate the incoming dataset.
   
   But for this to work, this change would need to be done first.
   
   I thought this was a clean way to do it because the error is thrown from here:
   https://github.com/apache/hudi/blob/cde8d4ffa1cae261731d94c2a0117ece6473a882/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L1096-L1098
   
   after the condition on the shouldCombine val.
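
The proposed flow might look roughly like the sketch below. Everything here is hypothetical: `withCombineDefault`, `validate`, and the error text are illustrative names that do not come from the Hudi code base, and the key string is an assumption. It also overrides an explicit user setting when no precombine field exists, which is a design choice the thread does not settle.

```scala
object PrecombineFlowSketch {
  // Assumed key string for COMBINE_BEFORE_UPSERT.
  val CombineBeforeUpsert = "hoodie.combine.before.upsert"

  // Step 2 of the proposed flow: with no precombine field, turn combining off.
  def withCombineDefault(precombineField: Option[String],
                         params: Map[String, String]): Map[String, String] =
    precombineField match {
      case Some(f) if f.nonEmpty => params
      case _                     => params.updated(CombineBeforeUpsert, "false")
    }

  // The check discussed in the thread: combining without a precombine field fails.
  // The message text is made up for this sketch.
  def validate(shouldCombine: Boolean,
               precombineField: Option[String]): Either[String, Unit] =
    if (shouldCombine && precombineField.forall(_.isEmpty))
      Left("precombine field is required when combining records")
    else
      Right(())
}
```

Under this sketch, a table without a precombine field gets combining switched off before the check runs, so the upsert is no longer rejected.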





[GitHub] [hudi] kazdy commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1117025527


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -960,6 +960,86 @@ class TestHoodieSparkSqlWriter {
     assert(spark.read.format("hudi").load(tempBasePath).where("age >= 2000").count() == 10)
   }
 
+  /**
+   * Test upsert for CoW table without precombine field and combine before upsert disabled.
+   */
+  @Test
+  def testUpsertWithoutPrecombineFieldAndCombineBeforeUpsertDisabled(): Unit = {
+    val options = Map(DataSourceWriteOptions.TABLE_TYPE.key -> HoodieTableType.COPY_ON_WRITE.name(),

Review Comment:
   Only for CoW, since I assume MoR requires a precombine field.





[GitHub] [hudi] danny0405 commented on a diff in pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7998:
URL: https://github.com/apache/hudi/pull/7998#discussion_r1150039506


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -1063,7 +1063,9 @@ object HoodieSparkSqlWriter {
     val recordType = config.getRecordMerger.getRecordType
 
     val shouldCombine = parameters(INSERT_DROP_DUPS.key()).toBoolean ||
-      operation.equals(WriteOperationType.UPSERT) ||
+      (operation.equals(WriteOperationType.UPSERT) &&
+        parameters.getOrElse(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(),
+          HoodieWriteConfig.COMBINE_BEFORE_UPSERT.defaultValue()).toBoolean) ||

Review Comment:
   Yeah, if uniqueness can be assured by the upstream data source, then the user should use the insert operation instead of the upsert operation, right? It is risky to loosen the restriction if users have no good way to learn about these details.
   
   We can loosen the restriction if we have a good way to tell users that there is a risk of data duplication.
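
To make the duplication risk concrete: if combining is disabled, a caller would have to deduplicate the incoming batch on their own before upserting. A minimal sketch of that caller-side step, using plain Scala collections rather than Spark and a hypothetical `Rec` type, could be the following. Without a precombine field there is no ordering value, so the sketch keeps whichever record for a key appears last in the batch.

```scala
object DedupSketch {
  // Hypothetical record shape: a record key plus an opaque payload.
  final case class Rec(key: String, value: String)

  // Keep the last record seen for each key; later entries overwrite earlier ones.
  def dedupKeepLast(batch: Seq[Rec]): Map[String, Rec] =
    batch.foldLeft(Map.empty[String, Rec]) { (acc, r) =>
      acc.updated(r.key, r)
    }
}
```

In a real job the same idea would be applied to the DataFrame before the write; how to do that safely is outside what this thread specifies.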





[GitHub] [hudi] kazdy commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546422024

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1546984236

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   * a44c71610c2efd1ebdb1a19c5195f8b1b5e59df7 UNKNOWN
   * dfdd33316d71f2866b3052f45b3328e30678f1a3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=17044) 
   * 93db1f02dc47c597f6ce7708e98ef943d50a1206 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #7998: [HUDI-5824] Fix: do not combine if write operation is Upsert and COMBINE_BEFORE_UPSERT is false

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7998:
URL: https://github.com/apache/hudi/pull/7998#issuecomment-1542777240

   ## CI report:
   
   * 27d61f01fb6709e3aaa08de9ace7738dbedffb24 UNKNOWN
   * c078f0d7a1a0efe7d8a0674d6f3aeff333febd04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15764) 
   * b572d737ef10724f71642084c0edf9a9a26540cc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

