Posted to commits@hudi.apache.org by "kazdy (via GitHub)" <gi...@apache.org> on 2023/02/25 20:52:07 UTC

[GitHub] [hudi] kazdy opened a new pull request, #8047: [HUDI-5848] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

kazdy opened a new pull request, #8047:
URL: https://github.com/apache/hudi/pull/8047

   ### Change Logs
   
   Align with Flink behaviour: since the precombine field is now optional in Spark, let users run updates on a Hudi table where no precombine field is defined.
   fixes #7282
   
   This PR can be merged after #7998 is done, as it depends on that fix to work.
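
A minimal sketch of the intended behaviour, assuming the adjustment is applied to the writer's option map before the upsert runs. The two config keys are real Hudi keys; the `CombineBeforeUpsertSketch` object and its `adjust` helper are hypothetical and are not the code in this PR.

```scala
object CombineBeforeUpsertSketch {
  // Real Hudi config keys; everything else in this object is illustrative.
  val PrecombineFieldKey = "hoodie.datasource.write.precombine.field"
  val CombineBeforeUpsertKey = "hoodie.combine.before.upsert"

  /** Forces combine-before-upsert to "false" when no precombine field is set.
    * Options that already name a precombine field are returned unchanged.
    */
  def adjust(options: Map[String, String]): Map[String, String] = {
    val hasPrecombine = options.get(PrecombineFieldKey).exists(_.trim.nonEmpty)
    if (hasPrecombine) options
    else options + (CombineBeforeUpsertKey -> "false")
  }
}
```

With this shape, a caller that never mentions a precombine field gets an upsert that skips the pre-combine step, while a caller that names one keeps whatever value it passed.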
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   Low. Existing tests pass, and tests were added to cover the new functionality.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   The Spark quick start guide should be updated: updates/upserts are now possible on CoW tables without a precombine field, out of the box and with no additional configuration from the user.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445210694

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6c62c08500e543fd0018141238b2e5d0bdb3843 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on pull request #8047: [HUDI-5848] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445204504

   I should probably also add SQL tests to cover SQL UPDATE, but I'll do that after https://github.com/apache/hudi/pull/7998 is merged and this branch is rebased on it.
   I've also seen that MERGE INTO uses INSERT if no precombine field is provided; this should be changed after this PR is merged.




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445251279

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402",
       "triggerID" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6edf74eed7ee1140bceb9a75ec89eb47eb52c978 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401) 
   * 1aa19ff1be973030ba204f593c3b257e65c91997 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402) 
   * ae30cc314acd9f921799f81074bde54ff83b98e3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy closed pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy closed pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically
URL: https://github.com/apache/hudi/pull/8047




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5848][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445209222

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6c62c08500e543fd0018141238b2e5d0bdb3843 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445265556

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402",
       "triggerID" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15403",
       "triggerID" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ae30cc314acd9f921799f81074bde54ff83b98e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15403) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445252087

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402",
       "triggerID" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1aa19ff1be973030ba204f593c3b257e65c91997 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402) 
   * ae30cc314acd9f921799f81074bde54ff83b98e3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445243094

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402",
       "triggerID" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 6edf74eed7ee1140bceb9a75ec89eb47eb52c978 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401) 
   * 1aa19ff1be973030ba204f593c3b257e65c91997 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on a diff in pull request #8047: [HUDI-5848] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #8047:
URL: https://github.com/apache/hudi/pull/8047#discussion_r1117977800


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -454,6 +454,38 @@ class TestHoodieSparkSqlWriter {
     assert(df.except(trimmedDf).count() == 0)
   }
 
+  /**
+   * Test case for upsert dataset without precombine field and with
+   * auto adjusted COMBINE_BEFORE_UPSERT set to false.
+   */
+  @Disabled

Review Comment:
   can be enabled after https://github.com/apache/hudi/pull/7998 is merged and this branch is rebased on it



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -454,6 +454,38 @@ class TestHoodieSparkSqlWriter {
     assert(df.except(trimmedDf).count() == 0)
   }
 
+  /**
+   * Test case for upsert dataset without precombine field and with
+   * auto adjusted COMBINE_BEFORE_UPSERT set to false.
+   */
+  @Disabled

Review Comment:
   can be enabled after https://github.com/apache/hudi/pull/7998 is merged and this branch is rebased on it





[GitHub] [hudi] kazdy commented on a diff in pull request #8047: [HUDI-5848][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #8047:
URL: https://github.com/apache/hudi/pull/8047#discussion_r1117978967


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -454,6 +454,38 @@ class TestHoodieSparkSqlWriter {
     assert(df.except(trimmedDf).count() == 0)
   }
 
+  /**
+   * Test case for upsert dataset without precombine field and with
+   * auto adjusted COMBINE_BEFORE_UPSERT set to false.
+   */
+  @Disabled

Review Comment:
   I should probably also add an E2E test for SQL update. Could the reviewer point me to the appropriate place for it in the codebase?





[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445225313

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6c62c08500e543fd0018141238b2e5d0bdb3843 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445242084

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6c62c08500e543fd0018141238b2e5d0bdb3843 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399) 
   * 6edf74eed7ee1140bceb9a75ec89eb47eb52c978 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401) 
   * 1aa19ff1be973030ba204f593c3b257e65c91997 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445377207

   I found out that I can't rely on HoodieWriteConfig, because I need to check whether the user provided a precombine field. I should use HoodieTableConfig in HoodieSparkSqlWriter instead.
   
   I also found out that if the user does not provide this config explicitly, it is not persisted in hoodie.properties, but if a "ts" field exists (it is optional, but it is also the default value for the precombine field), Hudi uses it as if it had been defined (still without persisting it).
   A lot of tests run on the assumption that "ts" is the precombine field.
   
   A few use cases to consider:
   - no precombine field provided by the user + no "ts" (defaultValue) for precombine in the schema -> no precombine mode,
   - no precombine field provided by the user + "ts" is available in the schema -> what to do then? What if the user does not want to use "ts" as the precombine field?
   - precombine field provided by the user -> save to properties and use
   - precombine field not provided by the user and no "ts" in schema -> do not use "ts" to combine records
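
The cases above can be sketched as a small decision function. This is only an illustration: the `Mode` types and `resolve` are hypothetical names, and the second case (no user-provided field, but "ts" present in the schema) is modelled as a separate outcome precisely because the right behaviour for it is still an open question here.

```scala
object PrecombineCasesSketch {
  sealed trait Mode
  /** No precombine field at all: records are not combined before upsert. */
  case object NoPrecombine extends Mode
  /** Field given by the user: persist it to hoodie.properties and use it. */
  final case class Explicit(field: String) extends Mode
  /** Field not given, but the default one exists in the schema (open question). */
  final case class ImplicitDefault(field: String) extends Mode

  // "ts" is the default precombine field mentioned in the comment above.
  val DefaultField = "ts"

  def resolve(userField: Option[String], schemaFields: Set[String]): Mode =
    userField.map(_.trim).filter(_.nonEmpty) match {
      case Some(field) => Explicit(field)
      case None if schemaFields.contains(DefaultField) => ImplicitDefault(DefaultField)
      case None => NoPrecombine
    }
}
```

Keeping `ImplicitDefault` distinct from `Explicit` makes it possible to decide later whether the implicit "ts" should behave like a configured field or like no precombine field.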
   




[GitHub] [hudi] kazdy commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445392684

   Another thing that's inconsistent: a user can
   1. insert records without specifying a precombine field explicitly -> no precombine.field in hoodie.properties
   2. specify precombine.field in DS options, insert data again -> precombine.field shows up in hoodie.properties
   3. change precombine.field in DS options, insert data again -> error thrown, can't change precombine.field
   
   Effectively, Hudi allows using the "ts" field as an implicit precombine field and then switching to another, explicit field, but no more than once.
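
The three steps can be modelled as a tiny state transition over the value persisted in hoodie.properties. This is a sketch of the behaviour as described in this comment, not Hudi's actual validation code; `write` and its error message are hypothetical.

```scala
object PrecombinePropertiesSketch {
  /** Simulates one write.
    *
    * @param persisted precombine field currently stored in hoodie.properties, if any
    * @param requested precombine field passed in the datasource options, if any
    * @return the value persisted after the write, or an error when the field would change
    */
  def write(persisted: Option[String], requested: Option[String]): Either[String, Option[String]] =
    (persisted, requested) match {
      case (Some(p), Some(r)) if p != r =>
        Left(s"cannot change precombine field from '$p' to '$r'")
      case (Some(p), _) => Right(Some(p)) // already persisted, unchanged
      case (None, r)    => Right(r)       // first explicit value gets persisted
    }
}
```

Running the steps in order: the first write leaves nothing persisted, the second persists the explicit field, and the third is rejected because the persisted value would change.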




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445226900

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6c62c08500e543fd0018141238b2e5d0bdb3843 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399) 
   * 6edf74eed7ee1140bceb9a75ec89eb47eb52c978 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] kazdy commented on a diff in pull request #8047: [HUDI-5848][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on code in PR #8047:
URL: https://github.com/apache/hudi/pull/8047#discussion_r1117978967


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala:
##########
@@ -454,6 +454,38 @@ class TestHoodieSparkSqlWriter {
     assert(df.except(trimmedDf).count() == 0)
   }
 
+  /**
+   * Test case for upsert dataset without precombine field and with
+   * auto adjusted COMBINE_BEFORE_UPSERT set to false.
+   */
+  @Disabled

Review Comment:
   I should probably also add an E2E test for SQL update





[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445308243

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402",
       "triggerID" : "1aa19ff1be973030ba204f593c3b257e65c91997",
       "triggerType" : "PUSH"
     }, {
       "hash" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15403",
       "triggerID" : "ae30cc314acd9f921799f81074bde54ff83b98e3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fd4e0efd007ef09d651851aaa529853418ef1643",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15404",
       "triggerID" : "fd4e0efd007ef09d651851aaa529853418ef1643",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ae30cc314acd9f921799f81074bde54ff83b98e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15403) 
   * fd4e0efd007ef09d651851aaa529853418ef1643 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15404) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445232647

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399",
       "triggerID" : "b6c62c08500e543fd0018141238b2e5d0bdb3843",
       "triggerType" : "PUSH"
     }, {
       "hash" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401",
       "triggerID" : "6edf74eed7ee1140bceb9a75ec89eb47eb52c978",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b6c62c08500e543fd0018141238b2e5d0bdb3843 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15399) 
   * 6edf74eed7ee1140bceb9a75ec89eb47eb52c978 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15401) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445253036

   ## CI report:
   
   * 1aa19ff1be973030ba204f593c3b257e65c91997 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15402) 
   * ae30cc314acd9f921799f81074bde54ff83b98e3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15403) 
   




[GitHub] [hudi] hudi-bot commented on pull request #8047: [HUDI-5272][SPARK] No PreCombineField mode - make COMBINE_BEFORE_UPSERT=false automatically

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8047:
URL: https://github.com/apache/hudi/pull/8047#issuecomment-1445306960

   ## CI report:
   
   * ae30cc314acd9f921799f81074bde54ff83b98e3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15403) 
   * fd4e0efd007ef09d651851aaa529853418ef1643 UNKNOWN
   

