Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/01 08:02:40 UTC

[GitHub] [hudi] KnightChess opened a new pull request, #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

KnightChess opened a new pull request, #6020:
URL: https://github.com/apache/hudi/pull/6020

   ## What is the purpose of the pull request
   
   Fix a data quality issue that occurs when `MERGE INTO` is used in a concurrent scenario.
   
   ## Brief change log
   
   Each executor thread uses its own `HoodieAvroDeserializer` cache.
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on a diff in pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r928069607


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   @KnightChess What I meant was that the current implementation does not use the cache object created in `initialValue()`. The deserializers are still put into `avroDeserializerCache`, which does not come from the `ThreadLocal`.
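
A minimal sketch of the point above, assuming lookups should go through the `ThreadLocal` rather than the shared cache. `Schema` and `Deserializer` here are hypothetical stand-ins for the Avro schema and `HoodieAvroDeserializer`, a plain mutable map stands in for the Guava cache, and `DeserializerCache` is an illustrative name, not code from the PR:

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Avro's Schema and Hudi's HoodieAvroDeserializer.
final case class Schema(name: String)
final class Deserializer(val schema: Schema)

object DeserializerCache {
  // One private map per thread; initialValue runs on a thread's first get().
  private val local = new ThreadLocal[mutable.Map[Schema, Deserializer]] {
    override def initialValue(): mutable.Map[Schema, Deserializer] =
      mutable.HashMap.empty[Schema, Deserializer]
  }

  // The lookup goes through the ThreadLocal, so instances are never shared
  // between threads.
  def get(schema: Schema): Deserializer =
    local.get().getOrElseUpdate(schema, new Deserializer(schema))
}
```

With this shape, repeated calls on one thread return the same instance, while a different thread gets its own.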





[GitHub] [hudi] KnightChess commented on a diff in pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r927354900


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   Alternatively, I think using `schema + thread_id` as the cache key would solve this problem.
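
A rough sketch of the `schema + thread_id` idea, assuming a single shared map keyed by the pair. `Schema`, `Deserializer`, and `KeyedDeserializerCache` are hypothetical names used only for illustration, and a `ConcurrentHashMap` stands in for the Guava cache:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical stand-ins for Avro's Schema and Hudi's HoodieAvroDeserializer.
final case class Schema(name: String)
final class Deserializer(val schema: Schema)

object KeyedDeserializerCache {
  // Shared map, but the key includes the calling thread's id.
  private val cache = new ConcurrentHashMap[(Schema, Long), Deserializer]()

  def get(schema: Schema): Deserializer = {
    val key = (schema, Thread.currentThread().getId)
    // computeIfAbsent creates the entry at most once per (schema, thread) key.
    cache.computeIfAbsent(key, new JFunction[(Schema, Long), Deserializer] {
      override def apply(k: (Schema, Long)): Deserializer = new Deserializer(k._1)
    })
  }
}
```

One trade-off of this sketch: entries for threads that have finished stay in the map unless something evicts them, whereas a `ThreadLocal` value becomes collectable once its thread is gone.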





[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193090182

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242",
       "triggerID" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193250341

   @KnightChess OK, it's not related. I triggered a run and am waiting for CI. Have you manually verified this patch?




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193127673

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242",
       "triggerID" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "1193103331",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256",
       "triggerID" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193173258

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242",
       "triggerID" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "1193103331",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256",
       "triggerID" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258",
       "triggerID" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1172092263

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 UNKNOWN
   


[GitHub] [hudi] xushiyan commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193153805

   ```
   2022-07-23T15:30:45.0856509Z [ERROR] Failures: 
   2022-07-23T15:30:45.0857053Z [ERROR]   ITTestHoodieDataSource.testWriteAndReadDebeziumJson:916 
   2022-07-23T15:30:45.0859380Z Expected: is "[+I[101, 1000, scooter, 3.140000104904175], +I[102, 2000, car battery, 8.100000381469727], +I[103, 3000, 12-pack drill bits, 0.800000011920929], +I[104, 4000, hammer, 0.75], +I[105, 5000, hammer, 0.875], +I[106, 10000, hammer, 1.0], +I[107, 11000, rocks, 5.099999904632568], +I[108, 8000, jacket, 0.10000000149011612], +I[109, 9000, spare tire, 22.200000762939453], +I[110, 14000, jacket, 0.5]]"
   2022-07-23T15:30:45.0861652Z      but: was "[+I[101, 1000, scooter, 3.140000104904175], +I[102, 2000, car battery, 8.100000381469727], +I[103, 3000, 12-pack drill bits, 0.800000011920929], +I[104, 4000, hammer, 0.75], +I[105, 5000, hammer, 0.875], +I[106, 10000, hammer, 1.0], +I[107, 7000, rocks, 5.300000190734863], +I[108, 8000, jacket, 0.10000000149011612], +I[109, 9000, spare tire, 22.200000762939453]]"
   2022-07-23T15:30:45.0862587Z [INFO] 
   2022-07-23T15:30:45.0862903Z [ERROR] Tests run: 103, Failures: 1, Errors: 0, Skipped: 2
   ```




[GitHub] [hudi] KnightChess commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193067980

   > @KnightChess can you rebase on master, please? The branch is quite out of date compared to master.
   
   @xushiyan  done




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193081075

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242",
       "triggerID" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * be67ed3b9f54a92acf34eb8ebf2c1520b2f07081 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242) 
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   


[GitHub] [hudi] KnightChess commented on a diff in pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r927350287


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   > This looks like it is not used at all? `avroDeserializerCache` is still used for storing the deserializer.
   > So, what are we trying to fix here? Does the schema key in the cache not work in the multi-threaded use case?
   
   Reusing the same avroDeserializer for a schema across different threads can make the result record differ from the input record.
   For example:
   schema (key is id): id int, name string, age int
   Two task threads use SqlTypedRecord to get the sqlRow at the same time.
   task one record:   1,  'one',  18
   task two record:   2,  'two',  19
   If both reuse the same avroDeserializer and deserialize at the same time, the results may be:
   task one result:    1,  'two', 19
   task two result:    2,  'two', 19
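
The effect can be reproduced in a small, deterministic sketch. It assumes the deserializer reuses one mutable output buffer between calls, which this thread does not state explicitly. `ReusingDeserializer` and `SharedBufferDemo` are hypothetical names, not Hudi classes:

```scala
// Hypothetical deserializer that reuses a single mutable output buffer.
final class ReusingDeserializer {
  private val buffer = new Array[Any](3)

  def deserialize(id: Int, name: String, age: Int): Array[Any] = {
    buffer(0) = id
    buffer(1) = name
    buffer(2) = age
    buffer // the same array instance is returned on every call
  }
}

object SharedBufferDemo {
  // Two "tasks" share one deserializer; each keeps the reference it got back.
  def run(): (Seq[Any], Seq[Any]) = {
    val shared = new ReusingDeserializer
    val taskOne = shared.deserialize(1, "one", 18)
    val taskTwo = shared.deserialize(2, "two", 19)
    // Both references point at the same buffer, so task one now sees
    // task two's values.
    (taskOne.toSeq, taskTwo.toSeq)
  }
}
```

Here the second call overwrites the whole buffer before task one reads it. With real threads the overwrite can land partway through a record, which would give a mixed row like the `1, 'two', 19` above.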
   
   





[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193256121

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258",
       "triggerID" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258",
       "triggerID" : "1193103331",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258) 
   


[GitHub] [hudi] KnightChess commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193103331

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193109547

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242",
       "triggerID" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "1193103331",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256",
       "triggerID" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   * 47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193109048

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     }, {
       "hash" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242",
       "triggerID" : "be67ed3b9f54a92acf34eb8ebf2c1520b2f07081",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c4f8a38f9f7a56e13df803f6a9887f547c07e166",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246",
       "triggerID" : "1193103331",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   * 47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1172095987

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665",
       "triggerID" : "5980587e9a5aaa22da33c220af7624b3588ca468",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665) 
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r927336771


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   So, what are we trying to fix here? Does the schema key in the cache not work in the multi-threaded use case?





[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193255541

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258",
       "triggerID" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c989a77732bf1759e97ebe9ff7bfcce91b7a5b64",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1193103331",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193073809

   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665) 
   * be67ed3b9f54a92acf34eb8ebf2c1520b2f07081 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242) 
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193072796

   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665) 
   * be67ed3b9f54a92acf34eb8ebf2c1520b2f07081 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242) 
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] KnightChess commented on a diff in pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r928071887


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   @xushiyan Oh, I get it. The `avroDeserializerCache` is unnecessary. Here, the code only updates the global cache, and that cache is not used within a single thread.
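
   A minimal sketch of what consulting a per-thread cache on every lookup could look like. It is an illustration under stated assumptions, not the code merged in this PR: `StubDeserializer` and `PerThreadDeserializerCache` are hypothetical names, a plain mutable map stands in for the Guava `Cache`, and a `String` schema name stands in for the Avro `Schema` key.

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for HoodieAvroDeserializer; only identity matters here.
   final class StubDeserializer(val schemaName: String)

   object PerThreadDeserializerCache {
     // One private map per thread, created lazily on that thread's first access.
     private val local = new ThreadLocal[mutable.Map[String, StubDeserializer]] {
       override def initialValue(): mutable.Map[String, StubDeserializer] =
         mutable.HashMap.empty[String, StubDeserializer]
     }

     // Look up (or create) the deserializer for this schema on the calling thread.
     def get(schemaName: String): StubDeserializer =
       local.get().getOrElseUpdate(schemaName, new StubDeserializer(schemaName))
   }
   ```

   With this shape, repeated calls on one thread return the same instance, while another thread gets its own instance, so no deserializer is shared across threads.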





[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193107954

   ## CI report:
   
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193107368

   ## CI report:
   
   * c4f8a38f9f7a56e13df803f6a9887f547c07e166 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10246) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] KnightChess commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193273158

   > @KnightChess ok it's not related. i triggered a run. waiting for CI. so have you manually verify this patch?
   
   Yes. We have a business scenario in which a table with 14 billion records is updated through Spark MERGE INTO, with about 30 million updates every day.




[GitHub] [hudi] xushiyan commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193067420

   @KnightChess can you rebase onto master, please? The branch is quite out of date compared to master.




[GitHub] [hudi] xushiyan commented on a diff in pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #6020:
URL: https://github.com/apache/hudi/pull/6020#discussion_r927151090


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/payload/SqlTypedRecord.scala:
##########
@@ -53,6 +53,11 @@ object SqlTypedRecord {
 
   private val avroDeserializerCache = CacheBuilder.newBuilder().build[Schema, HoodieAvroDeserializer]()
 
+  private val avroDeserializerCacheLocal = new ThreadLocal[Cache[Schema, HoodieAvroDeserializer]] {
+    override def initialValue(): Cache[Schema, HoodieAvroDeserializer] =
+      CacheBuilder.newBuilder().maximumSize(16).build[Schema, HoodieAvroDeserializer]()

Review Comment:
   This does not look like it is used at all? `avroDeserializerCache` is still used for storing the deserializer.





[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193066043

   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665) 
   * be67ed3b9f54a92acf34eb8ebf2c1520b2f07081 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193136128

   ## CI report:
   
   * 47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256) 
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] xushiyan merged pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #6020:
URL: https://github.com/apache/hudi/pull/6020




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193263106

   ## CI report:
   
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10258) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] KnightChess commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1172062602

   Caused by #5825.
   @danny0405 can you help review it?




[GitHub] [hudi] KnightChess commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1172140116

   I added logging to check it. The `joinSqlRecord` sometimes contains certain fields from the target and source records, and sometimes the fields are completely inconsistent (key columns too).
   ```scala
   override def combineAndGetUpdateValue(targetRecord: IndexedRecord,
      schema: Schema, properties: Properties): HOption[IndexedRecord] = {
      val sourceRecord = bytesToAvro(recordBytes, schema)
      val joinSqlRecord = new SqlTypedRecord(joinRecord(sourceRecord, targetRecord))
      // Log here: compare targetRecord and joinSqlRecord on some columns, then log them
      processMatchedRecord(joinSqlRecord, Some(targetRecord), properties)
   }
   ```
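
   One way such inconsistent records can arise, sketched under an assumption the thread does not confirm: that the cached deserializer keeps mutable per-instance state, such as a reused output buffer. `ReusingDeserializer` and `SharedStateDemo` are hypothetical stand-ins, not Hudi classes, and the interleaving is replayed on one thread so the outcome is deterministic.

   ```scala
   // Hypothetical stand-in: reuses a single output buffer for every call on the instance.
   final class ReusingDeserializer {
     private val buffer = new Array[Int](2)

     def deserialize(key: Int, value: Int): Array[Int] = {
       buffer(0) = key
       buffer(1) = value
       buffer // the same array is returned on every call
     }
   }

   object SharedStateDemo {
     // Replays "A deserializes, B deserializes, A reads its result"
     // against one shared instance, without needing real threads.
     def interleaved(): (Int, Int) = {
       val shared = new ReusingDeserializer
       val resultA = shared.deserialize(1, 100)
       shared.deserialize(2, 200) // the second caller overwrites the shared buffer
       (resultA(0), resultA(1))   // A now sees B's key and value
     }
   }
   ```

   Here `SharedStateDemo.interleaved()` returns `(2, 200)` instead of the `(1, 100)` that caller A wrote, which matches the symptom of key columns not lining up. Giving each thread its own instance avoids this overwrite.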




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1172207950

   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193066507

   ## CI report:
   
   * 5980587e9a5aaa22da33c220af7624b3588ca468 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9665) 
   * be67ed3b9f54a92acf34eb8ebf2c1520b2f07081 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10242) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193135478

   ## CI report:
   
   * 47bbd0d57bd7bc2360aaa12f5c07f88e8fd3110f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=10256) 
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] KnightChess commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
KnightChess commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193226252

   @xushiyan Sorry, I cannot find this error log: `2022-07-23T15:30:45.0857053Z [ERROR]   ITTestHoodieDataSource.testWriteAndReadDebeziumJson:916`, but this test case is in the hudi-flink module. `SqlTypedRecord` is only used by `ExpressionPayload`, which is only used in Spark for now.
   
   The failed tasks in the CI runs for the latest two commits are in the Flink module too.




[GitHub] [hudi] hudi-bot commented on pull request #6020: [HUDI-4348] fix merge into sql data quality in concurrent scene

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6020:
URL: https://github.com/apache/hudi/pull/6020#issuecomment-1193254915

   ## CI report:
   
   * c989a77732bf1759e97ebe9ff7bfcce91b7a5b64 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

