Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/21 10:50:15 UTC

[GitHub] [hudi] YannByron opened a new pull request, #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

YannByron opened a new pull request, #6734:
URL: https://github.com/apache/hudi/pull/6734

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan merged pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #6734:
URL: https://github.com/apache/hudi/pull/6734




[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253743590

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 43933513e0149ec4cc6f2a0a2708e894ea53baf8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552) 
   * 3d9071b62050a2b72d2522098f2b3263ddf91e40 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254860643

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573",
       "triggerID" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11577",
       "triggerID" : "a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2f8e22123894ff370749d341b1a4d5059f0a5844 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573) 
   * a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11577) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254613104

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06c2dca18820ac062262e38deed409ed7d7b4d2b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253853378

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d9071b62050a2b72d2522098f2b3263ddf91e40 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253567646

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 43933513e0149ec4cc6f2a0a2708e894ea53baf8 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253647791

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 43933513e0149ec4cc6f2a0a2708e894ea53baf8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552) 
   * 3d9071b62050a2b72d2522098f2b3263ddf91e40 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254699173

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573",
       "triggerID" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2f8e22123894ff370749d341b1a4d5059f0a5844 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253653744

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 43933513e0149ec4cc6f2a0a2708e894ea53baf8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552) 
   * 3d9071b62050a2b72d2522098f2b3263ddf91e40 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254618242

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06c2dca18820ac062262e38deed409ed7d7b4d2b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569) 
   * 2f8e22123894ff370749d341b1a4d5059f0a5844 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254520367

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d9071b62050a2b72d2522098f2b3263ddf91e40 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554) 
   * 06c2dca18820ac062262e38deed409ed7d7b4d2b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254855745

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573",
       "triggerID" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 2f8e22123894ff370749d341b1a4d5059f0a5844 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573) 
   * a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8 UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253572874

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 43933513e0149ec4cc6f2a0a2708e894ea53baf8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552) 
   


[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on code in PR #6734:
URL: https://github.com/apache/hudi/pull/6734#discussion_r977061910


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -109,6 +109,11 @@ public static Schema createNullableSchema(Schema.Type avroType) {
     return Schema.createUnion(Schema.create(Schema.Type.NULL), Schema.create(avroType));

Review Comment:
   Let's rebase this one onto the new one you're adding
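
One possible reading of this suggestion, sketched in Scala for illustration (the file under review is Java): the existing `Schema.Type` overload delegates to the newly added overload instead of building the union itself. The signature of the new overload is an assumption here, since the diff above does not show it, and `NullableSchemaSketch` is a hypothetical name.

```scala
import org.apache.avro.Schema

// Hypothetical holder object; in the PR these methods live in AvroSchemaUtils.
object NullableSchemaSketch {

  // Assumed shape of the newly added overload: wraps any schema in a [null, schema] union.
  def createNullableSchema(schema: Schema): Schema =
    Schema.createUnion(Schema.create(Schema.Type.NULL), schema)

  // Existing overload, rebased onto the one above so the union is built in one place.
  def createNullableSchema(avroType: Schema.Type): Schema =
    createNullableSchema(Schema.create(avroType))
}
```

With this shape, any later change to how the nullable union is built only has to be made in the `Schema`-typed overload.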



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##########
@@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional.cdc
+
+import org.apache.avro.Schema
+import org.apache.avro.generic.IndexedRecord
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.model.{HoodieCommitMetadata, HoodieLogFile}
+import org.apache.hudi.common.table.cdc.{HoodieCDCSupplementalLoggingMode, HoodieCDCUtils}
+import org.apache.hudi.common.table.log.HoodieLogFormat
+import org.apache.hudi.common.table.log.block.{HoodieDataBlock, HoodieLogBlock}
+import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.HoodieInstant
+import org.apache.hudi.common.testutils.RawTripTestPayload.{deleteRecordsToStrings, recordsToStrings}
+import org.apache.hudi.config.{HoodieCleanConfig, HoodieWriteConfig}
+import org.apache.hudi.testutils.HoodieClientTestBase
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.SaveMode
+
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._
+
+class TestCDCDataFrameSuite extends HoodieClientTestBase {
+
+  var spark: SparkSession = _
+
+  val commonOpts = Map(
+    HoodieTableConfig.CDC_ENABLED.key -> "true",
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    RECORDKEY_FIELD.key -> "_row_key",
+    PRECOMBINE_FIELD.key -> "timestamp",
+    HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+    HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key -> "1",
+    HoodieCleanConfig.AUTO_CLEAN.key -> "false"
+  )
+
+  @BeforeEach override def setUp(): Unit = {
+    setTableName("hoodie_test")
+    initPath()
+    initSparkContexts()
+    spark = sqlContext.sparkSession
+    initTestDataGenerator()
+    initFileSystem()
+  }
+
+  @AfterEach override def tearDown(): Unit = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @CsvSource(Array("cdc_op_key", "cdc_data_before", "cdc_data_before_after"))
+  def testCOWDataSourceWrite(cdcSupplementalLoggingMode: String): Unit = {
+    val options = commonOpts ++ Map(
+      HoodieTableConfig.CDC_SUPPLEMENTAL_LOGGING_MODE.key -> cdcSupplementalLoggingMode
+    )
+
+    // Insert Operation
+    val records1 = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+    val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+    inputDF1.write.format("org.apache.hudi")
+      .options(options)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    // init meta client
+    metaClient = HoodieTableMetaClient.builder()
+      .setBasePath(basePath)
+      .setConf(spark.sessionState.newHadoopConf)
+      .build()
+    val instant1 = metaClient.reloadActiveTimeline.lastInstant().get()
+    assertEquals(spark.read.format("hudi").load(basePath).count(), 100)
+    // all the data is newly inserted, so no cdc log files are written.
+    assertFalse(hasCDCLogFile(instant1))
+
+    val schemaResolver = new TableSchemaResolver(metaClient)
+    val dataSchema = schemaResolver.getTableAvroSchema(false)
+    val cdcSchema = HoodieCDCUtils.schemaBySupplementalLoggingMode(
+      HoodieCDCSupplementalLoggingMode.parse(cdcSupplementalLoggingMode), dataSchema)
+
+    // Upsert Operation
+    val records2 = recordsToStrings(dataGen.generateUniqueUpdates("001", 50)).toList
+    val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+    inputDF2.write.format("org.apache.hudi")
+      .options(options)
+      .mode(SaveMode.Append)
+      .save(basePath)
+    val instant2 = metaClient.reloadActiveTimeline.lastInstant().get()
+    assertEquals(spark.read.format("hudi").load(basePath).count(), 100)
+
+    // part of the data is updated, so cdc log files are written
+    assertTrue(hasCDCLogFile(instant2))
+    val cdcData2 = getCDCLogFIle(instant2).flatMap(readCDCLogFile(_, cdcSchema))
+    assertEquals(cdcData2.size, 50)

Review Comment:
   Let's make sure we add a test (either by modifying this one or adding a new one) that also asserts the contents of the log files
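
A minimal sketch of what such a content assertion could look like, going beyond the record count checked above. `CdcRecord` is a hypothetical, simplified stand-in for a decoded CDC log record and is not a Hudi type; the `"u"` operation code for updates is likewise an assumption. A real test would build these values from the records returned by the suite's own log-file reader.

```scala
// Hypothetical, simplified view of one decoded CDC record (not a Hudi class).
final case class CdcRecord(op: String, recordKey: String)

object CdcContentAssertions {

  // Counts decoded CDC records per operation code.
  def countByOp(records: Seq[CdcRecord]): Map[String, Int] =
    records.groupBy(_.op).map { case (op, rs) => op -> rs.size }

  // Asserts that the log holds exactly one update record for each expected key
  // and nothing else. The "u" code for updates is an assumption of this sketch.
  def assertOnlyUpdatesFor(records: Seq[CdcRecord], expectedKeys: Set[String]): Unit = {
    assert(records.forall(_.op == "u"), s"unexpected operations: ${countByOp(records)}")
    assert(records.size == expectedKeys.size, "record count differs from the number of expected keys")
    assert(records.map(_.recordKey).toSet == expectedKeys, "record keys differ from the expected keys")
  }
}
```

For the upsert step in the test above, the expected keys would be the `_row_key` values of the 50 updated input records.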





[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254516618

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d9071b62050a2b72d2522098f2b3263ddf91e40 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554) 
   * 06c2dca18820ac062262e38deed409ed7d7b4d2b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1255019947

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573",
       "triggerID" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11577",
       "triggerID" : "a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a31bbddf20522a5fb1c3aaa056f72bf2bebd38f8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11577) 
   




[GitHub] [hudi] YannByron commented on a diff in pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
YannByron commented on code in PR #6734:
URL: https://github.com/apache/hudi/pull/6734#discussion_r977710374


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##########
@@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional.cdc
+
+import org.apache.avro.Schema
+import org.apache.avro.generic.IndexedRecord
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.model.{HoodieCommitMetadata, HoodieLogFile}
+import org.apache.hudi.common.table.cdc.{HoodieCDCSupplementalLoggingMode, HoodieCDCUtils}
+import org.apache.hudi.common.table.log.HoodieLogFormat
+import org.apache.hudi.common.table.log.block.{HoodieDataBlock, HoodieLogBlock}
+import org.apache.hudi.common.table.{HoodieTableConfig, HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.HoodieInstant
+import org.apache.hudi.common.testutils.RawTripTestPayload.{deleteRecordsToStrings, recordsToStrings}
+import org.apache.hudi.config.{HoodieCleanConfig, HoodieWriteConfig}
+import org.apache.hudi.testutils.HoodieClientTestBase
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.SaveMode
+
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.api.Assertions.{assertEquals, assertFalse, assertTrue}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConversions._
+import scala.collection.JavaConverters._
+
+class TestCDCDataFrameSuite extends HoodieClientTestBase {
+
+  var spark: SparkSession = _
+
+  val commonOpts = Map(
+    HoodieTableConfig.CDC_ENABLED.key -> "true",
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    RECORDKEY_FIELD.key -> "_row_key",
+    PRECOMBINE_FIELD.key -> "timestamp",
+    HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+    HoodieMetadataConfig.COMPACT_NUM_DELTA_COMMITS.key -> "1",
+    HoodieCleanConfig.AUTO_CLEAN.key -> "false"
+  )
+
+  @BeforeEach override def setUp(): Unit = {
+    setTableName("hoodie_test")
+    initPath()
+    initSparkContexts()
+    spark = sqlContext.sparkSession
+    initTestDataGenerator()
+    initFileSystem()
+  }
+
+  @AfterEach override def tearDown(): Unit = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @CsvSource(Array("cdc_op_key", "cdc_data_before", "cdc_data_before_after"))
+  def testCOWDataSourceWrite(cdcSupplementalLoggingMode: String): Unit = {
+    val options = commonOpts ++ Map(
+      HoodieTableConfig.CDC_SUPPLEMENTAL_LOGGING_MODE.key -> cdcSupplementalLoggingMode
+    )
+
+    // Insert Operation
+    val records1 = recordsToStrings(dataGen.generateInserts("000", 100)).toList
+    val inputDF1 = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+    inputDF1.write.format("org.apache.hudi")
+      .options(options)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    // init meta client
+    metaClient = HoodieTableMetaClient.builder()
+      .setBasePath(basePath)
+      .setConf(spark.sessionState.newHadoopConf)
+      .build()
+    val instant1 = metaClient.reloadActiveTimeline.lastInstant().get()
+    assertEquals(spark.read.format("hudi").load(basePath).count(), 100)
+    // all records are newly inserted, so no cdc log files are written.
+    assertFalse(hasCDCLogFile(instant1))
+
+    val schemaResolver = new TableSchemaResolver(metaClient)
+    val dataSchema = schemaResolver.getTableAvroSchema(false)
+    val cdcSchema = HoodieCDCUtils.schemaBySupplementalLoggingMode(
+      HoodieCDCSupplementalLoggingMode.parse(cdcSupplementalLoggingMode), dataSchema)
+
+    // Upsert Operation
+    val records2 = recordsToStrings(dataGen.generateUniqueUpdates("001", 50)).toList
+    val inputDF2 = spark.read.json(spark.sparkContext.parallelize(records2, 2))
+    inputDF2.write.format("org.apache.hudi")
+      .options(options)
+      .mode(SaveMode.Append)
+      .save(basePath)
+    val instant2 = metaClient.reloadActiveTimeline.lastInstant().get()
+    assertEquals(spark.read.format("hudi").load(basePath).count(), 100)
+
+    // some of the records are updated, so cdc log files are written
+    assertTrue(hasCDCLogFile(instant2))
+    val cdcData2 = getCDCLogFIle(instant2).flatMap(readCDCLogFile(_, cdcSchema))
+    assertEquals(cdcData2.size, 50)

Review Comment:
   done



##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -109,6 +109,11 @@ public static Schema createNullableSchema(Schema.Type avroType) {
     return Schema.createUnion(Schema.create(Schema.Type.NULL), Schema.create(avroType));

Review Comment:
   done.





[GitHub] [hudi] YannByron commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
YannByron commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1253835914

   @xushiyan @alexeykudinkin Please review this as soon as possible so that I can rebase the CDC-reading PR. Thank you.




[GitHub] [hudi] hudi-bot commented on pull request #6734: [HUDI-3478][HUDI-4887] Use Avro as the format of persisted cdc data

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6734:
URL: https://github.com/apache/hudi/pull/6734#issuecomment-1254622872

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11552",
       "triggerID" : "43933513e0149ec4cc6f2a0a2708e894ea53baf8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11554",
       "triggerID" : "3d9071b62050a2b72d2522098f2b3263ddf91e40",
       "triggerType" : "PUSH"
     }, {
       "hash" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569",
       "triggerID" : "06c2dca18820ac062262e38deed409ed7d7b4d2b",
       "triggerType" : "PUSH"
     }, {
       "hash" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573",
       "triggerID" : "2f8e22123894ff370749d341b1a4d5059f0a5844",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 06c2dca18820ac062262e38deed409ed7d7b4d2b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11569) 
   * 2f8e22123894ff370749d341b1a4d5059f0a5844 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11573) 
   

